
Introduction to Neural Networks

Christian Borgelt
Intelligent Data Analysis and Graphical Models Research Unit
European Center for Soft Computing
c/ Gonzalo Gutiérrez Quirós s/n, 33600 Mieres, Spain
christian.borgelt@softcomputing.es
http://www.borgelt.net/
Contents

Introduction
  Motivation, Biological Background
Threshold Logic Units
  Definition, Geometric Interpretation, Limitations, Networks of TLUs, Training
General Neural Networks
  Structure, Operation, Training
Multilayer Perceptrons
  Definition, Function Approximation, Gradient Descent, Backpropagation, Variants, Sensitivity Analysis
Radial Basis Function Networks
  Definition, Function Approximation, Initialization, Training, Generalized Version
Self-Organizing Maps
  Definition, Learning Vector Quantization, Neighborhood of Output Neurons
Hopfield Networks
  Definition, Convergence, Associative Memory, Solving Optimization Problems
Recurrent Neural Networks
  Differential Equations, Vector Networks, Backpropagation through Time
Motivation: Why (Artificial) Neural Networks?

(Neuro-)Biology / (Neuro-)Physiology / Psychology:
• Exploit similarity to real (biological) neural networks.
• Build models to understand nerve and brain operation by simulation.

Computer Science / Engineering / Economics:
• Mimic certain cognitive capabilities of human beings.
• Solve learning/adaptation, prediction, and optimization problems.

Physics / Chemistry:
• Use neural network models to describe physical phenomena.
• Special case: spin glasses (alloys of magnetic and non-magnetic metals).

Motivation: Why Neural Networks in AI?

Physical-Symbol System Hypothesis [Newell and Simon 1976]:
A physical-symbol system has the necessary and sufficient means for general intelligent action.

Neural networks process simple signals, not symbols.

So why study neural networks in Artificial Intelligence?
• Symbol-based representations work well for inference tasks, but are fairly bad for perception tasks.
• Symbol-based expert systems tend to get slower with growing knowledge, human experts tend to get faster.
• Neural networks allow for highly parallel information processing.
• There are several successful applications in industry and finance.
Biological Background

Structure of a prototypical biological neuron:
(Figure: cell body (soma) with cell core and dendrites; axon with myelin sheath; terminal buttons and synapses.)

(Very) simplified description of neural information processing:
• The axon terminal releases chemicals, called neurotransmitters.
• These act on the membrane of the receptor dendrite to change its polarization.
  (The inside is usually 70 mV more negative than the outside.)
• Decrease in potential difference: excitatory synapse.
• Increase in potential difference: inhibitory synapse.
• If there is enough net excitatory input, the axon is depolarized.
• The resulting action potential travels along the axon.
  (Speed depends on the degree to which the axon is covered with myelin.)
• When the action potential reaches the terminal buttons, it triggers the release of neurotransmitters.
Threshold Logic Units
Threshold Logic Units

A Threshold Logic Unit (TLU) is a processing unit for numbers with n inputs x_1, ..., x_n and one output y. The unit has a threshold θ and each input x_i is associated with a weight w_i. A threshold logic unit computes the function

    y = 1, if x⃗w⃗ = Σ_{i=1}^n w_i x_i ≥ θ,
    y = 0, otherwise.

(Diagram: inputs x_1, ..., x_n enter with weights w_1, ..., w_n; the unit with threshold θ emits the output y.)
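The definition above translates directly into code. The following is a minimal sketch (not from the slides; the function name and the test loop are illustrative assumptions), using the conjunction example from the next slide:

    # Sketch of a threshold logic unit: output 1 iff the weighted sum reaches theta.
    def tlu(x, w, theta):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

    # Conjunction x1 AND x2 with w = (3, 2), theta = 4:
    for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        print(x, tlu(x, (3, 2), 4))   # fires only for (1, 1)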
Threshold Logic Units: Examples

Threshold logic unit for the conjunction x_1 ∧ x_2
(weights w_1 = 3, w_2 = 2, threshold θ = 4):

x_1 x_2 | 3x_1 + 2x_2 | y
 0   0  |      0      | 0
 1   0  |      3      | 0
 0   1  |      2      | 0
 1   1  |      5      | 1

Threshold logic unit for the implication x_2 → x_1
(weights w_1 = 2, w_2 = −2, threshold θ = −1):

x_1 x_2 | 2x_1 − 2x_2 | y
 0   0  |      0      | 1
 1   0  |      2      | 1
 0   1  |     −2      | 0
 1   1  |      0      | 1

Threshold logic unit for (x_1 ∧ ¬x_2) ∨ (x_1 ∧ x_3) ∨ (¬x_2 ∧ x_3)
(weights w_1 = 2, w_2 = −2, w_3 = 2, threshold θ = 1):

x_1 x_2 x_3 | Σ_i w_i x_i | y
 0   0   0  |      0      | 0
 1   0   0  |      2      | 1
 0   1   0  |     −2      | 0
 1   1   0  |      0      | 0
 0   0   1  |      2      | 1
 1   0   1  |      4      | 1
 0   1   1  |      0      | 0
 1   1   1  |      2      | 1
Threshold Logic Units: Geometric Interpretation

Review of line representations:

Straight lines are usually represented in one of the following forms:

    Explicit form:         g: x_2 = b x_1 + c
    Implicit form:         g: a_1 x_1 + a_2 x_2 + d = 0
    Point-direction form:  g: x⃗ = p⃗ + k r⃗
    Normal form:           g: (x⃗ − p⃗) n⃗ = 0

with the parameters:
    b:  gradient (slope) of the line
    c:  section of the x_2 axis (intercept)
    p⃗:  vector of a point of the line (base vector)
    r⃗:  direction vector of the line
    n⃗:  normal vector of the line

A straight line and its defining parameters:
(Figure: line g with base point p⃗, direction vector r⃗, normal vector n⃗ = (a_1, a_2), intercept c and slope b = r_2/r_1; with d = −p⃗n⃗, the point q⃗ = (d/|n⃗|) (n⃗/|n⃗|) is the point of g closest to the origin O.)

How to determine the side on which a point x⃗ lies:
(Figure: project x⃗ onto the normal direction, z = (x⃗n⃗)/|n⃗|; comparing z with d/|n⃗| tells on which side of g the point x⃗ lies.)
Threshold Logic Units: Geometric Interpretation

Threshold logic unit for x_1 ∧ x_2 (w_1 = 3, w_2 = 2, θ = 4):
(Figure: the line 3x_1 + 2x_2 = 4 in the unit square separates the point (1,1), mapped to 1, from (0,0), (1,0), (0,1), mapped to 0.)

A threshold logic unit for x_2 → x_1 (w_1 = 2, w_2 = −2, θ = −1):
(Figure: the line 2x_1 − 2x_2 = −1 separates the point (0,1), mapped to 0, from the other three points, mapped to 1.)

Visualization of 3-dimensional Boolean functions:
(Figure: the input space as the unit cube with corners (0,0,0) to (1,1,1).)

Threshold logic unit for (x_1 ∧ ¬x_2) ∨ (x_1 ∧ x_3) ∨ (¬x_2 ∧ x_3) (w_1 = 2, w_2 = −2, w_3 = 2, θ = 1):
(Figure: the plane 2x_1 − 2x_2 + 2x_3 = 1 cuts the unit cube and separates the corners mapped to 1 from those mapped to 0.)
Threshold Logic Units: Limitations

The biimplication problem x_1 ↔ x_2: There is no separating line.

x_1 x_2 | y
 0   0  | 1
 1   0  | 0
 0   1  | 0
 1   1  | 1

Formal proof by reductio ad absurdum:

    since (0,0) ↦ 1:    0 ≥ θ,            (1)
    since (1,0) ↦ 0:    w_1 < θ,          (2)
    since (0,1) ↦ 0:    w_2 < θ,          (3)
    since (1,1) ↦ 1:    w_1 + w_2 ≥ θ.    (4)

(2) and (3): w_1 + w_2 < 2θ. With (4): 2θ > θ, or θ > 0. Contradiction to (1).

Total number and number of linearly separable Boolean functions
([Widner 1960] as cited in [Zell 1994]):

inputs | Boolean functions | linearly separable functions
   1   |         4         |        4
   2   |        16         |       14
   3   |       256         |      104
   4   |     65536         |     1774
   5   |    ≈ 4.3 · 10^9   |    94572
   6   |    ≈ 1.8 · 10^19  |   ≈ 5.0 · 10^6

• For many inputs a threshold logic unit can compute almost no functions.
• Networks of threshold logic units are needed to overcome the limitations.
Networks of Threshold Logic Units

Solving the biimplication problem with a network.

Idea: logical decomposition  x_1 ↔ x_2  ≡  (x_1 → x_2) ∧ (x_2 → x_1)

(Network: a first layer with two units, one computing y_1 = x_1 → x_2 with weights −2, +2 and threshold −1, one computing y_2 = x_2 → x_1 with weights +2, −2 and threshold −1; an output unit computing y = y_1 ∧ y_2 with weights 2, 2 and threshold 3.)

Solving the biimplication problem: Geometric interpretation

(Figure: in the input space (x_1, x_2) the two lines g_1 and g_2 of the first-layer units cut out the points a = (0,0), b/c, d = (1,1); in the transformed space (y_1, y_2) the images of the input points are separated by a single line g_3.)

• The first layer computes new Boolean coordinates for the points.
• After the coordinate transformation the problem is linearly separable.
Representing Arbitrary Boolean Functions

Let y = f(x_1, ..., x_n) be a Boolean function of n variables.

(i) Represent f(x_1, ..., x_n) in disjunctive normal form. That is, determine
D_f = K_1 ∨ ... ∨ K_m, where all K_j are conjunctions of n literals, i.e.,
K_j = l_{j1} ∧ ... ∧ l_{jn} with l_{ji} = x_i (positive literal) or l_{ji} = ¬x_i (negative literal).

(ii) Create a neuron for each conjunction K_j of the disjunctive normal form (having n inputs — one input for each variable), where

    w_{ji} =  2, if l_{ji} = x_i,
             −2, if l_{ji} = ¬x_i,
    and   θ_j = n − 1 + (1/2) Σ_{i=1}^n w_{ji}.

(iii) Create an output neuron (having m inputs — one input for each neuron that was created in step (ii)), where

    w_{(n+1)k} = 2,  k = 1, ..., m,   and   θ_{n+1} = 1.
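The construction in steps (ii) and (iii) is mechanical, so a short sketch may help. The helper below is a hypothetical illustration (names and encoding are assumptions): each conjunction is given as a tuple of +1/−1 signs, one per variable.

    # Build TLU weights/thresholds from a DNF; each conjunction is a tuple of
    # +1 (positive literal) / -1 (negative literal), one entry per variable.
    def dnf_to_network(conjunctions):
        hidden = []
        for lits in conjunctions:              # one neuron per conjunction K_j
            w = [2 * s for s in lits]          # +2 for x_i, -2 for not x_i
            theta = len(lits) - 1 + 0.5 * sum(w)
            hidden.append((w, theta))
        out = ([2] * len(conjunctions), 1)     # output neuron: weights 2, theta 1
        return hidden, out

    def evaluate(net, x):
        hidden, (w_out, th_out) = net
        h = [1 if sum(wi * xi for wi, xi in zip(w, x)) >= th else 0
             for w, th in hidden]
        return 1 if sum(wi * hi for wi, hi in zip(w_out, h)) >= th_out else 0

    # Biimplication x1 <-> x2 = (x1 & x2) | (~x1 & ~x2):
    net = dnf_to_network([(1, 1), (-1, -1)])
    print([evaluate(net, x) for x in [(0,0), (1,0), (0,1), (1,1)]])   # [1, 0, 0, 1]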
Training Threshold Logic Units

• Geometric interpretation provides a way to construct threshold logic units with 2 and 3 inputs, but:
• Not an automatic method (human visualization needed).
• Not feasible for more than 3 inputs.

General idea of automatic training:
• Start with random values for weights and threshold.
• Determine the error of the output for a set of training patterns.
• Error is a function of the weights and the threshold: e = e(w_1, ..., w_n, θ).
• Adapt weights and threshold so that the error gets smaller.
• Iterate adaptation until the error vanishes.
Training Threshold Logic Units

Single input threshold logic unit for the negation ¬x
(input x with weight w, threshold θ, output y):

x | y
0 | 1
1 | 0

Output error as a function of weight and threshold:
(Figure: error e for x = 0, error for x = 1, and the sum of errors, each plotted over w, θ ∈ [−2, 2]; the surfaces consist of flat plateaus at heights 0, 1 and 2.)

• The error function cannot be used directly, because it consists of plateaus.
• Solution: If the computed output is wrong, take into account how far the weighted sum is from the threshold.

Modified output error as a function of weight and threshold:
(Figure: modified error for x = 0, for x = 1, and the sum of errors, plotted over w, θ ∈ [−2, 2]; the plateaus are replaced by sloping ramps reaching up to height 4.)

Schemata of resulting directions of parameter changes:
(Figure: arrows in the (θ, w) plane showing the direction of parameter change for x = 0, for x = 1, and the sum of the changes.)

• Start at a random point.
• Iteratively adapt parameters according to the direction corresponding to the current point.
Training Threshold Logic Units

Example training procedure: Online and batch training.

(Figure: trajectories of the parameters (θ, w) in the direction schema for online learning and for batch learning, and the batch-learning error surface; both procedures end at a threshold logic unit, e.g. with weight w = −2 and threshold θ = −1, that computes the negation: y = 1 for x = 0 and y = 0 for x = 1.)
Training Threshold Logic Units: Delta Rule

Formal Training Rule: Let x⃗ = (x_1, ..., x_n) be an input vector of a threshold logic unit, o the desired output for this input vector and y the actual output of the threshold logic unit. If y ≠ o, then the threshold θ and the weight vector w⃗ = (w_1, ..., w_n) are adapted as follows in order to reduce the error:

    θ^(new) = θ^(old) + Δθ      with  Δθ = −η(o − y),
    ∀i ∈ {1, ..., n}:  w_i^(new) = w_i^(old) + Δw_i   with  Δw_i = η(o − y)x_i,

where η is a parameter that is called learning rate. It determines the severity of the weight changes. This procedure is called Delta Rule or Widrow–Hoff Procedure [Widrow and Hoff 1960].

• Online Training: Adapt parameters after each training pattern.
• Batch Training: Adapt parameters only at the end of each epoch, i.e. after a traversal of all training patterns.
Training Threshold Logic Units: Delta Rule

Turning the threshold value into a weight:

(Figure: left, a unit with inputs x_1, ..., x_n, weights w_1, ..., w_n and threshold θ, which fires if Σ_{i=1}^n w_i x_i − θ ≥ 0; right, an equivalent unit with threshold 0 and an additional fixed input x_0 = 1 with weight w_0 = −θ, which fires if Σ_{i=0}^n w_i x_i ≥ 0.)
Training Threshold Logic Units: Delta Rule
procedure oninc trainin (var w, var , L, ).
var y, e. ( outjut, -um ot crror- )
begin
repeat
e 0. ( initiaizc thc crror -um )
for all (x, o) L do begin ( travcr-c thc jattcrn- )
if ( wx ) then y 1. ( comjutc thc outjut )
else y 0. ( ot thc thrc-hod oic unit )
if (y , o) then begin ( it thc outjut i- wron )
(o y). ( adajt thc thrc-hod )
w w + (o y)x. ( and thc wciht- )
e e + [o y[. ( -um thc crror- )
end;
end;
until (e 0). ( rcjcat thc comjutation- )
end; ( unti thc crror vani-hc- )
Christian Borgelt Introduction to Neural Networks 29
Training Threshold Logic Units: Delta Rule
procedure atch trainin (var w, var , L, ).
var y, e, ( outjut, -um ot crror- )

c
, w
c
. ( -ummcd chanc- )
begin
repeat
e 0.
c
0. w
c

0. ( initiaization- )
for all (x, o) L do begin ( travcr-c thc jattcrn- )
if ( wx ) then y 1. ( comjutc thc outjut )
else y 0. ( ot thc thrc-hod oic unit )
if (y , o) then begin ( it thc outjut i- wron )

c

c
(o y). ( -um thc chanc- ot thc )
w
c
w
c
+ (o y)x. ( thrc-hod and thc wciht- )
e e + [o y[. ( -um thc crror- )
end;
end;
+
c
. ( adajt thc thrc-hod )
w w + w
c
. ( and thc wciht- )
until (e 0). ( rcjcat thc comjutation- )
end; ( unti thc crror vani-hc- )
Christian Borgelt Introduction to Neural Networks 30
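As a complement to the pseudocode, here is a minimal Python transcription of the online variant (a sketch; names and the η default are assumptions). Note that, as the convergence theorem later in this section states, the loop terminates only for linearly separable tasks:

    # Online delta-rule training of a single TLU; patterns is a list of (x, o).
    def delta_rule_online(patterns, n, eta=1.0):
        w, theta = [0.0] * n, 0.0
        error = 1
        while error > 0:
            error = 0
            for x, o in patterns:
                y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
                if y != o:
                    theta -= eta * (o - y)                       # adapt threshold
                    w = [wi + eta * (o - y) * xi for wi, xi in zip(w, x)]
                    error += abs(o - y)                          # sum the errors
        return w, theta

    # Conjunction: reproduces the trace on the next slides (w = [2, 1], theta = 3).
    print(delta_rule_online([((0,0),0), ((0,1),0), ((1,0),0), ((1,1),1)], n=2))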
Training Threshold Logic Units: Online

epoch | x | o | x⃗w⃗ − θ | y | e | Δθ | Δw |   θ  |  w
      |   |   |        |   |   |    |    |  1.5 |  2
  1   | 0 | 1 |  −1.5  | 0 | 1 | −1 |  0 |  0.5 |  2
      | 1 | 0 |   1.5  | 1 | 1 |  1 | −1 |  1.5 |  1
  2   | 0 | 1 |  −1.5  | 0 | 1 | −1 |  0 |  0.5 |  1
      | 1 | 0 |   0.5  | 1 | 1 |  1 | −1 |  1.5 |  0
  3   | 0 | 1 |  −1.5  | 0 | 1 | −1 |  0 |  0.5 |  0
      | 1 | 0 |  −0.5  | 0 | 0 |  0 |  0 |  0.5 |  0
  4   | 0 | 1 |  −0.5  | 0 | 1 | −1 |  0 | −0.5 |  0
      | 1 | 0 |   0.5  | 1 | 1 |  1 | −1 |  0.5 | −1
  5   | 0 | 1 |  −0.5  | 0 | 1 | −1 |  0 | −0.5 | −1
      | 1 | 0 |  −0.5  | 0 | 0 |  0 |  0 | −0.5 | −1
  6   | 0 | 1 |   0.5  | 1 | 0 |  0 |  0 | −0.5 | −1
      | 1 | 0 |  −0.5  | 0 | 0 |  0 |  0 | −0.5 | −1

Training Threshold Logic Units: Batch

epoch | x | o | x⃗w⃗ − θ | y | e | Δθ | Δw |   θ  |  w
      |   |   |        |   |   |    |    |  1.5 |  2
  1   | 0 | 1 |  −1.5  | 0 | 1 | −1 |  0 |      |
      | 1 | 0 |   0.5  | 1 | 1 |  1 | −1 |  1.5 |  1
  2   | 0 | 1 |  −1.5  | 0 | 1 | −1 |  0 |      |
      | 1 | 0 |  −0.5  | 0 | 0 |  0 |  0 |  0.5 |  1
  3   | 0 | 1 |  −0.5  | 0 | 1 | −1 |  0 |      |
      | 1 | 0 |   0.5  | 1 | 1 |  1 | −1 |  0.5 |  0
  4   | 0 | 1 |  −0.5  | 0 | 1 | −1 |  0 |      |
      | 1 | 0 |  −0.5  | 0 | 0 |  0 |  0 | −0.5 |  0
  5   | 0 | 1 |   0.5  | 1 | 0 |  0 |  0 |      |
      | 1 | 0 |   0.5  | 1 | 1 |  1 | −1 |  0.5 | −1
  6   | 0 | 1 |  −0.5  | 0 | 1 | −1 |  0 |      |
      | 1 | 0 |  −1.5  | 0 | 0 |  0 |  0 | −0.5 | −1
  7   | 0 | 1 |   0.5  | 1 | 0 |  0 |  0 |      |
      | 1 | 0 |  −0.5  | 0 | 0 |  0 |  0 | −0.5 | −1
Training Threshold Logic Units: Conjunction

Threshold logic unit with two inputs for the conjunction y = x_1 ∧ x_2
(inputs x_1, x_2 with weights w_1, w_2 and threshold θ):

x_1 x_2 | y
 0   0  | 0
 1   0  | 0
 0   1  | 0
 1   1  | 1

(Result of training: w_1 = 2, w_2 = 1, θ = 3; geometrically, the line 2x_1 + x_2 = 3 separates the point (1,1), mapped to 1, from the other points, mapped to 0.)
Training Threshold Logic Units: Conjunction
epoch x
1
x
2
o x w y e w
1
w
2
w
1
w
2
0 0 0
1 0 0 0 0 1 1 1 0 0 1 0 0
0 1 0 1 0 0 0 0 0 1 0 0
1 0 0 1 0 0 0 0 0 1 0 0
1 1 1 1 0 1 1 1 1 0 1 1
2 0 0 0 0 1 1 1 0 0 1 1 1
0 1 0 0 1 1 1 0 1 2 1 0
1 0 0 1 0 0 0 0 0 2 1 0
1 1 1 1 0 1 1 1 1 1 2 1
3 0 0 0 1 0 0 0 0 0 1 2 1
0 1 0 0 1 1 1 0 1 2 2 0
1 0 0 0 1 1 1 1 0 3 1 0
1 1 1 2 0 1 1 1 1 2 2 1
4 0 0 0 2 0 0 0 0 0 2 2 1
0 1 0 1 0 0 0 0 0 2 2 1
1 0 0 0 1 1 1 1 0 3 1 1
1 1 1 1 0 1 1 1 1 2 2 2
5 0 0 0 2 0 0 0 0 0 2 2 2
0 1 0 0 1 1 1 0 1 3 2 1
1 0 0 1 0 0 0 0 0 3 2 1
1 1 1 0 1 0 0 0 0 3 2 1
6 0 0 0 3 0 0 0 0 0 3 2 1
0 1 0 2 0 0 0 0 0 3 2 1
1 0 0 1 0 0 0 0 0 3 2 1
1 1 1 0 1 0 0 0 0 3 2 1
Christian Borgelt Introduction to Neural Networks 34
Training Threshold Logic Units: Biimplication

epoch | x_1 x_2 | o | x⃗w⃗ − θ | y | e | Δθ Δw_1 Δw_2 | θ  w_1 w_2
      |         |   |        |   |   |              | 0   0   0
  1   |  0   0  | 1 |    0   | 1 | 0 |  0   0   0   | 0   0   0
      |  0   1  | 0 |    0   | 1 | 1 |  1   0  −1   | 1   0  −1
      |  1   0  | 0 |   −1   | 0 | 0 |  0   0   0   | 1   0  −1
      |  1   1  | 1 |   −2   | 0 | 1 | −1   1   1   | 0   1   0
  2   |  0   0  | 1 |    0   | 1 | 0 |  0   0   0   | 0   1   0
      |  0   1  | 0 |    0   | 1 | 1 |  1   0  −1   | 1   1  −1
      |  1   0  | 0 |    0   | 1 | 1 |  1  −1   0   | 2   0  −1
      |  1   1  | 1 |   −3   | 0 | 1 | −1   1   1   | 1   1   0
  3   |  0   0  | 1 |    0   | 1 | 0 |  0   0   0   | 0   1   0
      |  0   1  | 0 |    0   | 1 | 1 |  1   0  −1   | 1   1  −1
      |  1   0  | 0 |    0   | 1 | 1 |  1  −1   0   | 2   0  −1
      |  1   1  | 1 |   −3   | 0 | 1 | −1   1   1   | 1   1   0

(Epochs 2 and 3 are identical: the parameters cycle and training does not terminate.)
Training Threshold Logic Units: Convergence

Convergence Theorem: Let L = {(x⃗_1, o_1), ..., (x⃗_m, o_m)} be a set of training patterns, each consisting of an input vector x⃗_i ∈ IR^n and a desired output o_i ∈ {0, 1}. Furthermore, let L_0 = {(x⃗, o) ∈ L | o = 0} and L_1 = {(x⃗, o) ∈ L | o = 1}. If L_0 and L_1 are linearly separable, i.e., if w⃗ ∈ IR^n and θ ∈ IR exist such that

    ∀(x⃗, 0) ∈ L_0:  w⃗x⃗ < θ    and
    ∀(x⃗, 1) ∈ L_1:  w⃗x⃗ ≥ θ,

then online as well as batch training terminate.

• The algorithms terminate only when the error vanishes.
• Therefore the resulting threshold and weights must solve the problem.
• For not linearly separable problems the algorithms do not terminate.
Training Networks of Threshold Logic Units

• Single threshold logic units have strong limitations: they can only compute linearly separable functions.
• Networks of threshold logic units can compute arbitrary Boolean functions.
• Training single threshold logic units with the delta rule is fast and guaranteed to find a solution if one exists.
• Networks of threshold logic units cannot be trained this way, because
  – there are no desired values for the neurons of the first layer,
  – the problem can usually be solved with different functions computed by the neurons of the first layer.
• When this situation became clear, neural networks were seen as a research dead end.
General (Artificial) Neural Networks

Basic graph theoretic notions:

A (directed) graph is a pair G = (V, E) consisting of a (finite) set V of nodes or vertices and a (finite) set E ⊆ V × V of edges.

We call an edge e = (u, v) ∈ E directed from node u to node v.

Let G = (V, E) be a (directed) graph and u ∈ V a node. Then the nodes of the set

    pred(u) = {v ∈ V | (v, u) ∈ E}

are called the predecessors of the node u, and the nodes of the set

    succ(u) = {v ∈ V | (u, v) ∈ E}

are called the successors of the node u.
General Neural Networks

General definition of a neural network:

An (artificial) neural network is a (directed) graph G = (U, C), whose nodes u ∈ U are called neurons or units and whose edges c ∈ C are called connections.

The set U of nodes is partitioned into
• the set U_in of input neurons,
• the set U_out of output neurons, and
• the set U_hidden of hidden neurons.

It is

    U = U_in ∪ U_out ∪ U_hidden,
    U_in ≠ ∅,  U_out ≠ ∅,  U_hidden ∩ (U_in ∪ U_out) = ∅.

Each connection (v, u) ∈ C possesses a weight w_uv and each neuron u ∈ U possesses three (real-valued) state variables:
• the network input net_u,
• the activation act_u, and
• the output out_u.

Each input neuron u ∈ U_in also possesses a fourth (real-valued) state variable, the external input ext_u.

Furthermore, each neuron u ∈ U possesses three functions:
• the network input function  f_net^(u): IR^{2|pred(u)| + κ_1(u)} → IR,
• the activation function     f_act^(u): IR^{κ_2(u)} → IR, and
• the output function         f_out^(u): IR → IR,

which are used to compute the values of the state variables.
General Neural Networks

Types of (artificial) neural networks:
• If the graph of a neural network is acyclic, it is called a feed-forward network.
• If the graph of a neural network contains cycles (backward connections), it is called a recurrent network.

Representation of the connection weights by a matrix:

              u_1        u_2      ...    u_r
    u_1  ( w_{u_1u_1}  w_{u_1u_2}  ...  w_{u_1u_r} )
    u_2  ( w_{u_2u_1}  w_{u_2u_2}  ...  w_{u_2u_r} )
     ⋮   (     ⋮           ⋮                ⋮      )
    u_r  ( w_{u_ru_1}  w_{u_ru_2}  ...  w_{u_ru_r} )
General Neural Networks: Example

A simple recurrent neural network:

(Figure: neurons u_1, u_2, u_3 with external inputs x_1 (to u_1) and x_2 (to u_2) and output y (from u_3); connections u_1 → u_2 with weight 1, u_3 → u_1 with weight 4, u_1 → u_3 with weight −2, u_2 → u_3 with weight 3.)

Weight matrix of this network:

         u_1  u_2  u_3
    u_1 (  0    0    4 )
    u_2 (  1    0    0 )
    u_3 ( −2    3    0 )
Structure of a Generalized Neuron

A generalized neuron is a simple numeric processor:

(Figure: the outputs out_{v_1}, ..., out_{v_n} of the predecessor neurons enter as inputs in_{uv_1}, ..., in_{uv_n} with weights w_{uv_1}, ..., w_{uv_n} into the network input function f_net^(u), which computes net_u; the activation function f_act^(u), with parameters σ_1, ..., σ_l and possibly the external input ext_u and a feedback of the previous activation as further arguments, computes act_u; the output function f_out^(u), with parameters θ_1, ..., θ_k, computes out_u.)
General Neural Networks: Example

(Figure: the recurrent example network from above, now with all thresholds θ = 1 written into the neurons.)

    f_net^(u)(w⃗_u, in⃗_u) = Σ_{v ∈ pred(u)} w_uv in_uv = Σ_{v ∈ pred(u)} w_uv out_v

    f_act^(u)(net_u, θ) = 1, if net_u ≥ θ,
                          0, otherwise.

    f_out^(u)(act_u) = act_u
General Neural Networks: Example

Updating the activations of the neurons:

                 u_1  u_2  u_3
    input phase   1    0    0
    work phase    1    0    0    net_{u_3} = −2
                  0    0    0    net_{u_1} =  0
                  0    0    0    net_{u_2} =  0
                  0    0    0    net_{u_3} =  0
                  0    0    0    net_{u_1} =  0

• Order in which the neurons are updated: u_3, u_1, u_2, u_3, u_1, u_2, u_3, ...
• A stable state with a unique output is reached.

Updating the activations of the neurons (different order):

                 u_1  u_2  u_3
    input phase   1    0    0
    work phase    1    0    0    net_{u_3} = −2
                  1    1    0    net_{u_2} =  1
                  0    1    0    net_{u_1} =  0
                  0    1    1    net_{u_3} =  3
                  0    0    1    net_{u_2} =  0
                  1    0    1    net_{u_1} =  4
                  1    0    0    net_{u_3} = −2

• Order in which the neurons are updated: u_3, u_2, u_1, u_3, u_2, u_1, u_3, ...
• No stable state is reached (oscillation of output).
General Neural Networks: Training

Definition of learning tasks for a neural network:

A fixed learning task L_fixed for a neural network with
• n input neurons, i.e. U_in = {u_1, ..., u_n}, and
• m output neurons, i.e. U_out = {v_1, ..., v_m},
is a set of training patterns l = (ı⃗^(l), o⃗^(l)), each consisting of
• an input vector  ı⃗^(l) = (ext_{u_1}^(l), ..., ext_{u_n}^(l))  and
• an output vector o⃗^(l) = (o_{v_1}^(l), ..., o_{v_m}^(l)).

A fixed learning task is solved, if for all training patterns l ∈ L_fixed the neural network computes from the external inputs contained in the input vector ı⃗^(l) of a training pattern l the outputs contained in the corresponding output vector o⃗^(l).

Solving a fixed learning task: Error definition

• Measure how well a neural network solves a given fixed learning task.
• Compute differences between desired and actual outputs.
• Do not sum differences directly in order to avoid errors canceling each other.
• Square has favorable properties for deriving the adaptation rules.

    e = Σ_{l ∈ L_fixed} e^(l) = Σ_{v ∈ U_out} e_v = Σ_{l ∈ L_fixed} Σ_{v ∈ U_out} e_v^(l),

    where  e_v^(l) = ( o_v^(l) − out_v^(l) )².
General Neural Networks: Training

Definition of learning tasks for a neural network:

A free learning task L_free for a neural network with
• n input neurons, i.e. U_in = {u_1, ..., u_n},
is a set of training patterns l = (ı⃗^(l)), each consisting of
• an input vector ı⃗^(l) = (ext_{u_1}^(l), ..., ext_{u_n}^(l)).

Properties:
• There is no desired output for the training patterns.
• Outputs can be chosen freely by the training method.
• Solution idea: Similar inputs should lead to similar outputs. (clustering of input vectors)
General Neural Networks: Preprocessing

Normalization of the input vectors:

• Compute expected value and standard deviation for each input:

    μ_k = (1/|L|) Σ_{l ∈ L} ext_{u_k}^(l)    and
    σ_k = √( (1/|L|) Σ_{l ∈ L} ( ext_{u_k}^(l) − μ_k )² ),

• Normalize the input vectors to expected value 0 and standard deviation 1:

    ext_{u_k}^(l)(new) = ( ext_{u_k}^(l)(old) − μ_k ) / σ_k

• Avoids unit and scaling problems.
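This z-score normalization is a one-liner with NumPy. A minimal sketch (function name and test data are assumptions):

    import numpy as np

    # Normalize each input column to mean 0 and standard deviation 1.
    def normalize(X):
        return (X - X.mean(axis=0)) / X.std(axis=0)

    X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
    print(normalize(X))    # each column now has mean 0 and std 1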
Multilayer Perceptrons (MLPs)
Multilayer Perceptrons

An r-layer perceptron is a neural network with a graph G = (U, C) that satisfies the following conditions:

(i)  U_in ∩ U_out = ∅,

(ii) U_hidden = U_hidden^(1) ∪ ... ∪ U_hidden^(r−2),
     ∀ 1 ≤ i < j ≤ r−2:  U_hidden^(i) ∩ U_hidden^(j) = ∅,

(iii) C ⊆ ( U_in × U_hidden^(1) ) ∪ ( ∪_{i=1}^{r−3} U_hidden^(i) × U_hidden^(i+1) ) ∪ ( U_hidden^(r−2) × U_out )

or, if there are no hidden neurons (r = 2, U_hidden = ∅),

    C ⊆ U_in × U_out.

• Feed-forward network with strictly layered structure.
Multilayer Perceptrons

General structure of a multilayer perceptron:

(Figure: inputs x_1, x_2, ..., x_n feed the input layer U_in; connections lead through the hidden layers U_hidden^(1), U_hidden^(2), ..., U_hidden^(r−2) to the output layer U_out, which emits y_1, y_2, ..., y_m.)
Multilayer Perceptrons

• The network input function of each hidden neuron and of each output neuron is the weighted sum of its inputs, i.e.

    ∀u ∈ U_hidden ∪ U_out:  f_net^(u)(w⃗_u, in⃗_u) = w⃗_u in⃗_u = Σ_{v ∈ pred(u)} w_uv out_v.

• The activation function of each hidden neuron is a so-called sigmoid function, i.e. a monotonously increasing function

    f: IR → [0, 1]  with  lim_{x→−∞} f(x) = 0  and  lim_{x→∞} f(x) = 1.

• The activation function of each output neuron is either also a sigmoid function or a linear function, i.e.

    f_act(net, θ) = α net − θ.
Sigmoid Activation Functions

step function:

    f_act(net, θ) = 1, if net ≥ θ;  0, otherwise.

semi-linear function:

    f_act(net, θ) = 1, if net > θ + 1/2;  0, if net < θ − 1/2;
                    (net − θ) + 1/2, otherwise.

sine until saturation:

    f_act(net, θ) = 1, if net > θ + π/2;  0, if net < θ − π/2;
                    (sin(net − θ) + 1)/2, otherwise.

logistic function:

    f_act(net, θ) = 1 / (1 + e^{−(net−θ)})

(Figures: graphs of the four functions, each rising from 0 to 1 around net = θ.)

• All sigmoid functions above are unipolar, i.e., they range from 0 to 1.
• Sometimes bipolar sigmoid functions are used, like the tangens hyperbolicus:

    f_act(net, θ) = tanh(net − θ) = 2 / (1 + e^{−2(net−θ)}) − 1

(Figure: graph of the tangens hyperbolicus, rising from −1 to 1 around net = θ.)
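For experimentation, the five activation functions translate directly into vectorized NumPy code (a sketch; the function names are assumptions):

    import numpy as np

    def step(net, theta):
        return np.where(net >= theta, 1.0, 0.0)

    def semi_linear(net, theta):
        return np.clip((net - theta) + 0.5, 0.0, 1.0)

    def sine_until_saturation(net, theta):
        z = net - theta
        return np.where(z > np.pi/2, 1.0,
               np.where(z < -np.pi/2, 0.0, (np.sin(z) + 1.0) / 2.0))

    def logistic(net, theta):
        return 1.0 / (1.0 + np.exp(-(net - theta)))

    def tanh_bipolar(net, theta):          # bipolar: range (-1, 1)
        return np.tanh(net - theta)

    print(logistic(np.linspace(-4.0, 4.0, 9), 0.0))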
Multilayer Perceptrons: Weight Matrices

Let U_1 = {v_1, ..., v_m} and U_2 = {u_1, ..., u_n} be the neurons of two consecutive layers of a multilayer perceptron.

Their connection weights are represented by an n × m matrix

    W = ( w_{u_1v_1}  w_{u_1v_2}  ...  w_{u_1v_m} )
        ( w_{u_2v_1}  w_{u_2v_2}  ...  w_{u_2v_m} )
        (     ⋮           ⋮                ⋮      )
        ( w_{u_nv_1}  w_{u_nv_2}  ...  w_{u_nv_m} ),

where w_{u_iv_j} = 0 if there is no connection from neuron v_j to neuron u_i.

Advantage: The computation of the network input can be written as

    net⃗_{U_2} = W · in⃗_{U_2} = W · out⃗_{U_1}

where net⃗_{U_2} = (net_{u_1}, ..., net_{u_n})ᵀ and in⃗_{U_2} = out⃗_{U_1} = (out_{v_1}, ..., out_{v_m})ᵀ.
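The matrix form maps one-to-one onto matrix-vector products in code. A small sketch using the biimplication weights from the next slide (the `layer` helper and TLU-style thresholding are assumptions for illustration):

    import numpy as np

    W1 = np.array([[-2.0, 2.0], [2.0, -2.0]])   # input -> hidden
    W2 = np.array([[2.0, 2.0]])                  # hidden -> output
    theta1 = np.array([-1.0, -1.0])
    theta2 = np.array([3.0])

    def layer(W, theta, out_prev):
        return (W @ out_prev >= theta).astype(float)   # net = W @ out, then threshold

    for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        h = layer(W1, theta1, np.array(x, float))
        print(x, layer(W2, theta2, h))           # 1 exactly for (0,0) and (1,1)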
Multilayer Perceptrons: Biimplication

Solving the biimplication problem with a multilayer perceptron.

(Network: inputs x_1, x_2 in U_in; two hidden neurons with thresholds −1; one output neuron with threshold 3; weights as in the matrices below.)

Note the additional input neurons compared to the TLU solution.

    W_1 = ( −2   2 )      and      W_2 = ( 2  2 )
          (  2  −2 )
Multilayer Perceptrons: Fredkin Gate

The Fredkin gate maps (s, x_1, x_2) to (s, y_1, y_2): for s = 0 the inputs pass through unchanged (y_1 = x_1, y_2 = x_2), for s = 1 they are swapped (y_1 = x_2, y_2 = x_1).

    s   | 0 0 0 0 1 1 1 1
    x_1 | 0 0 1 1 0 0 1 1
    x_2 | 0 1 0 1 0 1 0 1
    y_1 | 0 0 1 1 0 1 0 1
    y_2 | 0 1 0 1 0 0 1 1

(Network: inputs x_1, s, x_2 in U_in; four hidden neurons with thresholds 1, 3, 3, 1 computing x_1 ∧ ¬s, x_1 ∧ s, x_2 ∧ s and x_2 ∧ ¬s; two output neurons with thresholds 1 computing y_1 = (x_1 ∧ ¬s) ∨ (x_2 ∧ s) and y_2 = (x_1 ∧ s) ∨ (x_2 ∧ ¬s).)

    W_1 = ( 2  −2   0 )        W_2 = ( 2  0  2  0 )
          ( 2   2   0 )              ( 0  2  0  2 )
          ( 0   2   2 )
          ( 0  −2   2 )
Why Non-linear Activation Functions?

With weight matrices we have for two consecutive layers U_1 and U_2:

    net⃗_{U_2} = W · in⃗_{U_2} = W · out⃗_{U_1}.

If the activation functions are linear, i.e.,

    f_act(net, θ) = α net − θ,

the activations of the neurons in the layer U_2 can be computed as

    act⃗_{U_2} = D_act · net⃗_{U_2} − θ⃗,

where
• act⃗_{U_2} = (act_{u_1}, ..., act_{u_n})ᵀ is the activation vector,
• D_act is an n × n diagonal matrix of the factors α_{u_i}, i = 1, ..., n, and
• θ⃗ = (θ_{u_1}, ..., θ_{u_n})ᵀ is a bias vector.

If the output function is also linear, it is analogously

    out⃗_{U_2} = D_out · act⃗_{U_2} − ξ⃗,

where
• out⃗_{U_2} = (out_{u_1}, ..., out_{u_n})ᵀ is the output vector,
• D_out is again an n × n diagonal matrix of factors, and
• ξ⃗ = (ξ_{u_1}, ..., ξ_{u_n})ᵀ a bias vector.

Combining these computations we get

    out⃗_{U_2} = D_out · ( D_act · ( W · out⃗_{U_1} ) − θ⃗ ) − ξ⃗

and thus

    out⃗_{U_2} = A_{12} · out⃗_{U_1} + b⃗_{12}

with an n × m matrix A_{12} and an n-dimensional vector b⃗_{12}.

Therefore we have

    out⃗_{U_2} = A_{12} · out⃗_{U_1} + b⃗_{12}    and    out⃗_{U_3} = A_{23} · out⃗_{U_2} + b⃗_{23}

for the computations of two consecutive layers U_2 and U_3.

These two computations can be combined into

    out⃗_{U_3} = A_{13} · out⃗_{U_1} + b⃗_{13},

where A_{13} = A_{23} · A_{12} and b⃗_{13} = A_{23} · b⃗_{12} + b⃗_{23}.

Result: With linear activation and output functions any multilayer perceptron can be reduced to a two-layer perceptron.
Multilayer Perceptrons: Function Approximation

General idea of function approximation:
• Approximate a given function by a step function.
• Construct a neural network that computes the step function.

(Figure: a function y = f(x), cut at the points x_1, x_2, x_3, x_4 into steps of heights y_0, y_1, y_2, y_3, y_4.)

(Figure: the corresponding network; a first hidden layer of neurons with thresholds x_1, ..., x_4 detects the step borders, a second hidden layer with weights ±2 and thresholds 1 singles out the individual steps, and an output neuron with identity activation sums the step heights y_1, ..., y_3 as weights.)

Theorem: Any Riemann-integrable function can be approximated with arbitrary accuracy by a four-layer perceptron.

• But: Error is measured as the area between the functions.
• More sophisticated mathematical examination allows a stronger assertion:
  With a three-layer perceptron any continuous function can be approximated with arbitrary accuracy (error: maximum function value difference).
Multilayer Perceptrons: Function Approximation

(Figure: each step of the approximation corresponds to a 0/1 pulse between two consecutive step borders x_i and x_{i+1}; weighting the pulses with the step heights y_1, ..., y_4 and summing them reproduces the step function.)

(Figure: a simplified three-layer network; the hidden neurons have thresholds x_1, ..., x_4, and the output neuron with identity activation uses the relative step heights Δy_i = y_i − y_{i−1} as weights, so that the contributions of the overlapping unit steps add up correctly.)

(Figure: with semi-linear activation functions the same construction yields a piecewise linear approximation; the hidden neurons have input weights 1/Δx and thresholds θ_i = x_i/Δx, where Δx = x_{i+1} − x_i, and the output neuron again uses the relative step heights Δy_i as weights.)
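To make the idea concrete, the following is a minimal functional sketch of the step-function approximation (not the slides' neuron construction; the helper name and the choice of sampling heights at interval midpoints are assumptions):

    import numpy as np

    # Approximate f on [a, b] by a step function with n equal-width steps.
    def step_approx(f, a, b, n):
        xs = np.linspace(a, b, n + 1)              # step borders x_0 .. x_n
        ys = f((xs[:-1] + xs[1:]) / 2)             # one height per interval
        def g(x):
            idx = np.clip(np.searchsorted(xs, x, side="right") - 1, 0, n - 1)
            return ys[idx]
        return g

    g = step_approx(np.sin, 0.0, np.pi, 8)
    print(g(np.array([0.5, 1.5, 3.0])))            # piecewise-constant sine values

Increasing n shrinks the area between f and its approximation, which is exactly the error notion used in the theorem above.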
Mathematical Background: Regression
Mathematical Background: Linear Regression

Training neural networks is closely related to regression.

Given: A dataset ((x_1, y_1), ..., (x_n, y_n)) of n data tuples and a hypothesis about the functional relationship, e.g. y = g(x) = a + bx.

Approach: Minimize the sum of squared errors, i.e.

    F(a, b) = Σ_{i=1}^n (g(x_i) − y_i)² = Σ_{i=1}^n (a + b x_i − y_i)².

Necessary conditions for a minimum:

    ∂F/∂a = Σ_{i=1}^n 2(a + b x_i − y_i) = 0      and
    ∂F/∂b = Σ_{i=1}^n 2(a + b x_i − y_i) x_i = 0.

Result of necessary conditions: System of so-called normal equations, i.e.

    n a + ( Σ_{i=1}^n x_i ) b = Σ_{i=1}^n y_i,
    ( Σ_{i=1}^n x_i ) a + ( Σ_{i=1}^n x_i² ) b = Σ_{i=1}^n x_i y_i.

• Two linear equations for two unknowns a and b.
• System can be solved with standard methods from linear algebra.
• Solution is unique unless all x-values are identical.
• The resulting line is called a regression line.
Linear Regression: Example

    x | 1 2 3 4 5 6 7 8
    y | 1 3 2 3 4 3 5 6

The resulting regression line is

    y = 3/4 + (7/12) x.

(Plot: the eight data points and the regression line.)
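The example can be reproduced by setting up and solving the normal equations with NumPy (a sketch; variable names are assumptions):

    import numpy as np

    x = np.array([1, 2, 3, 4, 5, 6, 7, 8], float)
    y = np.array([1, 3, 2, 3, 4, 3, 5, 6], float)
    n = len(x)
    # normal equations:  n*a + (sum x)*b = sum y ;  (sum x)*a + (sum x^2)*b = sum x*y
    M = np.array([[n, x.sum()], [x.sum(), (x**2).sum()]])
    r = np.array([y.sum(), (x * y).sum()])
    a, b = np.linalg.solve(M, r)
    print(a, b)    # 0.75 and 0.5833... , i.e. y = 3/4 + 7/12 x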
Mathematical Background: Polynomial Regression

Generalization to polynomials:

    y = p(x) = a_0 + a_1 x + ... + a_m x^m

Approach: Minimize the sum of squared errors, i.e.

    F(a_0, a_1, ..., a_m) = Σ_{i=1}^n (p(x_i) − y_i)² = Σ_{i=1}^n (a_0 + a_1 x_i + ... + a_m x_i^m − y_i)².

Necessary conditions for a minimum: All partial derivatives vanish, i.e.

    ∂F/∂a_0 = 0,  ∂F/∂a_1 = 0,  ...,  ∂F/∂a_m = 0.

System of normal equations for polynomials:

    n a_0 + ( Σ x_i ) a_1 + ... + ( Σ x_i^m ) a_m = Σ y_i
    ( Σ x_i ) a_0 + ( Σ x_i² ) a_1 + ... + ( Σ x_i^{m+1} ) a_m = Σ x_i y_i
        ⋮
    ( Σ x_i^m ) a_0 + ( Σ x_i^{m+1} ) a_1 + ... + ( Σ x_i^{2m} ) a_m = Σ x_i^m y_i,

(all sums running over i = 1, ..., n).

• m + 1 linear equations for m + 1 unknowns a_0, ..., a_m.
• System can be solved with standard methods from linear algebra.
• Solution is unique unless all x-values are identical.
Mathematical Background: Multilinear Regression

Generalization to more than one argument:

    z = f(x, y) = a + bx + cy

Approach: Minimize the sum of squared errors, i.e.

    F(a, b, c) = Σ_{i=1}^n (f(x_i, y_i) − z_i)² = Σ_{i=1}^n (a + b x_i + c y_i − z_i)².

Necessary conditions for a minimum: All partial derivatives vanish, i.e.

    ∂F/∂a = Σ_{i=1}^n 2(a + b x_i + c y_i − z_i)     = 0,
    ∂F/∂b = Σ_{i=1}^n 2(a + b x_i + c y_i − z_i) x_i = 0,
    ∂F/∂c = Σ_{i=1}^n 2(a + b x_i + c y_i − z_i) y_i = 0.

System of normal equations for several arguments:

    n a + ( Σ x_i ) b + ( Σ y_i ) c = Σ z_i
    ( Σ x_i ) a + ( Σ x_i² ) b + ( Σ x_i y_i ) c = Σ z_i x_i
    ( Σ y_i ) a + ( Σ x_i y_i ) b + ( Σ y_i² ) c = Σ z_i y_i

• 3 linear equations for 3 unknowns a, b, and c.
• System can be solved with standard methods from linear algebra.
• Solution is unique unless all x- or all y-values are identical.
Multilinear Regression

General multilinear case:

    y = f(x_1, ..., x_m) = a_0 + Σ_{k=1}^m a_k x_k

Approach: Minimize the sum of squared errors, i.e.

    F(a⃗) = (Xa⃗ − y⃗)ᵀ (Xa⃗ − y⃗),

where

    X = ( 1  x_11 ... x_m1 )      y⃗ = ( y_1 )      a⃗ = ( a_0 )
        ( ⋮   ⋮         ⋮  ),         (  ⋮  ),          ( a_1 )
        ( 1  x_1n ... x_mn )          ( y_n )           (  ⋮  )
                                                        ( a_m )

Necessary conditions for a minimum:

    ∇_a⃗ F(a⃗) = ∇_a⃗ (Xa⃗ − y⃗)ᵀ (Xa⃗ − y⃗) = 0⃗

∇_a⃗ F(a⃗) may easily be computed by remembering that the differential operator

    ∇_a⃗ = ( ∂/∂a_0, ..., ∂/∂a_m )

behaves formally like a vector that is multiplied to the sum of squared errors. Alternatively, one may write out the differentiation componentwise. With the former method we obtain for the derivative:

    ∇_a⃗ (Xa⃗ − y⃗)ᵀ (Xa⃗ − y⃗)
      = ( ∇_a⃗ (Xa⃗ − y⃗) )ᵀ (Xa⃗ − y⃗) + ( (Xa⃗ − y⃗)ᵀ ( ∇_a⃗ (Xa⃗ − y⃗) ) )ᵀ
      = 2 ( ∇_a⃗ (Xa⃗ − y⃗) )ᵀ (Xa⃗ − y⃗)
      = 2Xᵀ (Xa⃗ − y⃗)
      = 2Xᵀ X a⃗ − 2Xᵀ y⃗ = 0⃗

As a consequence we get the system of normal equations:

    Xᵀ X a⃗ = Xᵀ y⃗

This system has a solution if Xᵀ X is not singular. Then we have

    a⃗ = (Xᵀ X)⁻¹ Xᵀ y⃗.

(Xᵀ X)⁻¹ Xᵀ is called the (Moore–Penrose) Pseudoinverse of the matrix X.

With the matrix-vector representation of the regression problem an extension to multipolynomial regression is straightforward: simply add the desired products of powers to the matrix X.
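A short NumPy sketch of the matrix form (random data and names are assumptions); both the normal equations and the pseudoinverse yield the same coefficients:

    import numpy as np

    rng = np.random.default_rng(0)
    X1 = rng.normal(size=(20, 2))                  # two input variables
    X = np.column_stack([np.ones(20), X1])         # design matrix with leading 1s
    y = 3.0 + 2.0 * X1[:, 0] - 1.0 * X1[:, 1]      # exact multilinear target
    a = np.linalg.solve(X.T @ X, X.T @ y)          # normal equations
    print(a)                                       # approx [3, 2, -1]
    print(np.linalg.pinv(X) @ y)                   # same via the pseudoinverse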
Mathematical Background: Logistic Regression

Generalization to non-polynomial functions:

Simple example:  y = a x^b

Idea: Find transformation to the linear/polynomial case.

Transformation for the example:  ln y = ln a + b ln x.

Special case: logistic function

    y = Y / (1 + e^{a+bx})   ⇔   1/y = (1 + e^{a+bx}) / Y   ⇔   (Y − y)/y = e^{a+bx}.

Result: Apply the so-called Logit-Transformation:

    ln( (Y − y)/y ) = a + bx.

Logistic Regression: Example

    x | 1   2   3   4   5
    y | 0.4 1.0 3.0 5.0 5.6

Transform the data with

    z = ln( (Y − y)/y ),   Y = 6.

The transformed data points are

    x | 1    2    3    4     5
    z | 2.64 1.61 0.00 −1.61 −2.64

The resulting regression line is

    z ≈ −1.3775 x + 4.133.
Logistic Regression: Example

(Plots: the transformed data points with the regression line z ≈ −1.3775x + 4.133, and the original data points with the fitted logistic curve for Y = 6.)

The logistic regression function can be computed by a single neuron with
• network input function  f_net(x) = wx  with  w ≈ 1.3775,
• activation function     f_act(net, θ) = (1 + e^{−(net−θ)})⁻¹  with  θ ≈ 4.133, and
• output function         f_out(act) = 6 act.
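The full logit-transform fit from this example in NumPy (a sketch; variable names are assumptions):

    import numpy as np

    x = np.array([1, 2, 3, 4, 5], float)
    y = np.array([0.4, 1.0, 3.0, 5.0, 5.6])
    Y = 6.0
    z = np.log((Y - y) / y)                        # logit transformation
    A = np.column_stack([np.ones_like(x), x])
    a, b = np.linalg.lstsq(A, z, rcond=None)[0]    # fit the line z = a + b x
    print(a, b)                                    # approx 4.133 and -1.3775
    print(Y / (1.0 + np.exp(a + b * x)))           # back-transformed logistic fit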
Training Multilayer Perceptrons
Training Multilayer Perceptrons: Gradient Descent

• Problem of logistic regression: Works only for two-layer perceptrons.
• More general approach: gradient descent.
• Necessary condition: differentiable activation and output functions.

(Figure: illustration of the gradient of a real-valued function z = f(x, y) at a point (x_0, y_0); the partial derivatives ∂z/∂x|_{x_0} and ∂z/∂y|_{y_0} span the gradient vector.)

It is

    ∇z|_{(x_0, y_0)} = ( ∂z/∂x|_{x_0}, ∂z/∂y|_{y_0} ).
Gradient Descent: Formal Approach

General Idea: Approach the minimum of the error function in small steps.

Error function:

    e = Σ_{l ∈ L_fixed} e^(l) = Σ_{v ∈ U_out} e_v = Σ_{l ∈ L_fixed} Σ_{v ∈ U_out} e_v^(l),

Form the gradient to determine the direction of the step:

    ∇_{w⃗_u} e = ∂e/∂w⃗_u = ( −∂e/∂θ_u, ∂e/∂w_{up_1}, ..., ∂e/∂w_{up_n} ).

Exploit the sum over the training patterns:

    ∇_{w⃗_u} e = ∂e/∂w⃗_u = ∂/∂w⃗_u Σ_{l ∈ L_fixed} e^(l) = Σ_{l ∈ L_fixed} ∂e^(l)/∂w⃗_u.

The single pattern error depends on the weights only through the network input:

    ∇_{w⃗_u} e^(l) = ∂e^(l)/∂w⃗_u = ( ∂e^(l)/∂net_u^(l) ) ( ∂net_u^(l)/∂w⃗_u ).

Since net_u^(l) = w⃗_u in⃗_u^(l) we have for the second factor

    ∂net_u^(l)/∂w⃗_u = in⃗_u^(l).

For the first factor we consider the error e^(l) for the training pattern l = (ı⃗^(l), o⃗^(l)):

    e^(l) = Σ_{v ∈ U_out} e_v^(l) = Σ_{v ∈ U_out} ( o_v^(l) − out_v^(l) )²,

i.e. the sum of the errors over all output neurons.
Gradient Descent: Formal Approach

Therefore we have

    ∂e^(l)/∂net_u^(l) = Σ_{v ∈ U_out} ∂( o_v^(l) − out_v^(l) )² / ∂net_u^(l).

Since only the actual output out_v^(l) of an output neuron v depends on the network input net_u^(l) of the neuron u we are considering, it is

    ∂e^(l)/∂net_u^(l) = −2 Σ_{v ∈ U_out} ( o_v^(l) − out_v^(l) ) ∂out_v^(l)/∂net_u^(l)
                      = −2 δ_u^(l),

which also introduces the abbreviation δ_u^(l) for the important sum appearing here.

Distinguish two cases:
• The neuron u is an output neuron.
• The neuron u is a hidden neuron.

In the first case we have

    ∀u ∈ U_out:  δ_u^(l) = ( o_u^(l) − out_u^(l) ) ∂out_u^(l)/∂net_u^(l)

Therefore we have for the gradient

    ∀u ∈ U_out:  ∇_{w⃗_u} e_u^(l) = ∂e_u^(l)/∂w⃗_u = −2 ( o_u^(l) − out_u^(l) ) ( ∂out_u^(l)/∂net_u^(l) ) in⃗_u^(l)

and thus for the weight change

    ∀u ∈ U_out:  Δw⃗_u^(l) = −(η/2) ∇_{w⃗_u} e_u^(l) = η ( o_u^(l) − out_u^(l) ) ( ∂out_u^(l)/∂net_u^(l) ) in⃗_u^(l).
Gradient Descent: Formal Approach

Exact formulae depend on the choice of activation and output function, since it is

    out_u^(l) = f_out( act_u^(l) ) = f_out( f_act( net_u^(l) ) ).

Consider the special case where
• the output function is the identity,
• the activation function is logistic, i.e. f_act(x) = 1/(1 + e^{−x}).

The first assumption yields

    ∂out_u^(l)/∂net_u^(l) = ∂act_u^(l)/∂net_u^(l) = f′_act( net_u^(l) ).

For a logistic activation function we have

    f′_act(x) = d/dx (1 + e^{−x})⁻¹ = −(1 + e^{−x})⁻² (−e^{−x})
              = (1 + e^{−x} − 1) / (1 + e^{−x})²
              = (1/(1 + e^{−x})) (1 − 1/(1 + e^{−x}))
              = f_act(x) (1 − f_act(x)),

and therefore

    f′_act( net_u^(l) ) = f_act( net_u^(l) ) ( 1 − f_act( net_u^(l) ) ) = out_u^(l) ( 1 − out_u^(l) ).

The resulting weight change is therefore

    Δw⃗_u^(l) = η ( o_u^(l) − out_u^(l) ) out_u^(l) ( 1 − out_u^(l) ) in⃗_u^(l),

which makes the computations very simple.
Error Backpropagation

Consider now: The neuron u is a hidden neuron, i.e. u ∈ U_k, 0 < k < r − 1.

The output out_v^(l) of an output neuron v depends on the network input net_u^(l) only indirectly through its successor neurons succ(u) = {s ∈ U | (u, s) ∈ C} = {s_1, ..., s_m} ⊆ U_{k+1}, namely through their network inputs net_s^(l).

We apply the chain rule to obtain

    δ_u^(l) = Σ_{v ∈ U_out} Σ_{s ∈ succ(u)} ( o_v^(l) − out_v^(l) ) ( ∂out_v^(l)/∂net_s^(l) ) ( ∂net_s^(l)/∂net_u^(l) ).

Exchanging the sums yields

    δ_u^(l) = Σ_{s ∈ succ(u)} ( Σ_{v ∈ U_out} ( o_v^(l) − out_v^(l) ) ∂out_v^(l)/∂net_s^(l) ) ∂net_s^(l)/∂net_u^(l)
            = Σ_{s ∈ succ(u)} δ_s^(l) ( ∂net_s^(l)/∂net_u^(l) ).

Consider the network input

    net_s^(l) = w⃗_s in⃗_s^(l) = ( Σ_{p ∈ pred(s)} w_sp out_p^(l) ) − θ_s,

where one element of in⃗_s^(l) is the output out_u^(l) of the neuron u. Therefore it is

    ∂net_s^(l)/∂net_u^(l) = ( Σ_{p ∈ pred(s)} w_sp ∂out_p^(l)/∂net_u^(l) ) − ∂θ_s/∂net_u^(l)
                          = w_su ( ∂out_u^(l)/∂net_u^(l) ).

The result is the recursive equation (error backpropagation):

    δ_u^(l) = ( Σ_{s ∈ succ(u)} δ_s^(l) w_su ) ( ∂out_u^(l)/∂net_u^(l) ).
Error Backpropagation

The resulting formula for the weight change is

    Δw⃗_u^(l) = −(η/2) ∇_{w⃗_u} e^(l) = η δ_u^(l) in⃗_u^(l)
             = η ( Σ_{s ∈ succ(u)} δ_s^(l) w_su ) ( ∂out_u^(l)/∂net_u^(l) ) in⃗_u^(l).

Consider again the special case where
• the output function is the identity,
• the activation function is logistic.

The resulting formula for the weight change is then

    Δw⃗_u^(l) = η ( Σ_{s ∈ succ(u)} δ_s^(l) w_su ) out_u^(l) ( 1 − out_u^(l) ) in⃗_u^(l).
Error Backpropagation: Cookbook Recipe

Forward propagation:

    ∀u ∈ U_in:             out_u^(l) = ext_u^(l)
    ∀u ∈ U_hidden ∪ U_out: out_u^(l) = ( 1 + exp( − Σ_{p ∈ pred(u)} w_up out_p^(l) ) )⁻¹

(logistic activation function, implicit bias value via a constant input)

Backward propagation:

    ∀u ∈ U_hidden:  δ_u^(l) = ( Σ_{s ∈ succ(u)} δ_s^(l) w_su ) λ_u^(l)
    ∀u ∈ U_out:     δ_u^(l) = ( o_u^(l) − out_u^(l) ) λ_u^(l)          (error factor)

Activation derivative:

    λ_u^(l) = out_u^(l) ( 1 − out_u^(l) )

Weight change:

    Δw_up^(l) = η δ_u^(l) out_p^(l)
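The recipe fits in a few lines of NumPy. The following sketch trains the biimplication with one hidden layer; the layer sizes, seed, learning rate and epoch count are assumptions, and a different seed may occasionally be needed for convergence:

    import numpy as np

    rng = np.random.default_rng(1)
    W1 = rng.normal(size=(4, 3))                 # hidden layer: 2 inputs + bias input
    W2 = rng.normal(size=(1, 5))                 # output layer: 4 hidden + bias input
    X = np.array([[0,0],[1,0],[0,1],[1,1]], float)
    O = np.array([[1],[0],[0],[1]], float)       # desired outputs (biimplication)
    eta = 1.0

    def sig(z): return 1.0 / (1.0 + np.exp(-z))

    for epoch in range(10000):                   # online training
        for x, o in zip(X, O):
            a0 = np.append(x, 1.0)               # forward propagation (bias = const 1)
            h = sig(W1 @ a0)
            a1 = np.append(h, 1.0)
            out = sig(W2 @ a1)
            d2 = (o - out) * out * (1 - out)     # delta of the output neuron
            d1 = (W2[:, :4].T @ d2) * h * (1 - h)    # backward propagation
            W2 += eta * np.outer(d2, a1)         # weight change eta * delta * out_pred
            W1 += eta * np.outer(d1, a0)

    for x in X:                                  # outputs approach 1, 0, 0, 1
        h = sig(W1 @ np.append(x, 1.0))
        print(x, sig(W2 @ np.append(h, 1.0)).round(2))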
Gradient Descent: Examples

Gradient descent training for the negation ¬x
(single-input unit with weight w and threshold θ; training patterns x = 0 ↦ y = 1 and x = 1 ↦ y = 0):

(Figure: error e for x = 0, error for x = 1, and the sum of errors as functions of w and θ; with logistic activation the error surface is smooth.)

Online training:                     Batch training:
epoch    θ      w    error           epoch    θ      w    error
   0   3.00   3.50   1.307              0   3.00   3.50   1.295
  20   3.77   2.19   0.986             20   3.76   2.20   0.985
  40   3.71   1.81   0.970             40   3.70   1.82   0.970
  60   3.50   1.53   0.958             60   3.48   1.53   0.957
  80   3.15   1.24   0.937             80   3.11   1.25   0.934
 100   2.57   0.88   0.890            100   2.49   0.88   0.880
 120   1.48   0.25   0.725            120   1.27   0.22   0.676
 140  −0.06  −0.98   0.331            140  −0.21  −1.04   0.292
 160  −0.80  −2.07   0.149            160  −0.86  −2.08   0.140
 180  −1.19  −2.74   0.087            180  −1.21  −2.74   0.084
 200  −1.44  −3.20   0.059            200  −1.45  −3.19   0.058
 220  −1.62  −3.54   0.044            220  −1.63  −3.53   0.044

Visualization of gradient descent for the negation ¬x:
(Figure: trajectories of (θ, w) for online and batch training, and the batch-training error surface.)

• Training is obviously successful.
• Error cannot vanish completely due to the properties of the logistic function.
Gradient Descent: Examples

Example function:  f(x) = (5/6)x⁴ − 7x³ + (115/6)x² − 18x + 6

Gradient descent with initial value 0.2 and learning rate 0.001:

  i    x_i     f(x_i)   f′(x_i)    Δx_i
  0   0.200    3.112   −11.147    0.011
  1   0.211    2.990   −10.811    0.011
  2   0.222    2.874   −10.490    0.010
  3   0.232    2.766   −10.182    0.010
  4   0.243    2.664    −9.888    0.010
  5   0.253    2.568    −9.606    0.010
  6   0.262    2.477    −9.335    0.009
  7   0.271    2.391    −9.075    0.009
  8   0.281    2.309    −8.825    0.009
  9   0.289    2.233    −8.585    0.009
 10   0.298    2.160

Gradient descent with initial value 1.5 and learning rate 0.25:

  i    x_i     f(x_i)   f′(x_i)    Δx_i
  0   1.500    2.719     3.500   −0.875
  1   0.625    0.655    −1.431    0.358
  2   0.983    0.955     2.554   −0.639
  3   0.344    1.801    −7.157    1.789
  4   2.134    4.127     0.567   −0.142
  5   1.992    3.989     1.380   −0.345
  6   1.647    3.203     3.063   −0.766
  7   0.881    0.734     1.753   −0.438
  8   0.443    1.211    −4.851    1.213
  9   1.656    3.231     3.029   −0.757
 10   0.898    0.766

Gradient descent with initial value 2.6 and learning rate 0.05:

  i    x_i     f(x_i)   f′(x_i)    Δx_i
  0   2.600    3.816    −1.707    0.085
  1   2.685    3.660    −1.947    0.097
  2   2.783    3.461    −2.116    0.106
  3   2.888    3.233    −2.153    0.108
  4   2.996    3.008    −2.009    0.100
  5   3.097    2.820    −1.688    0.084
  6   3.181    2.695    −1.263    0.063
  7   3.244    2.628    −0.845    0.042
  8   3.286    2.599    −0.515    0.026
  9   3.312    2.589    −0.293    0.015
 10   3.327    2.585
Gradient Descent: Variants

Weight update rule:

    w(t+1) = w(t) + Δw(t)

Standard backpropagation:

    Δw(t) = −(η/2) ∇_w e(t)

Manhattan training:

    Δw(t) = −η sgn( ∇_w e(t) ).

Momentum term:

    Δw(t) = −(η/2) ∇_w e(t) + β Δw(t−1),
Gradient Descent: Variants

Self-adaptive error backpropagation:

    η_w(t) = c⁻ · η_w(t−1),  if ∇_w e(t) · ∇_w e(t−1) < 0,
             c⁺ · η_w(t−1),  if ∇_w e(t) · ∇_w e(t−1) > 0
                             and ∇_w e(t−1) · ∇_w e(t−2) ≥ 0,
             η_w(t−1),       otherwise.

Resilient error backpropagation:

    Δw(t) = c⁻ · Δw(t−1),  if ∇_w e(t) · ∇_w e(t−1) < 0,
            c⁺ · Δw(t−1),  if ∇_w e(t) · ∇_w e(t−1) > 0
                           and ∇_w e(t−1) · ∇_w e(t−2) ≥ 0,
            Δw(t−1),       otherwise.

Typical values:  c⁻ ∈ [0.5, 0.7]  and  c⁺ ∈ [1.05, 1.2].
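A minimal sketch of the resilient scheme, stated here in the common per-weight step-size form rather than in terms of the previous weight change (the function name, defaults and clamping bounds are assumptions):

    import numpy as np

    # One Rprop-style update: adapt per-weight step sizes from gradient sign changes.
    def rprop_step(grad, prev_grad, delta, c_minus=0.5, c_plus=1.2,
                   d_min=1e-6, d_max=50.0):
        sign_change = grad * prev_grad
        delta = np.where(sign_change > 0, np.minimum(delta * c_plus, d_max), delta)
        delta = np.where(sign_change < 0, np.maximum(delta * c_minus, d_min), delta)
        return -np.sign(grad) * delta, delta      # weight change, new step sizes

    w = np.array([3.0, 3.5]); delta = np.full(2, 0.1)
    grad = np.array([0.4, -1.0])                  # hypothetical error gradient
    dw, delta = rprop_step(grad, np.zeros(2), delta)
    w += dw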
Gradient Descent: Variants

Quickpropagation: Locally approximate the error function by a parabola through the last two points and jump to its apex.

(Figure: parabola through e(t) and e(t−1) over w(t) and w(t−1), with apex at w(t+1); equivalently, the derivative ∇_w e is approximated by the secant through ∇_w e(t) and ∇_w e(t−1), which crosses 0 at w(t+1).)

The weight update rule can be derived from the triangles:

    Δw(t) = ( ∇_w e(t) / ( ∇_w e(t−1) − ∇_w e(t) ) ) · Δw(t−1).
Gradient Descent: Examples

Batch training for the negation, without and with momentum term:

without momentum term:               with momentum term:
epoch    θ      w    error           epoch    θ      w    error
   0   3.00   3.50   1.295              0   3.00   3.50   1.295
  20   3.76   2.20   0.985             10   3.80   2.19   0.984
  40   3.70   1.82   0.970             20   3.75   1.84   0.971
  60   3.48   1.53   0.957             30   3.56   1.58   0.960
  80   3.11   1.25   0.934             40   3.26   1.33   0.943
 100   2.49   0.88   0.880             50   2.79   1.04   0.910
 120   1.27   0.22   0.676             60   1.99   0.60   0.814
 140  −0.21  −1.04   0.292             70   0.54  −0.25   0.497
 160  −0.86  −2.08   0.140             80  −0.53  −1.51   0.211
 180  −1.21  −2.74   0.084             90  −1.02  −2.36   0.113
 200  −1.45  −3.19   0.058            100  −1.31  −2.92   0.073
 220  −1.63  −3.53   0.044            110  −1.52  −3.31   0.053
                                      120  −1.67  −3.61   0.041

(Figure: trajectories of (θ, w) without and with momentum term, and the error surface with momentum term.)

• Dots show the position every 20 epochs (without momentum term) or every 10 epochs (with momentum term).
• Learning with a momentum term is about twice as fast.
Gradient Descent: Examples

Example function:  f(x) = (5/6)x⁴ − 7x³ + (115/6)x² − 18x + 6

Gradient descent with momentum term (β = 0.9), initial value 0.2, learning rate 0.001:

  i    x_i     f(x_i)   f′(x_i)    Δx_i
  0   0.200    3.112   −11.147    0.011
  1   0.211    2.990   −10.811    0.021
  2   0.232    2.771   −10.196    0.029
  3   0.261    2.488    −9.368    0.035
  4   0.296    2.173    −8.397    0.040
  5   0.337    1.856    −7.348    0.044
  6   0.380    1.559    −6.277    0.046
  7   0.426    1.298    −5.228    0.046
  8   0.472    1.079    −4.235    0.046
  9   0.518    0.907    −3.319    0.045
 10   0.562    0.777

Gradient descent with self-adapting learning rate (c⁺ = 1.2, c⁻ = 0.5), initial value 1.5:

  i    x_i     f(x_i)   f′(x_i)    Δx_i
  0   1.500    2.719     3.500   −1.050
  1   0.450    1.178    −4.699    0.705
  2   1.155    1.476     3.396   −0.509
  3   0.645    0.629    −1.117    0.083
  4   0.729    0.587     0.072   −0.005
  5   0.723    0.587     0.001    0.000
  6   0.723    0.587     0.000    0.000
  7   0.723    0.587     0.000    0.000
  8   0.723    0.587     0.000    0.000
  9   0.723    0.587     0.000    0.000
 10   0.723    0.587
Other Extensions of Error Backpropagation

Flat Spot Elimination:

    Δw(t) = −(η/2) ∇_w e(t) + ζ

• Eliminates slow learning in the saturation region of the logistic function.
• Counteracts the decay of the error signals over the layers.

Weight Decay:

    Δw(t) = −(η/2) ∇_w e(t) − ξ w(t),

• Helps to improve the robustness of the training results.
• Can be derived from an extended error function penalizing large weights:

    e* = e + (ξ/2) Σ_{u ∈ U_out ∪ U_hidden} ( θ_u² + Σ_{p ∈ pred(u)} w_up² ).
Sensitivity Analysis
Sensitivity Analysis

Question: How important are different inputs to the network?

Idea: Determine the change of output relative to the change of input.

    ∀u ∈ U_in:  s(u) = (1/|L_fixed|) Σ_{l ∈ L_fixed} Σ_{v ∈ U_out} ∂out_v^(l) / ∂ext_u^(l).

Formal derivation: Apply the chain rule:

    ∂out_v/∂ext_u = (∂out_v/∂out_u)(∂out_u/∂ext_u)
                  = (∂out_v/∂net_v)(∂net_v/∂out_u)(∂out_u/∂ext_u).

Simplification: Assume that the output function is the identity:

    ∂out_u/∂ext_u = 1.

For the second factor we get the general result:

    ∂net_v/∂out_u = ∂/∂out_u Σ_{p ∈ pred(v)} w_vp out_p = Σ_{p ∈ pred(v)} w_vp ∂out_p/∂out_u.

This leads to the recursion formula

    ∂out_v/∂out_u = (∂out_v/∂net_v)(∂net_v/∂out_u)
                  = (∂out_v/∂net_v) Σ_{p ∈ pred(v)} w_vp ∂out_p/∂out_u.

However, for the first hidden layer we get

    ∂net_v/∂out_u = w_vu,   therefore   ∂out_v/∂out_u = (∂out_v/∂net_v) w_vu.

This formula marks the start of the recursion.

Consider as usual the special case where
• the output function is the identity,
• the activation function is logistic.

The recursion formula is in this case

    ∂out_v/∂out_u = out_v (1 − out_v) Σ_{p ∈ pred(v)} w_vp ∂out_p/∂out_u

and the anchor of the recursion is

    ∂out_v/∂out_u = out_v (1 − out_v) w_vu.
Demonstration Software: xmlp/wmlp

• Demonstration of multilayer perceptron training
• Visualization of the training process
• Biimplication and Exclusive Or, two continuous functions
• http://www.borgelt.net/mlpd.html

Multilayer Perceptron Software: mlp/mlpgui

• Software for training general multilayer perceptrons
• Command line version written in C, fast training
• Graphical user interface in Java, easy to use
• http://www.borgelt.net/mlp.html, http://www.borgelt.net/mlpgui.html
Radial Basis Function Networks
Radial Basis Function Networks

A radial basis function network (RBFN) is a neural network with a graph G = (U, C) that satisfies the following conditions:

(i)  U_in ∩ U_out = ∅,
(ii) C = (U_in × U_hidden) ∪ C′,   C′ ⊆ (U_hidden × U_out)

The network input function of each hidden neuron is a distance function of the input vector and the weight vector, i.e.

    ∀u ∈ U_hidden:  f_net^(u)(w⃗_u, in⃗_u) = d(w⃗_u, in⃗_u),

where d: IR^n × IR^n → IR⁺₀ is a function satisfying ∀x⃗, y⃗, z⃗ ∈ IR^n:

    (i)   d(x⃗, y⃗) = 0 ⇔ x⃗ = y⃗,
    (ii)  d(x⃗, y⃗) = d(y⃗, x⃗)                 (symmetry),
    (iii) d(x⃗, z⃗) ≤ d(x⃗, y⃗) + d(y⃗, z⃗)      (triangle inequality).
Distance Functions

Illustration of distance functions (Minkowski family):

    d_k(x⃗, y⃗) = ( Σ_{i=1}^n |x_i − y_i|^k )^{1/k}

Well-known special cases from this family are:

    k = 1:   Manhattan or city block distance,
    k = 2:   Euclidean distance,
    k → ∞:   maximum distance, i.e.  d_∞(x⃗, y⃗) = max_{i=1}^n |x_i − y_i|.

(Figure: unit circles of d_k for k = 1 (diamond), k = 2 (circle) and k → ∞ (square).)
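The whole family fits in one small function (a sketch; passing k = np.inf for the maximum distance is an implementation convention, not from the slides):

    import numpy as np

    def minkowski(x, y, k):
        if np.isinf(k):
            return np.max(np.abs(x - y))          # maximum distance
        return np.sum(np.abs(x - y) ** k) ** (1.0 / k)

    x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
    print(minkowski(x, y, 1), minkowski(x, y, 2), minkowski(x, y, np.inf))  # 7, 5, 4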
Radial Basis Function Networks

The network input function of the output neurons is the weighted sum of their inputs, i.e.

    ∀u ∈ U_out:  f_net^(u)(w⃗_u, in⃗_u) = w⃗_u in⃗_u = Σ_{v ∈ pred(u)} w_uv out_v.

The activation function of each hidden neuron is a so-called radial function, i.e. a monotonously decreasing function

    f: IR⁺₀ → [0, 1]  with  f(0) = 1  and  lim_{x→∞} f(x) = 0.

The activation function of each output neuron is a linear function, namely

    f_act^(u)(net_u, θ_u) = net_u − θ_u.

(The linear activation function is important for the initialization.)

Radial Activation Functions

rectangle function:

    f_act(net, σ) = 0, if net > σ;  1, otherwise.

triangle function:

    f_act(net, σ) = 0, if net > σ;  1 − net/σ, otherwise.

cosine until zero:

    f_act(net, σ) = 0, if net > 2σ;  (cos((π/2σ) net) + 1)/2, otherwise.

Gaussian function:

    f_act(net, σ) = e^{−net²/(2σ²)}

(Figures: graphs of the four radial functions, each falling from 1 at net = 0 to 0.)
Radial Basis Function Networks: Examples

Radial basis function networks for the conjunction x_1 ∧ x_2:

(Network 1: one hidden neuron with center (1, 1) and reference radius σ = 1/2, output weight 1, output bias θ = 0; only the point (1,1) lies inside the circle around (1,1) and is mapped to 1.)

(Network 2: one hidden neuron with center (0, 0) and reference radius σ = 6/5, output weight −1, output bias θ = −1; the points (0,0), (1,0), (0,1) lie inside the circle and are mapped to 0, while (1,1) lies outside and is mapped to 1.)

Radial basis function networks for the biimplication x_1 ↔ x_2:

Idea: logical decomposition  x_1 ↔ x_2  ≡  (x_1 ∧ x_2) ∨ (¬x_1 ∧ ¬x_2)

(Network: two hidden neurons with centers (1, 1) and (0, 0), both with reference radius σ = 1/2; output weights 1 and 1, output bias θ = 0.)
Radial Basis Function Networks: Function Approximation

(Figure: a function y = f(x) approximated by a step function; each step corresponds to one hidden neuron with a rectangular activation function centered at a point x_i, weighted with the step height y_i.)

(Figure: the corresponding network; hidden neurons with centers x_1, ..., x_4 and radius

    σ = (1/2) Δx = (1/2)(x_{i+1} − x_i),

output weights y_1, ..., y_4, output bias 0.)

(Figure: with triangular activation functions the same construction yields a piecewise linear approximation of the function, the weighted triangles overlapping and adding up between the centers.)

(Figure: with Gaussian activation functions a smooth approximation is obtained; shown is an approximation with three Gaussians weighted with w_1, w_2, w_3.)

Radial basis function network for a sum of three Gaussian functions:

(Network: input x; three hidden neurons with centers 2, 5 and 6, all with radius σ = 1; output weights 1, 3 and −2; output bias θ = 0.)
Training Radial Basis Function Networks
Radial Basis Function Networks: Initialization

Let L_fixed = {l_1, ..., l_m} be a fixed learning task, consisting of m training patterns l = (ı⃗^(l), o⃗^(l)).

Simple radial basis function network: One hidden neuron v_k, k = 1, ..., m, for each training pattern:

    ∀k ∈ {1, ..., m}:  w⃗_{v_k} = ı⃗^{(l_k)}.

If the activation function is the Gaussian function, the radii σ_k are chosen heuristically:

    ∀k ∈ {1, ..., m}:  σ_k = d_max / √(2m),

where

    d_max = max_{l_j, l_k ∈ L_fixed} d( ı⃗^{(l_j)}, ı⃗^{(l_k)} ).
Radial Basis Function Networks: Initialization

Initializing the connections from the hidden to the output neurons:

    ∀u:  Σ_{k=1}^m w_{uv_k} out_{v_k}^{(l)} − θ_u = o_u^{(l)},   or abbreviated   A · w⃗_u = o⃗_u,

where o⃗_u = ( o_u^{(l_1)}, ..., o_u^{(l_m)} )ᵀ is the vector of desired outputs, θ_u = 0, and

    A = ( out_{v_1}^{(l_1)}  out_{v_2}^{(l_1)}  ...  out_{v_m}^{(l_1)} )
        ( out_{v_1}^{(l_2)}  out_{v_2}^{(l_2)}  ...  out_{v_m}^{(l_2)} )
        (        ⋮                  ⋮                       ⋮          )
        ( out_{v_1}^{(l_m)}  out_{v_2}^{(l_m)}  ...  out_{v_m}^{(l_m)} ).

This is a linear equation system that can be solved by inverting the matrix A:

    w⃗_u = A⁻¹ o⃗_u.
RBFN Initialization: Example

Simple radial basis function network for the biimplication x_1 ↔ x_2:

x_1 x_2 | y
 0   0  | 1
 1   0  | 0
 0   1  | 0
 1   1  | 1

(Network: four hidden neurons with centers (0,0), (1,0), (0,1), (1,1), all with radius σ = 1/2; output weights w_1, ..., w_4, output bias θ = 0.)

With Gaussian activation functions:

    A = (   1     e⁻²   e⁻²   e⁻⁴ )
        (  e⁻²     1    e⁻⁴   e⁻² )
        (  e⁻²    e⁻⁴    1    e⁻² )
        (  e⁻⁴    e⁻²   e⁻²    1  )

    A⁻¹ = (1/D) ( a  b  b  c )
                ( b  a  c  b )
                ( b  c  a  b )
                ( c  b  b  a )

where

    D = 1 − 4e⁻⁴ + 6e⁻⁸ − 4e⁻¹² + e⁻¹⁶ ≈ 0.9287
    a = 1 − 2e⁻⁴ + e⁻⁸                 ≈ 0.9637
    b = −e⁻² + 2e⁻⁶ − e⁻¹⁰             ≈ −0.1304
    c = e⁻⁴ − 2e⁻⁸ + e⁻¹²              ≈ 0.0177

    w⃗_u = A⁻¹ o⃗_u = (1/D) ( a + c )   ≈ (  1.0567 )
                          (  2b   )     ( −0.2809 )
                          (  2b   )     ( −0.2809 )
                          ( a + c )     (  1.0567 )

(Figures: a single basis function, all four weighted basis functions, and the resulting output surface, which equals 1 at (0,0) and (1,1) and 0 at (1,0) and (0,1).)

• Initialization leads already to a perfect solution of the learning task.
• Subsequent training is not necessary.
Radial Basis Function Networks: Initialization

Normal radial basis function networks: Select a subset of k training patterns as centers.

    A = ( 1  out_{v_1}^{(l_1)}  out_{v_2}^{(l_1)}  ...  out_{v_k}^{(l_1)} )
        ( 1  out_{v_1}^{(l_2)}  out_{v_2}^{(l_2)}  ...  out_{v_k}^{(l_2)} )
        ( ⋮         ⋮                  ⋮                       ⋮          )
        ( 1  out_{v_1}^{(l_m)}  out_{v_2}^{(l_m)}  ...  out_{v_k}^{(l_m)} )

    A · w⃗_u = o⃗_u

Compute the (Moore–Penrose) pseudo inverse:

    A⁺ = (Aᵀ A)⁻¹ Aᵀ.

The weights can then be computed by

    w⃗_u = A⁺ o⃗_u = (Aᵀ A)⁻¹ Aᵀ o⃗_u.
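The next example (centers (0,0) and (1,1), σ = 1/2, Gaussian activations) can be reproduced with a few lines of NumPy (a sketch; variable names are assumptions):

    import numpy as np

    X = np.array([[0,0],[1,0],[0,1],[1,1]], float)   # training inputs
    o = np.array([1, 0, 0, 1], float)                # desired outputs (biimplication)
    centers = np.array([[0,0],[1,1]], float)         # chosen subset of patterns
    sigma = 0.5

    # Gaussian activations of the hidden neurons for all patterns:
    act = np.exp(-((X[:, None, :] - centers[None, :, :])**2).sum(-1) / (2*sigma**2))
    A = np.column_stack([np.ones(len(X)), act])      # leading 1s column for -theta
    w = np.linalg.pinv(A) @ o                        # (-theta, w1, w2)
    print(w)                                         # approx (-0.3620, 1.3375, 1.3375)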
RBFN Initialization: Example

Normal radial basis function network for the biimplication x_1 ↔ x_2:

Select two training patterns:

    l_1 = (ı⃗^{(l_1)}, o⃗^{(l_1)}) = ((0, 0), (1))
    l_4 = (ı⃗^{(l_4)}, o⃗^{(l_4)}) = ((1, 1), (1))

(Network: two hidden neurons with centers (0,0) and (1,1), both with radius σ = 1/2; output weights w_1, w_2 and an explicit output bias θ.)

    A = ( 1    1    e⁻⁴ )
        ( 1   e⁻²   e⁻² )
        ( 1   e⁻²   e⁻² )
        ( 1   e⁻⁴    1  )

    A⁺ = (Aᵀ A)⁻¹ Aᵀ = ( a  b  b  a )
                       ( c  d  d  e )
                       ( e  d  d  c )

where

    a ≈ −0.1810,  b ≈ 0.6810,  c ≈ 1.1781,  d ≈ −0.6688,  e ≈ 0.1594.

Resulting weights:

    w⃗_u = ( −θ  ) = A⁺ o⃗_u ≈ ( −0.3620 )
          ( w_1 )            (  1.3375 )
          ( w_2 )            (  1.3375 )

(Figures: the basis function at (0,0), the basis function at (1,1), and the output surface, which has baseline −0.36 far from the centers and equals 1 at (0,0) and (1,1) and 0 at (1,0) and (0,1).)

• Initialization leads already to a perfect solution of the learning task.
• This is an accident, because the linear equation system is not over-determined, due to linearly dependent equations.
Radial Basis Function Networks: Initialization

Finding appropriate centers for the radial basis functions:

One approach: k-means clustering
• Select randomly k training patterns as centers.
• Assign to each center those training patterns that are closest to it.
• Compute new centers as the center of gravity of the assigned training patterns.
• Repeat the previous two steps until convergence, i.e., until the centers do not change anymore.
• Use the resulting centers for the weight vectors of the hidden neurons.

Alternative approach: learning vector quantization (next section).
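A compact k-means sketch (names, the iteration cap and the test data are assumptions; a production version would also guard against clusters that become empty):

    import numpy as np

    def kmeans(X, k, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), k, replace=False)]   # k patterns as centers
        for _ in range(iters):
            # assign each point to its closest center
            labels = np.argmin(((X[:, None] - centers[None])**2).sum(-1), axis=1)
            new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new, centers):
                break                          # centers stopped changing
            centers = new
        return centers

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(1.0, 0.1, (20, 2))])
    print(kmeans(X, 2))                        # approx the two cluster means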
Radial Basis Function Networks: Training

Training radial basis function networks:

• Derivation of update rules is analogous to that of multilayer perceptrons.

Weights from the hidden to the output neurons:

Gradient:

    ∇_{w⃗_u} e_u^(l) = ∂e_u^(l)/∂w⃗_u = −2 ( o_u^(l) − out_u^(l) ) in⃗_u^(l),

Weight update rule:

    Δw⃗_u^(l) = −(η₃/2) ∇_{w⃗_u} e_u^(l) = η₃ ( o_u^(l) − out_u^(l) ) in⃗_u^(l)

(Two more learning rates are needed for the center coordinates and the radii.)
Radial Basis Function Networks: Training

Center coordinates (weights from the input to the hidden neurons):

Gradient:

    ∇_{w⃗_v} e^(l) = ∂e^(l)/∂w⃗_v
                  = −2 Σ_{s ∈ succ(v)} ( o_s^(l) − out_s^(l) ) w_sv · ( ∂out_v^(l)/∂net_v^(l) ) ( ∂net_v^(l)/∂w⃗_v )

Weight update rule:

    Δw⃗_v^(l) = −(η₁/2) ∇_{w⃗_v} e^(l)
             = η₁ Σ_{s ∈ succ(v)} ( o_s^(l) − out_s^(l) ) w_sv ( ∂out_v^(l)/∂net_v^(l) ) ( ∂net_v^(l)/∂w⃗_v )
Radial Basis Function Networks: Training

Center coordinates (weights from the input to the hidden neurons):

Special case: Euclidean distance

    ∂net_v^(l)/∂w⃗_v = ( Σ_{i=1}^n ( w_{vp_i} − out_{p_i}^(l) )² )^{−1/2} ( w⃗_v − in⃗_v^(l) ).

Special case: Gaussian activation function

    ∂out_v^(l)/∂net_v^(l) = ∂f_act( net_v^(l), σ_v )/∂net_v^(l)
                          = ∂/∂net_v^(l) e^{−( net_v^(l) )²/(2σ_v²)}
                          = −( net_v^(l)/σ_v² ) e^{−( net_v^(l) )²/(2σ_v²)}.
Radial Basis Function Networks: Training

Training radial basis function networks:
Radii of the radial basis functions.

Gradient:

    ∂e^{(l)} / ∂σ_v = -2 Σ_{s∈succ(v)} (o_s^{(l)} - out_s^{(l)}) w_{sv} · (∂out_v^{(l)} / ∂σ_v).

Weight update rule:

    Δσ_v^{(l)} = -(η_2 / 2) ∂e^{(l)} / ∂σ_v = η_2 Σ_{s∈succ(v)} (o_s^{(l)} - out_s^{(l)}) w_{sv} · (∂out_v^{(l)} / ∂σ_v).

Special case: Gaussian activation function

    ∂out_v^{(l)} / ∂σ_v = (∂ / ∂σ_v) e^{-(net_v^{(l)})^2 / (2σ_v^2)} = ((net_v^{(l)})^2 / σ_v^3) e^{-(net_v^{(l)})^2 / (2σ_v^2)}.
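Taken together, the three update rules translate into one compact training step. A sketch for a single training pattern, assuming a Gaussian RBF network with Euclidean distance and a single linear output neuron (array shapes and names are illustrative, not from the slides):

    import numpy as np

    def rbf_train_step(x, o, C, sigma, w, theta, etas=(0.02, 0.01, 0.05)):
        """One online gradient step; C: (k, n) centers, sigma: (k,) radii,
        w: (k,) output weights, theta: output threshold."""
        eta1, eta2, eta3 = etas
        diff = C - x                              # w_v - in_v, shape (k, n)
        dist = np.linalg.norm(diff, axis=1)       # net input of hidden neurons
        h = np.exp(-dist**2 / (2 * sigma**2))     # hidden activations out_v
        delta = o - (w @ h - theta)               # error o_u - out_u
        # weights from the hidden to the output neuron (bias treated as -theta)
        w_new = w + eta3 * delta * h
        theta_new = theta - eta3 * delta
        # radii: d out_v / d sigma_v = (net_v^2 / sigma_v^3) * out_v
        sigma_new = sigma + eta2 * delta * w * (dist**2 / sigma**3) * h
        # centers: d out_v / d net_v = -(net_v / sigma_v^2) * out_v and
        #          d net_v / d w_v   = (w_v - in_v) / net_v  (Euclidean distance)
        safe = np.where(dist > 0.0, dist, 1.0)    # avoid division by zero
        grad = (delta * w * (-(dist / sigma**2) * h))[:, None] * (diff / safe[:, None])
        C_new = C + eta1 * grad
        return C_new, sigma_new, w_new, theta_new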
Radial Basis Function Networks: Generalization

Generalization of the distance function

Idea: Use an anisotropic distance function.

Example: Mahalanobis distance

    d(x, y) = sqrt( (x - y)^T Σ^{-1} (x - y) ).

Example: biimplication

(Network diagram: a single hidden neuron with center (1/2, 1/2), radius 1/3 and covariance matrix Σ = (9 8; 8 9), feeding an output neuron with threshold 0, solves the biimplication; the elongated ellipse covers the points (0, 0) and (1, 1).)
Learning Vector Quantization
Vector Quantization

Voronoi diagram of a vector quantization

• Dots represent vectors that are used for quantizing the area.
• Lines are the boundaries of the regions of points that are closest to the enclosed vector.
Learning Vector Quantization

Finding clusters in a given set of data points

• Data points are represented by empty circles (○).
• Cluster centers are represented by full circles (●).
Learning Vector Quantization Networks

A learning vector quantization network (LVQ) is a neural network with a graph G = (U, C) that satisfies the following conditions:

(i)  U_in ∩ U_out = ∅,  U_hidden = ∅,
(ii) C = U_in × U_out.

The network input function of each output neuron is a distance function of the input vector and the weight vector, i.e.

    ∀u ∈ U_out:  f_net^{(u)}(w_u, in_u) = d(w_u, in_u),

where d : R^n × R^n → R_0^+ is a function satisfying ∀x, y, z ∈ R^n:

(i)   d(x, y) = 0  <=>  x = y,
(ii)  d(x, y) = d(y, x)              (symmetry),
(iii) d(x, z) ≤ d(x, y) + d(y, z)    (triangle inequality).
Distance Functions

Illustration of distance functions: the Minkowski family

    d_k(x, y) = ( Σ_{i=1}^n |x_i - y_i|^k )^{1/k}

Well-known special cases from this family are:

    k = 1:   Manhattan or city block distance,
    k = 2:   Euclidean distance,
    k → ∞:   maximum distance, i.e. d_∞(x, y) = max_{i=1}^n |x_i - y_i|.

(Illustration: unit circles for k = 1, k = 2 and k → ∞.)
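A direct transcription of this family (a small sketch, not from the slides):

    import numpy as np

    def minkowski(x, y, k):
        """Minkowski distance d_k; k=1 city block, k=2 Euclidean."""
        return np.sum(np.abs(x - y)**k)**(1.0 / k)

    def d_max(x, y):
        """Limit k -> infinity: maximum distance."""
        return np.max(np.abs(x - y))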
Learning Vector Quantization

The activation function of each output neuron is a so-called radial function, i.e. a monotonously decreasing function

    f : R_0^+ → [0, ∞)  with  f(0) = 1  and  lim_{x→∞} f(x) = 0.

Sometimes the range of values is restricted to the interval [0, 1]. However, due to the special output function this restriction is irrelevant.

The output function of each output neuron is not a simple function of the activation of the neuron. Rather it takes into account the activations of all output neurons:

    f_out^{(u)}(act_u) = { 1, if act_u = max_{v∈U_out} act_v,
                           0, otherwise. }

If more than one unit has the maximal activation, one is selected at random to have an output of 1, all others are set to output 0: winner-takes-all principle.
Radial Activation Functions

rectangle function:
    f_act(net, σ) = { 0, if net > σ,
                      1, otherwise. }

triangle function:
    f_act(net, σ) = { 0, if net > σ,
                      1 - net/σ, otherwise. }

cosine until zero:
    f_act(net, σ) = { 0, if net > 2σ,
                      (cos((π/2σ) net) + 1) / 2, otherwise. }

Gaussian function:
    f_act(net, σ) = e^{-net^2 / (2σ^2)}

(Plots of the four functions over net, with markers at σ and 2σ; the Gaussian takes the values e^{-1/2} at σ and e^{-2} at 2σ.)
Learning Vector Quantization

Adaptation of reference vectors / codebook vectors

• For each training pattern find the closest reference vector.
• Adapt only this reference vector (winner neuron).
• For classified data the class may be taken into account: each reference vector is assigned to a class.

Attraction rule (data point and reference vector have the same class):

    r^(new) = r^(old) + η (x - r^(old)),

Repulsion rule (data point and reference vector have different classes):

    r^(new) = r^(old) - η (x - r^(old)).
Learning Vector Quantization

Adaptation of reference vectors / codebook vectors

(Illustration: the attraction rule moves the closest reference vector toward the data point x, the repulsion rule moves it away; x: data point, r_i: reference vectors, η = 0.4 (learning rate).)
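In code, one LVQ adaptation step amounts to finding the winner and applying the attraction or repulsion rule (a sketch, assuming an array R of reference vectors and an array of their classes):

    import numpy as np

    def lvq_step(x, c, R, classes, eta=0.4):
        """Adapt the reference vector closest to pattern x with class c."""
        i = np.argmin(np.linalg.norm(R - x, axis=1))   # winner neuron
        if classes[i] == c:
            R[i] += eta * (x - R[i])    # attraction rule: same class
        else:
            R[i] -= eta * (x - R[i])    # repulsion rule: different class
        return R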
Learning Vector Quantization: Example

Adaptation of reference vectors / codebook vectors

• Left: Online training with learning rate η = 0.1.
• Right: Batch training with learning rate η = 0.05.
Learning Vector Quantization: Learning Rate Decay

Problem: a fixed learning rate can lead to oscillations.

Solution: time dependent learning rate

    η(t) = η_0 α^t, 0 < α < 1,   or   η(t) = η_0 t^{-κ}, κ > 0.
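Both schedules are one-liners (sketch; the constants are illustrative):

    def eta_exp(t, eta0=0.1, alpha=0.99):
        return eta0 * alpha**t            # exponential decay

    def eta_pow(t, eta0=0.1, kappa=0.5):
        return eta0 * (t + 1)**(-kappa)   # power-law decay; t+1 avoids t = 0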
Learning Vector Quantization: Classified Data

Improved update rule for classified data

• Idea: Update not only the one reference vector that is closest to the data point (the winner neuron), but update the two closest reference vectors.
• Let x be the currently processed data point and c its class. Let r_j and r_k be the two closest reference vectors and z_j and z_k their classes.
• Reference vectors are updated only if z_j ≠ z_k and either c = z_j or c = z_k. (Without loss of generality we assume c = z_j.)
• The update rules for the two closest reference vectors are:

    r_j^(new) = r_j^(old) + η (x - r_j^(old))   and
    r_k^(new) = r_k^(old) - η (x - r_k^(old)),

while all other reference vectors remain unchanged.
Learning Vector Quantization: Window Rule
lt wa- o-crvcd in jractica tc-t- that -tandard carnin vcctor quantization
may drivc thc rctcrcncc vcctor- turthcr and turthcr ajart
To countcract thi- undc-ircd chavior a window rule wa- introduccd
ujdatc ony it thc data joint x i- co-c to thc ca--ication oundary
Co-c to thc oundary i- madc tormay jrcci-c y rcquirin
min
_
d(x, r
j
)
d(x, r
k
)
,
d(x, r
k
)
d(x, r
j
)
_
> , whcrc
1
1 +
.
i- a jaramctcr that ha- to c -jccicd y a u-cr
lntuitivcy, dc-cric- thc width ot thc window around thc ca--ication
oundary, in which thc data joint ha- to ic in ordcr to cad to an ujdatc
-in it jrcvcnt- divcrcncc, ccau-c thc ujdatc cca-c- tor a data joint oncc
thc ca--ication oundary ha- ccn movcd tar cnouh away
Christian Borgelt Introduction to Neural Networks 158
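A sketch combining the two-closest update with the window rule (names and defaults are illustrative, not from the slides):

    import numpy as np

    def lvq2_step(x, c, R, classes, eta=0.1, xi=0.3):
        d = np.linalg.norm(R - x, axis=1)
        j, k = np.argsort(d)[:2]                 # the two closest reference vectors
        zj, zk = classes[j], classes[k]
        if zj == zk or c not in (zj, zk):
            return R                             # classes must differ, one must match c
        if zj != c:                              # w.l.o.g. let r_j carry class c
            j, k = k, j
        theta = (1 - xi) / (1 + xi)              # window around the boundary
        if min(d[j] / d[k], d[k] / d[j]) <= theta:
            return R                             # x is not close enough to the boundary
        R[j] += eta * (x - R[j])                 # attract the same-class vector
        R[k] -= eta * (x - R[k])                 # repel the other-class vector
        return R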
Soft Learning Vector Quantization

Idea: Use soft assignments instead of winner-takes-all.

Assumption: The given data was sampled from a mixture of normal distributions. Each reference vector describes one normal distribution.

Objective: Maximize the log-likelihood ratio of the data, that is, maximize

    ln L_ratio =   Σ_{j=1}^n ln Σ_{r∈R(c_j)} exp( -(x_j - r)^T (x_j - r) / (2σ^2) )
                 - Σ_{j=1}^n ln Σ_{r∈Q(c_j)} exp( -(x_j - r)^T (x_j - r) / (2σ^2) ).

Here σ is a parameter specifying the "size" of each normal distribution.
R(c) is the set of reference vectors assigned to class c and Q(c) its complement.

Intuitively: at each data point the probability density for its class should be as large as possible while the density for all other classes should be as small as possible.
Soft Learning Vector Quantization

Update rule derived from a maximum log-likelihood approach:

    r_i^(new) = r_i^(old) + η · {  u_ij^+ (x_j - r_i^(old)),  if c_j = z_i,
                                  -u_ij^- (x_j - r_i^(old)),  if c_j ≠ z_i, }

where z_i is the class associated with the reference vector r_i and

    u_ij^+ = exp( -(1/2σ^2) (x_j - r_i^(old))^T (x_j - r_i^(old)) )
             / Σ_{r∈R(c_j)} exp( -(1/2σ^2) (x_j - r^(old))^T (x_j - r^(old)) )

and

    u_ij^- = exp( -(1/2σ^2) (x_j - r_i^(old))^T (x_j - r_i^(old)) )
             / Σ_{r∈Q(c_j)} exp( -(1/2σ^2) (x_j - r^(old))^T (x_j - r^(old)) ).

R(c) is the set of reference vectors assigned to class c and Q(c) its complement.
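A sketch of one soft LVQ step under these formulas (assuming classes is a NumPy array of the reference vectors' classes and that both R(c) and Q(c) are non-empty):

    import numpy as np

    def soft_lvq_step(x, c, R, classes, eta=0.05, sigma=0.5):
        sq = np.sum((R - x)**2, axis=1)             # squared distances
        g = np.exp(-sq / (2 * sigma**2))            # Gaussian weights
        same = (classes == c)
        u_plus = g * same / np.sum(g[same])         # soft assignment within the class
        u_minus = g * ~same / np.sum(g[~same])      # soft assignment to other classes
        # attract same-class vectors, repel other-class vectors
        R += eta * (u_plus - u_minus)[:, None] * (x - R)
        return R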
Hard Learning Vector Quantization

Idea: Derive a scheme with hard assignments from the soft version.

Approach: Let the size parameter σ of the Gaussian function go to zero.

The resulting update rule is in this case:

    r_i^(new) = r_i^(old) + η · {  u_ij^+ (x_j - r_i^(old)),  if c_j = z_i,
                                  -u_ij^- (x_j - r_i^(old)),  if c_j ≠ z_i, }

where

    u_ij^+ = { 1, if r_i = argmin_{r∈R(c_j)} d(x_j, r),       u_ij^- = { 1, if r_i = argmin_{r∈Q(c_j)} d(x_j, r),
               0, otherwise,                                             0, otherwise.

(u_ij^+: r_i is the closest vector of the same class; u_ij^-: r_i is the closest vector of a different class.)

This update rule is stable without a window rule restricting the update.
Learning Vector Quantization: Extensions

Frequency Sensitive Competitive Learning:
• The distance to a reference vector is modified according to the number of data points that are assigned to this reference vector.

Fuzzy Learning Vector Quantization:
• Exploits the close relationship to fuzzy clustering.
• Can be seen as an online version of fuzzy clustering.
• Leads to faster clustering.

Size and Shape Parameters:
• Associate each reference vector with a cluster radius. Update this radius depending on how close the data points are.
• Associate each reference vector with a covariance matrix. Update this matrix depending on the distribution of the data points.
Demonstration Software: xlvq/wlvq

Demonstration of learning vector quantization:
• Visualization of the training process
• Arbitrary datasets, but training only in two dimensions

http://www.borgelt.net/lvqd.html
Self-Organizing Maps
Self-Organizing Maps

A self-organizing map or Kohonen feature map is a neural network with a graph G = (U, C) that satisfies the following conditions:

(i)  U_hidden = ∅,  U_in ∩ U_out = ∅,
(ii) C = U_in × U_out.

The network input function of each output neuron is a distance function of input and weight vector. The activation function of each output neuron is a radial function, i.e. a monotonously decreasing function

    f : R_0^+ → [0, 1]  with  f(0) = 1  and  lim_{x→∞} f(x) = 0.

The output function of each output neuron is the identity.
The output is often discretized according to the winner-takes-all principle.

On the output neurons a neighborhood relationship is defined:

    d_neurons : U_out × U_out → R_0^+.
Self-Organizing Maps: Neighborhood

Neighborhood of the output neurons: neurons form a grid

(Figure: a quadratic grid and a hexagonal grid.)
• Thin black lines: indicate the nearest neighbors of a neuron.
• Thick gray lines: indicate the regions assigned to a neuron for visualization.
Topology Preserving Mapping

Images of points close to each other in the original space should be close to each other in the image space.

Example: Robinson projection of the surface of a sphere
(Figure: Robinson projection.)
• The Robinson projection is frequently used for world maps.
Self-Organizing Maps: Neighborhood

Find a topology preserving mapping by respecting the neighborhood.

Reference vector update rule:

    r_u^(new) = r_u^(old) + η(t) · f_nb(d_neurons(u, u*), ρ(t)) · (x - r_u^(old)),

• u* is the winner neuron (reference vector closest to the data point).
• The function f_nb is a radial function.

Time dependent learning rate:

    η(t) = η_0 α_η^t, 0 < α_η < 1,   or   η(t) = η_0 t^{-κ_η}, κ_η > 0.

Time dependent neighborhood radius:

    ρ(t) = ρ_0 α_ρ^t, 0 < α_ρ < 1,   or   ρ(t) = ρ_0 t^{-κ_ρ}, κ_ρ > 0.
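A compact training sketch under these rules, for a square grid with a Gaussian neighborhood function (a minimal sketch; grid layout and decay constants are illustrative, not from the slides):

    import numpy as np

    def som_train(X, grid=(10, 10), steps=5000, eta0=0.5, rho0=5.0, seed=0):
        rng = np.random.default_rng(seed)
        gx, gy = np.meshgrid(np.arange(grid[0]), np.arange(grid[1]))
        pos = np.column_stack([gx.ravel(), gy.ravel()])     # neuron grid positions
        R = rng.random((len(pos), X.shape[1]))              # reference vectors
        for t in range(steps):
            eta = eta0 * 0.999**t                           # learning rate decay
            rho = rho0 * 0.999**t                           # neighborhood radius decay
            x = X[rng.integers(len(X))]
            winner = np.argmin(np.linalg.norm(R - x, axis=1))
            d = np.linalg.norm(pos - pos[winner], axis=1)   # grid distance d_neurons
            f_nb = np.exp(-d**2 / (2 * rho**2))             # Gaussian neighborhood
            R += eta * f_nb[:, None] * (x - R)
        return R, pos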
Self-Organizing Maps: Examples

Example: Unfolding of a two-dimensional self-organizing map.
(Sequence of plots showing the map unfolding during training.)

Training a self-organizing map may fail if
• the (initial) learning rate is chosen too small or
• the (initial) neighborhood is chosen too small.
Self-Organizing Maps: Examples

Example: Unfolding of a two-dimensional self-organizing map.

(a) (b) (c): Self-organizing maps that have been trained with random points from (a) a rotation parabola, (b) a simple cubic function, (c) the surface of a sphere.

• In this case the original space and the image space have different dimensionality.
• Self-organizing maps can be used for dimensionality reduction.
Demonstration Software: xsom/wsom

Demonstration of self-organizing map training:
• Visualization of the training process
• Two-dimensional areas and three-dimensional surfaces

http://www.borgelt.net/somd.html
Hopfield Networks
Hopfield Networks

A Hopfield network is a neural network with a graph G = (U, C) that satisfies the following conditions:

(i)  U_hidden = ∅,  U_in = U_out = U,
(ii) C = U × U - {(u, u) | u ∈ U}.

• In a Hopfield network all neurons are input as well as output neurons. There are no hidden neurons.
• Each neuron receives input from all other neurons. A neuron is not connected to itself.
• The connection weights are symmetric, i.e.

    ∀u, v ∈ U, u ≠ v:  w_{uv} = w_{vu}.
Hopeld Networks
Thc nctwork injut tunction ot cach ncuron i- thc wcihtcd -um ot thc outjut- ot
a othcr ncuron-, ic
u U f
(u)
nct
( w
u
,

in
u
) w
u

in
u

vUu
w
uv
out
v
.
Thc activation tunction ot cach ncuron i- a thrc-hod tunction, ic
u U f
(u)
act
(nct
u
,
u
)
_
1, it nct
u
,
1, othcrwi-c
Thc outjut tunction ot cach ncuron i- thc idcntity, ic
u U f
(u)
out
(act
u
) act
u
.
Christian Borgelt Introduction to Neural Networks 176
Hopfield Networks

Alternative activation function:

    ∀u ∈ U:  f_act^{(u)}(net_u, θ_u, act_u) = {  1,     if net_u > θ_u,
                                                -1,     if net_u < θ_u,
                                                 act_u, if net_u = θ_u. }

This activation function has advantages w.r.t. the physical interpretation of a Hopfield network.

General weight matrix of a Hopfield network:

    W = (  0          w_{u1 u2}  ...  w_{u1 un}
           w_{u1 u2}  0          ...  w_{u2 un}
           ...
           w_{u1 un}  w_{u2 un}  ...  0         )
Hopfield Networks: Examples

Very simple Hopfield network:

(Diagram: two neurons u_1 and u_2, both with threshold 0, connected with weight 1; inputs x_1, x_2, outputs y_1, y_2.)

    W = ( 0 1
          1 0 )

• The behavior of a Hopfield network can depend on the update order.
• Computations can oscillate if neurons are updated in parallel.
• Computations always converge if neurons are updated sequentially.
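The difference is easy to reproduce (a sketch simulating this two-neuron network with activations in {-1, 1} and thresholds 0):

    import numpy as np

    W = np.array([[0., 1.], [1., 0.]])
    theta = np.zeros(2)

    def parallel_step(act):
        return np.where(W @ act >= theta, 1.0, -1.0)

    def sequential_step(act, order=(0, 1)):
        act = act.copy()
        for u in order:                  # update one neuron at a time
            act[u] = 1.0 if W[u] @ act >= theta[u] else -1.0
        return act

    act = np.array([-1.0, 1.0])
    print(parallel_step(act))            # (1, -1): oscillates with (-1, 1)
    print(sequential_step(act))          # reaches the stable state (1, 1)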
Hopfield Networks: Examples

Parallel update of neuron activations:

                    u_1   u_2
    input phase     -1     1
    work phase       1    -1
                    -1     1
                     1    -1
                    -1     1
                     1    -1
                    -1     1

• The computations oscillate, no stable state is reached.
• The output depends on when the computations are terminated.
Hopfield Networks: Examples

Sequential update of neuron activations (left: u_1 updated first, right: u_2 updated first):

                    u_1   u_2                        u_1   u_2
    input phase     -1     1      input phase        -1     1
    work phase       1     1      work phase         -1    -1
                     1     1                         -1    -1
                     1     1                         -1    -1
                     1     1                         -1    -1

• Regardless of the update order a stable state is reached.
• Which state is reached depends on the update order.
Hopfield Networks: Examples

Simplified representation of a Hopfield network:

(Diagrams: three neurons u_1, u_2, u_3, all with threshold 0, and weights w_{u1 u2} = 1, w_{u1 u3} = 2, w_{u2 u3} = -1, shown once with explicit inputs x_i and outputs y_i and once in simplified form.)

    W = (  0   1   2
           1   0  -1
           2  -1   0 )

• Symmetric connections between neurons are combined.
• Inputs and outputs are not explicitly represented.
Hopfield Networks: State Graph

Graph of activation states and transitions:

(State graph: the eight states +++, ++-, +-+, -++, +--, -+-, --+, --- as nodes; each arrow is labeled with the neuron u_1, u_2 or u_3 whose update causes the transition.)
Hopfield Networks: Convergence

Convergence Theorem: If the activations of the neurons of a Hopfield network are updated sequentially (asynchronously), then a stable state is reached in a finite number of steps.

If the neurons are traversed cyclically in an arbitrary, but fixed order, at most n · 2^n steps (updates of individual neurons) are needed, where n is the number of neurons of the Hopfield network.

The proof is carried out with the help of an energy function.
The energy function of a Hopfield network with n neurons u_1, ..., u_n is

    E = -1/2 act^T W act + θ^T act
      = -1/2 Σ_{u,v∈U, u≠v} w_{uv} act_u act_v + Σ_{u∈U} θ_u act_u.
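One can watch the energy decrease during sequential updates (a small self-contained sketch, reusing the simple two-neuron example):

    import numpy as np

    W = np.array([[0., 1.], [1., 0.]])       # the two-neuron example network
    theta = np.zeros(2)

    def energy(act):
        return -0.5 * act @ W @ act + theta @ act

    act = np.array([-1.0, 1.0])
    for u in (0, 1, 0, 1):                   # fixed cyclic update order
        print(energy(act))
        act[u] = 1.0 if W[u] @ act >= theta[u] else -1.0
    print(energy(act), act)                  # the printed energies never increase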
Hopfield Networks: Convergence

Consider the energy change resulting from an update that changes an activation:

    ΔE = E^(new) - E^(old) = ( -Σ_{v∈U-{u}} w_{uv} act_u^(new) act_v + θ_u act_u^(new) )
                           - ( -Σ_{v∈U-{u}} w_{uv} act_u^(old) act_v + θ_u act_u^(old) )
       = ( act_u^(old) - act_u^(new) ) ( Σ_{v∈U-{u}} w_{uv} act_v - θ_u ),

where the sum in the second factor is net_u.

• net_u < θ_u: The second factor is less than 0. Also act_u^(new) = -1 and act_u^(old) = 1, therefore the first factor is greater than 0. Result: ΔE < 0.
• net_u ≥ θ_u: The second factor is greater than or equal to 0. Also act_u^(new) = 1 and act_u^(old) = -1, therefore the first factor is less than 0. Result: ΔE ≤ 0.
Hopfield Networks: Examples

Arrange the states in the state graph according to their energy:

(Diagram: the state graph drawn over an energy scale from 4 down to -2; the stable states +++ and --- lie at the bottom.)

Energy function for the example Hopfield network:

    E = -act_{u1} act_{u2} - 2 act_{u1} act_{u3} + act_{u2} act_{u3}.
Hopfield Networks: Examples

The state graph need not be symmetric:

(Diagram: a three-neuron Hopfield network with thresholds -1 and weights -2, and its state graph arranged by energy.)
Hopfield Networks: Physical Interpretation

Physical interpretation: Magnetism

A Hopfield network can be seen as a (microscopic) model of magnetism (so-called Ising model, [Ising 1925]).

    physical                                  neural
    ----------------------------------------------------------------
    atom                                      neuron
    magnetic moment (spin)                    activation state
    strength of outer magnetic field          threshold value
    magnetic coupling of the atoms            connection weights
    Hamilton operator of the magnetic field   energy function
Hopfield Networks: Associative Memory

Idea: Use stable states to store patterns

First: Store only one pattern x = (act_{u1}^{(l)}, ..., act_{un}^{(l)})^T ∈ {-1, 1}^n, n ≥ 2, i.e., find weights, so that the pattern is a stable state.

Necessary and sufficient condition:

    S(Wx - θ) = x,

where

    S : R^n → {-1, 1}^n,  x ↦ y

with

    ∀i ∈ {1, ..., n}:  y_i = {  1, if x_i ≥ 0,
                               -1, otherwise. }
Hopeld Networks: Associative Memory
lt

0 an ajjrojriatc matrix W can ca-iy c tound lt -ucc-


Wx cx with c ll
+

Acraicay Iind a matrix W that ha- a jo-itivc cicnvauc wrt x


Choo-c
W xx
T
E
whcrc xx
T
i- thc -o-cacd outer product
\ith thi- matrix wc havc
Wx (xx
T
)x Ex
..
x
()
x (x
T
x)
. .
[x[
2
n
x
nx x (n 1)x.
Christian Borgelt Introduction to Neural Networks 189
Hopfield Networks: Associative Memory

Hebbian learning rule [Hebb 1949]

Written in individual weights the computation of the weight matrix reads:

    w_{uv} = {  0, if u = v,
                1, if u ≠ v and act_u^{(p)} = act_v^{(p)},
               -1, otherwise. }

• Originally derived from a biological analogy.
• Strengthen the connection between neurons that are active at the same time.

Note that this learning rule also stores the complement of the pattern:
with Wx = (n - 1) x it is also W(-x) = (n - 1)(-x).
Hopfield Networks: Associative Memory

Storing several patterns

Choose

    W x_j = ( Σ_{i=1}^m W_i ) x_j = ( Σ_{i=1}^m x_i x_i^T ) x_j - m E x_j
          = ( Σ_{i=1}^m x_i (x_i^T x_j) ) - m x_j

If the patterns are orthogonal, we have

    x_i^T x_j = { 0, if i ≠ j,
                  n, if i = j, }

and therefore

    W x_j = (n - m) x_j.
Hopfield Networks: Associative Memory

Storing several patterns

• Result: As long as m < n, x_j is a stable state of the Hopfield network.
• Note that the complements of the patterns are also stored:
  with W x_j = (n - m) x_j it is also W(-x_j) = (n - m)(-x_j).
• But: The capacity is very small compared to the number of possible states (2^n).

Non-orthogonal patterns:

    W x_j = (n - m) x_j + Σ_{i=1, i≠j}^m x_i (x_i^T x_j),

where the second term is a "disturbance term".
Associative Memory: Example

Example: Store the patterns x_1 = (+1, +1, -1, -1)^T and x_2 = (-1, +1, -1, +1)^T:

    W = W_1 + W_2 = x_1 x_1^T + x_2 x_2^T - 2E

where

    W_1 = (  0   1  -1  -1          W_2 = (  0  -1   1  -1
             1   0  -1  -1                  -1   0  -1   1
            -1  -1   0   1                   1  -1   0  -1
            -1  -1   1   0 ),               -1   1  -1   0 ).

The full weight matrix is:

    W = (  0   0   0  -2
           0   0  -2   0
           0  -2   0   0
          -2   0   0   0 ).

Therefore it is

    W x_1 = (+2, +2, -2, -2)^T   and   W x_2 = (-2, +2, -2, +2)^T.
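This example is easy to check numerically (sketch):

    import numpy as np

    x1 = np.array([+1., +1., -1., -1.])
    x2 = np.array([-1., +1., -1., +1.])

    # Hebbian storage: sum of outer products with zero diagonal
    W = np.outer(x1, x1) + np.outer(x2, x2) - 2 * np.eye(4)
    print(W)          # the full weight matrix from above
    print(W @ x1)     # [ 2.  2. -2. -2.] = (n - m) x_1
    print(W @ x2)     # [-2.  2. -2.  2.] = (n - m) x_2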
Associative Memory: Examples

Example: Storing bit maps of numbers

• Left: Bit maps stored in a Hopfield network.
• Right: Reconstruction of a pattern from a random input.
Hopfield Networks: Associative Memory

Training a Hopfield network with the Delta rule

Necessary condition for pattern x being a stable state:

    s(0 + w_{u1 u2} act_{u2}^{(p)} + ... + w_{u1 un} act_{un}^{(p)} - θ_{u1}) = act_{u1}^{(p)},
    s(w_{u2 u1} act_{u1}^{(p)} + 0 + ... + w_{u2 un} act_{un}^{(p)} - θ_{u2}) = act_{u2}^{(p)},
    ...
    s(w_{un u1} act_{u1}^{(p)} + w_{un u2} act_{u2}^{(p)} + ... + 0 - θ_{un}) = act_{un}^{(p)},

with the standard threshold function

    s(x) = {  1, if x ≥ 0,
             -1, otherwise. }
Hopfield Networks: Associative Memory

Training a Hopfield network with the Delta rule

Turn the weight matrix into a weight vector:

    w = ( w_{u1 u2}, w_{u1 u3}, ..., w_{u1 un},
                     w_{u2 u3}, ..., w_{u2 un},
                               ...
                                     w_{u(n-1) un},
          -θ_{u1}, -θ_{u2}, ..., -θ_{un} ).

Construct input vectors for a threshold logic unit, e.g. for neuron u_2:

    z_2 = ( act_{u1}^{(p)}, 0, ..., 0 [n - 2 zeros], act_{u3}^{(p)}, ..., act_{un}^{(p)}, ..., 0, 1, 0, ..., 0 [n - 2 zeros] ).

Apply Delta rule training until convergence.
Demonstration Software: xhfn/whfn

Demonstration of Hopfield networks as associative memory:
• Visualization of the association/recognition process
• Two-dimensional networks of arbitrary size

http://www.borgelt.net/hfnd.html
Hopfield Networks: Solving Optimization Problems

Use energy minimization to solve optimization problems

General procedure:
• Transform the function to optimize into a function to minimize.
• Transform this function into the form of an energy function of a Hopfield network.
• Read the weights and threshold values from the energy function.
• Construct the corresponding Hopfield network.
• Initialize the Hopfield network randomly and update until convergence.
• Read the solution from the stable state reached.
• Repeat several times and use the best solution found.
Hopfield Networks: Activation Transformation

A Hopfield network may be defined either with activations -1 and 1 or with activations 0 and 1. The networks can be transformed into each other.

From act_u ∈ {-1, 1} to act_u ∈ {0, 1}:

    w_{uv}^0 = 2 w_{uv}^-   and   θ_u^0 = θ_u^- + Σ_{v∈U-{u}} w_{uv}^-

From act_u ∈ {0, 1} to act_u ∈ {-1, 1}:

    w_{uv}^- = 1/2 w_{uv}^0   and   θ_u^- = θ_u^0 - 1/2 Σ_{v∈U-{u}} w_{uv}^0.
Hopfield Networks: Solving Optimization Problems

Combination lemma: Let two Hopfield networks on the same set U of neurons with weights w_{uv}^{(i)}, threshold values θ_u^{(i)} and energy functions

    E_i = -1/2 Σ_{u∈U} Σ_{v∈U-{u}} w_{uv}^{(i)} act_u act_v + Σ_{u∈U} θ_u^{(i)} act_u,

i = 1, 2, be given. Furthermore let a, b ∈ R^+. Then E = aE_1 + bE_2 is the energy function of the Hopfield network on the neurons in U that has the weights w_{uv} = a w_{uv}^{(1)} + b w_{uv}^{(2)} and the threshold values θ_u = a θ_u^{(1)} + b θ_u^{(2)}.

Proof: Just do the computations.

Idea: Additional conditions can be formalized separately and incorporated later.
Hopfield Networks: Solving Optimization Problems

Example: Traveling salesman problem

Idea: Represent the tour by a matrix.

(Illustration: a tour through four cities, 1 → 3 → 4 → 2, and the corresponding matrix:)

              city
             1 2 3 4
          (  1 0 0 0  )  1.
          (  0 0 1 0  )  2.
          (  0 0 0 1  )  3.   step
          (  0 1 0 0  )  4.

An element m_{ij} of the matrix is 1 if the j-th city is visited in the i-th step and 0 otherwise.

Each matrix element will be represented by a neuron.
Hopfield Networks: Solving Optimization Problems

Minimization of the tour length:

    E_1 = Σ_{j1=1}^n Σ_{j2=1}^n Σ_{i=1}^n d_{j1 j2} · m_{i,j1} · m_{(i mod n)+1, j2}.

Double summation over the steps (index i) needed:

    E_1 = Σ_{(i1,j1)∈{1,...,n}^2} Σ_{(i2,j2)∈{1,...,n}^2} d_{j1 j2} · δ_{(i1 mod n)+1, i2} · m_{i1 j1} · m_{i2 j2},

where

    δ_{ab} = { 1, if a = b,
               0, otherwise. }

Symmetric version of the energy function:

    E_1 = 1/2 Σ_{(i1,j1)∈{1,...,n}^2} Σ_{(i2,j2)∈{1,...,n}^2} d_{j1 j2} · ( δ_{(i1 mod n)+1, i2} + δ_{i1, (i2 mod n)+1} ) · m_{i1 j1} · m_{i2 j2}.
Hopfield Networks: Solving Optimization Problems

Additional conditions that have to be satisfied:

• Each city is visited on exactly one step of the tour:

    ∀j ∈ {1, ..., n}:  Σ_{i=1}^n m_{ij} = 1,

  i.e., each column of the matrix contains exactly one 1.

• On each step of the tour exactly one city is visited:

    ∀i ∈ {1, ..., n}:  Σ_{j=1}^n m_{ij} = 1,

  i.e., each row of the matrix contains exactly one 1.

These conditions are incorporated by finding additional functions to optimize.
Hopfield Networks: Solving Optimization Problems

Formalization of the first condition as a minimization problem:

    E_2* = Σ_{j=1}^n ( ( Σ_{i=1}^n m_{ij} )^2 - 2 Σ_{i=1}^n m_{ij} + 1 )

         = Σ_{j=1}^n ( ( Σ_{i1=1}^n m_{i1 j} ) ( Σ_{i2=1}^n m_{i2 j} ) - 2 Σ_{i=1}^n m_{ij} + 1 )

         = Σ_{j=1}^n Σ_{i1=1}^n Σ_{i2=1}^n m_{i1 j} m_{i2 j} - 2 Σ_{j=1}^n Σ_{i=1}^n m_{ij} + n.

Double summation over the cities (index j) needed:

    E_2 = Σ_{(i1,j1)∈{1,...,n}^2} Σ_{(i2,j2)∈{1,...,n}^2} δ_{j1 j2} · m_{i1 j1} · m_{i2 j2} - 2 Σ_{(i,j)∈{1,...,n}^2} m_{ij}.
Hopfield Networks: Solving Optimization Problems

Resulting energy function:

    E_2 = 1/2 Σ_{(i1,j1)∈{1,...,n}^2} Σ_{(i2,j2)∈{1,...,n}^2} 2 δ_{j1 j2} · m_{i1 j1} · m_{i2 j2} - 2 Σ_{(i,j)∈{1,...,n}^2} m_{ij}

The second additional condition is handled in a completely analogous way:

    E_3 = 1/2 Σ_{(i1,j1)∈{1,...,n}^2} Σ_{(i2,j2)∈{1,...,n}^2} 2 δ_{i1 i2} · m_{i1 j1} · m_{i2 j2} - 2 Σ_{(i,j)∈{1,...,n}^2} m_{ij}.

Combining the energy functions:

    E = a E_1 + b E_2 + c E_3   where   b/a = c/a > 2 max_{(j1,j2)∈{1,...,n}^2} d_{j1 j2}.
Hopfield Networks: Solving Optimization Problems

From the resulting energy function we can read the weights

    w_{(i1,j1)(i2,j2)} = -a d_{j1 j2} ( δ_{(i1 mod n)+1, i2} + δ_{i1, (i2 mod n)+1} )   [from E_1]
                         - 2b δ_{j1 j2}   [from E_2]
                         - 2c δ_{i1 i2}   [from E_3]

and the threshold values:

    θ_{(i,j)} = 0a [from E_1] - 2b [from E_2] - 2c [from E_3] = -2(b + c).

Problem: Random initialization and update until convergence do not always lead to a matrix that represents a tour, let alone an optimal one.
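For illustration, a sketch that builds this weight matrix and runs the random-restart procedure (indices are flattened as u = i·n + j; parameter choices are illustrative, and as noted above a run may yield no valid tour at all):

    import numpy as np

    def tsp_hopfield_weights(d, a=1.0):
        """Weights/thresholds for the TSP Hopfield network (0/1 activations).
        Neuron u = i*n + j stands for matrix element m_ij (step i, city j)."""
        n = len(d)
        b = c = a * (2.0 * d.max() + 1.0)          # ensures b/a = c/a > 2 max d
        W = np.zeros((n * n, n * n))
        theta = np.full(n * n, -2.0 * (b + c))
        for i1 in range(n):
            for j1 in range(n):
                for i2 in range(n):
                    for j2 in range(n):
                        if (i1, j1) == (i2, j2):
                            continue
                        w = -a * d[j1, j2] * ((i2 == (i1 + 1) % n) + (i1 == (i2 + 1) % n))
                        w -= 2.0 * b * (j1 == j2) + 2.0 * c * (i1 == i2)
                        W[i1 * n + j1, i2 * n + j2] = w
        return W, theta

    def tsp_solve(d, trials=50, seed=0):
        n = len(d); W, theta = tsp_hopfield_weights(d)
        rng = np.random.default_rng(seed); best = None
        for _ in range(trials):
            act = rng.integers(0, 2, n * n).astype(float)
            for _ in range(100):                   # sequential updates
                old = act.copy()
                for u in rng.permutation(n * n):
                    act[u] = 1.0 if W[u] @ act >= theta[u] else 0.0
                if np.array_equal(old, act):
                    break                          # stable state reached
            m = act.reshape(n, n)
            if (m.sum(0) == 1).all() and (m.sum(1) == 1).all():   # a valid tour?
                tour = m.argmax(1)
                length = sum(d[tour[i], tour[(i + 1) % n]] for i in range(n))
                if best is None or length < best[0]:
                    best = (length, tour)
        return best        # may be None: not every stable state encodes a tour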
Recurrent Neural Networks
Recurrent Networks: Cooling Law

A body of temperature θ_0 is placed into an environment with temperature θ_A.

The cooling/heating of the body can be described by Newton's cooling law:

    dθ/dt = θ' = -k (θ - θ_A).

Exact analytical solution:

    θ(t) = θ_A + (θ_0 - θ_A) e^{-k(t - t_0)}

Approximate solution with Euler-Cauchy polygonal courses:

    θ_1 = θ(t_1) = θ(t_0) + θ'(t_0) Δt = θ_0 - k (θ_0 - θ_A) Δt,
    θ_2 = θ(t_2) = θ(t_1) + θ'(t_1) Δt = θ_1 - k (θ_1 - θ_A) Δt.

General recursive formula:

    θ_i = θ(t_i) = θ(t_{i-1}) + θ'(t_{i-1}) Δt = θ_{i-1} - k (θ_{i-1} - θ_A) Δt
Recurrent Networks: Cooling Law

Euler-Cauchy polygonal courses for different step widths:

(Plots for Δt = 4, Δt = 2 and Δt = 1 over t ∈ [0, 20]; the thin curve is the exact analytical solution.)

Recurrent neural network:

(Diagram: a single neuron with input θ(t_0) and output θ(t); self-feedback weight -kΔt, threshold -k θ_A Δt.)
Recurrent Networks: Cooling Law

More formal derivation of the recursive formula:

Replace the differential quotient by a forward difference:

    dθ(t)/dt ≈ Δθ(t)/Δt = (θ(t + Δt) - θ(t)) / Δt

with sufficiently small Δt. Then it is

    θ(t + Δt) - θ(t) = Δθ(t) ≈ -k (θ(t) - θ_A) Δt,
    θ(t + Δt) - θ(t) = Δθ(t) ≈ -k Δt θ(t) + k θ_A Δt

and therefore

    θ_i ≈ θ_{i-1} - k Δt θ_{i-1} + k θ_A Δt.
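The recursion is a one-line Euler integrator (sketch; the constants k, θ_A and Δt are chosen arbitrarily for illustration):

    # Euler-Cauchy integration of Newton's cooling law (illustrative values)
    k, theta_A, dt = 0.2, 20.0, 1.0
    theta = 100.0                        # initial temperature theta_0
    for i in range(10):
        theta = theta - k * dt * (theta - theta_A)
        print(f"t = {(i + 1) * dt:4.1f}  theta = {theta:7.3f}")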
Recurrent Networks: Mass on a Spring

(Diagram: a mass m hanging on a spring; displacement x, rest position 0.)

Governing physical laws:
• Hooke's law: F = c Δl = -c x (c is a spring dependent constant)
• Newton's second law: F = m a = m x'' (force causes an acceleration)

Resulting differential equation:

    m x'' = -c x   or   x'' = -(c/m) x.
Recurrent Networks: Mass on a Spring

General analytical solution of the differential equation:

    x(t) = a sin(ωt) + b cos(ωt)

with the parameters:

    ω = sqrt(c/m),
    a = x(t_0) sin(ωt_0) + v(t_0) cos(ωt_0),
    b = x(t_0) cos(ωt_0) - v(t_0) sin(ωt_0).

With given initial values x(t_0) = x_0 and v(t_0) = 0 and the additional assumption t_0 = 0 we get the simple expression

    x(t) = x_0 cos( sqrt(c/m) t ).
Recurrent Networks: Mass on a Spring

Turn the differential equation into two coupled equations:

    x' = v   and   v' = -(c/m) x.

Approximate the differential quotients by forward differences:

    Δx/Δt = (x(t + Δt) - x(t)) / Δt = v   and   Δv/Δt = (v(t + Δt) - v(t)) / Δt = -(c/m) x

Resulting recursive equations:

    x(t_i) = x(t_{i-1}) + Δx(t_{i-1}) = x(t_{i-1}) + Δt · v(t_{i-1})   and
    v(t_i) = v(t_{i-1}) + Δv(t_{i-1}) = v(t_{i-1}) - (c/m) Δt · x(t_{i-1}).
Recurrent Networks: Mass on a Spring

(Network diagram: neuron u_1 computes v(t) from the initial input v(t_0), neuron u_2 computes x(t) from the initial input x(t_0); both thresholds are 0; connection weight from u_2 to u_1: -(c/m)Δt, connection weight from u_1 to u_2: Δt.)

Neuron u_1:
    f_net^{(u_1)}(x, w_{u_1 u_2}) = w_{u_1 u_2} x = -(c/m)Δt · x   and
    f_act^{(u_1)}(act_{u_1}, net_{u_1}, θ_{u_1}) = act_{u_1} + net_{u_1} - θ_{u_1},

Neuron u_2:
    f_net^{(u_2)}(v, w_{u_2 u_1}) = w_{u_2 u_1} v = Δt · v   and
    f_act^{(u_2)}(act_{u_2}, net_{u_2}, θ_{u_2}) = act_{u_2} + net_{u_2} - θ_{u_2}.
Recurrent Networks: Mass on a Spring

Some computation steps of the neural network:

    t      v         x
    0.0    0.0000    1.0000
    0.1   -0.5000    0.9500
    0.2   -0.9750    0.8525
    0.3   -1.4012    0.7124
    0.4   -1.7574    0.5366
    0.5   -2.0258    0.3341
    0.6   -2.1928    0.1148

(Plot: x over t for four periods.)

• The resulting curve is close to the analytical solution.
• The approximation gets better with a smaller step width.
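The table can be reproduced directly (sketch; c/m = 5 and Δt = 0.1 follow from the values above; note that neuron u_1 fires first, so the x update uses the freshly computed v):

    # coupled Euler updates for the mass on a spring (v first, then x)
    cm, dt = 5.0, 0.1        # c/m and step width as in the table
    x, v = 1.0, 0.0          # initial values x(t_0), v(t_0)
    print(f"t=0.0  v={v:7.4f}  x={x:6.4f}")
    for i in range(1, 7):
        v = v - cm * dt * x  # neuron u1: v(t_i) = v(t_i-1) - (c/m) dt x(t_i-1)
        x = x + dt * v       # neuron u2: x(t_i) = x(t_i-1) + dt v(t_i)
        print(f"t={i * dt:3.1f}  v={v:7.4f}  x={x:6.4f}")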
Recurrent Networks: Differential Equations

General representation of an explicit n-th order differential equation:

    x^(n) = f(t, x, x', x'', ..., x^(n-1))

Introduce n - 1 intermediary quantities

    y_1 = x', y_2 = x'', ..., y_{n-1} = x^(n-1)

to obtain the system

    x'      = y_1,
    y_1'    = y_2,
    ...
    y_{n-2}' = y_{n-1},
    y_{n-1}' = f(t, x, y_1, y_2, ..., y_{n-1})

of n coupled first order differential equations.
Recurrent Networks: Differential Equations

Replace each differential quotient by a forward difference to obtain the recursive equations:

    x(t_i)      = x(t_{i-1}) + Δt · y_1(t_{i-1}),
    y_1(t_i)    = y_1(t_{i-1}) + Δt · y_2(t_{i-1}),
    ...
    y_{n-2}(t_i) = y_{n-2}(t_{i-1}) + Δt · y_{n-1}(t_{i-1}),
    y_{n-1}(t_i) = y_{n-1}(t_{i-1}) + Δt · f(t_{i-1}, x(t_{i-1}), y_1(t_{i-1}), ..., y_{n-1}(t_{i-1}))

• Each of these equations describes the update of one neuron.
• The last neuron needs a special activation function.
Recurrent Networks: Differential Equations

(Network diagram: one neuron per quantity x, y_1, ..., y_{n-1}, connected in a chain with weights Δt and thresholds 0; the initial values x_0, x'_0, x''_0, ..., x_0^(n-1) are fed in, the last neuron evaluates f, and the output is x(t).)
Recurrent Networks: Diagonal Throw

(Illustration: diagonal throw of a body; starting point (x_0, y_0), initial velocity v_0 at angle φ, components v_0 cos φ and v_0 sin φ.)

Diagonal throw of a body.

Two differential equations (one for each coordinate):

    x'' = 0   and   y'' = -g,

where g = 9.81 m/s².

Initial conditions: x(t_0) = x_0, y(t_0) = y_0, x'(t_0) = v_0 cos φ and y'(t_0) = v_0 sin φ.
Recurrent Networks: Diagonal Throw

Introduce the intermediary quantities

    v_x = x'   and   v_y = y'

to reach the system of differential equations:

    x' = v_x,   v_x' = 0,
    y' = v_y,   v_y' = -g,

from which we get the system of recursive update formulae

    x(t_i) = x(t_{i-1}) + Δt v_x(t_{i-1}),   v_x(t_i) = v_x(t_{i-1}),
    y(t_i) = y(t_{i-1}) + Δt v_y(t_{i-1}),   v_y(t_i) = v_y(t_{i-1}) - Δt g.
Recurrent Networks: Diagonal Throw

Better description: Use vectors as inputs and outputs:

    r'' = -g e_y,

where e_y = (0, 1).

Initial conditions are r(t_0) = r_0 = (x_0, y_0) and r'(t_0) = v_0 = (v_0 cos φ, v_0 sin φ).

Introduce one vector-valued intermediary quantity v = r' to obtain

    r' = v,   v' = -g e_y

This leads to the recursive update rules:

    r(t_i) = r(t_{i-1}) + Δt v(t_{i-1}),
    v(t_i) = v(t_{i-1}) - Δt g e_y
Recurrent Networks: Diagonal Throw

The advantage of vector networks becomes obvious if friction is taken into account:

    a = -β v = -β r'

β is a constant that depends on the size and the shape of the body.

This leads to the differential equation

    r'' = -β r' - g e_y.

Introduce the intermediary quantity v = r' to obtain

    r' = v,   v' = -β v - g e_y,

from which we obtain the recursive update formulae

    r(t_i) = r(t_{i-1}) + Δt v(t_{i-1}),
    v(t_i) = v(t_{i-1}) - Δt β v(t_{i-1}) - Δt g e_y.
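A sketch of these vector update rules (the values of β, v_0 and φ are illustrative):

    import numpy as np

    g_ey = np.array([0.0, 9.81])                        # g * e_y
    beta, dt = 0.2, 0.05                                # friction constant, step width
    phi = np.radians(45.0)
    r = np.array([0.0, 0.0])                            # r(t_0) = (x_0, y_0)
    v = 10.0 * np.array([np.cos(phi), np.sin(phi)])     # v_0 (cos phi, sin phi)
    traj = [r.copy()]
    while r[1] >= 0.0:                                  # until the body hits the ground
        r = r + dt * v
        v = v - dt * beta * v - dt * g_ey
        traj.append(r.copy())
    # due to friction the trajectory deviates from a parabola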
Recurrent Networks: Diagonal Throw

Resulting recurrent neural network:

(Diagram: two vector-valued neurons; the v-neuron has self-feedback weight -Δtβ and threshold Δt g e_y, the r-neuron receives v with weight Δt; inputs r_0 and v_0, output r(t). Plot: the trajectory y over x.)

• There are no "strange" couplings as there would be in a non-vector network.
• Note the deviation from a parabola that is due to the friction.
Recurrent Networks: Planet Orbit

    r'' = -γ m r / |r|^3,

    =>   r' = v,   v' = -γ m r / |r|^3.

Recursive update rules:

    r(t_i) = r(t_{i-1}) + Δt v(t_{i-1}),
    v(t_i) = v(t_{i-1}) - Δt γ m r(t_{i-1}) / |r(t_{i-1})|^3,

(Diagram: two vector-valued neurons with weights Δt and -γmΔt; inputs r_0 and v_0, outputs x(t) and v(t). Plot: the resulting orbit.)
Recurrent Networks: Backpropagation through Time

Idea: Unfold the network between training patterns, i.e., create one neuron for each point in time.

Example: Newton's cooling law

    θ(t_0) → [1 - kΔt] → [1 - kΔt] → [1 - kΔt] → [1 - kΔt] → θ(t)

Unfolding into four steps. It is θ = -k θ_A Δt.

• Training is standard backpropagation on the unfolded network.
• All updates refer to the same weight.
• The updates are carried out after the first neuron is reached.