
1998 Special Issue

Neural network based control schemes for flexible-link manipulators: simulations and experiments
H.A. Talebi, K. Khorasani*, R.V. Patel
Department of Electrical and Computer Engineering, Concordia University, Montreal, Quebec, Canada, H3G 1M8
Received 27 October 1997; accepted 11 March 1998
Abstract
This paper presents simulation and experimental results on the performance of neural network-based controllers for tip position tracking of flexible-link manipulators. The controllers are designed by utilizing the modified output re-definition approach. The modified output re-definition approach requires only a priori knowledge about the linear model of the system and no a priori knowledge about the payload mass. Four different neural network schemes are proposed. The first two schemes are developed by using a modified version of the feedback-error-learning approach to learn the inverse dynamics of the flexible manipulator. Both schemes require only a linear model of the system for defining the new outputs and for designing conventional PD-type controllers. This assumption is relaxed in the third and fourth schemes. In the third scheme, the controller is designed based on tracking the hub position while controlling the elastic deflection at the tip. In the fourth scheme, which employs two neural networks, the first network (referred to as the output neural network) is responsible for specifying an appropriate output for ensuring minimum phase behavior of the system. The second neural network is responsible for implementing an inverse dynamics controller. The performance of the four proposed neural network controllers is illustrated by simulation results for a two-link planar flexible manipulator and by experimental results for a single flexible-link test-bed. The networks are all trained and employed as online controllers and no off-line training is required. © 1998 Elsevier Science Ltd. All rights reserved.
Keywords: Flexible-link manipulators; Inverse dynamics; Neural network based control; Nonlinear control; Non-minimum phase systems; Output redefinition; Tracking control
1. Introduction
Interest in controlling flexible-link manipulators has been growing over the past several years. The potential advantages that arise from the use of light-weight flexible-link manipulators are faster operation, lower energy consumption, and less costly structures. However, end-point control for a flexible-link manipulator is considerably more difficult than for a rigid-link manipulator. Coupling effects, nonlinearities, parameter variations and unmodeled dynamics all contribute to this difficulty. Control strategies that ignore these uncertainties and nonlinearities generally fail to provide satisfactory closed-loop performance.
The most common approach to compensate for the nonlinear dynamics of a rigid manipulator is the so-called inverse dynamics strategy. However, the extension of this approach to flexible-link manipulators is impeded by the non-minimum phase characteristics of the arm when the tip position is taken as the output of the system. A noncausal torque solution to the above problem was first proposed by Bayo & Moulin (1989) and Kwon & Book (1990). Siciliano & Book (1988) and Schoenwald & Ozguner (1990) applied singular perturbation theory for modeling and control of flexible-link robots. The integral manifold approach was also used to control flexible-link manipulators (Hashtrudi-Zaad & Khorasani, 1996; Moallem et al., 1997b). Geniele et al. (1992) proposed a novel approach based on transmission zero assignment. Wang & Vidyasagar (1989) re-defined the output of the nonlinear system as the reflected tip position to ensure stable zero dynamics for the resulting input-output map. A re-defined output on the link between the joint and the tip was also suggested (De Luca & Siciliano, 1989; Madhavan & Singh, 1991). The new output is defined so that the zero dynamics related to this output are stable. However, all of these methods assume exact knowledge of the dynamics and the nonlinearities of the flexible-link manipulator. Since, in general, it is very difficult to model a flexible-link manipulator accurately, the performance of these control strategies may be unsatisfactory for real applications.
* Corresponding author. E-mail: kash@ece.concordia.ca
0893-6080/98/$19.00 © 1998 Elsevier Science Ltd. All rights reserved.
PII: S0893-6080(98)00038-0
Neural Networks 11 (1998) 1357-1377, Pergamon
An approach that looks promising for the control of flexible-link manipulators is intelligent control, such as that based on neural networks. Several methods have been proposed in the literature for neural network based controllers (Miyamoto et al., 1988; Gomi & Kawato, 1993) that can be used to adaptively control rigid manipulators. Most of these results, however, either rely on minimum phase characteristics of the input-output map of the system or require full state measurements, conditions that are generally not satisfied for flexible-link manipulators.
Cheng & Wen (1993) developed a neural network controller for a flexible-link manipulator. Hub position and velocity measurements were used to stabilize the system and a neural observer/controller was proposed to drive the flexible arm to track a desired trajectory. This work, however, is restricted to linear models of the flexible-link robot. Donne & Ozguner (1994) proposed a neural controller assuming partial knowledge of the dynamics of the flexible link. The unknown part of the dynamics is identified by a supervised learning algorithm. The control is constructed in two stages, an optimal controller and an unsupervised neural network controller using model-based predictive control. The scheme is based on an identification stage that also requires feedback from the states of the system. Newton & Xu (1993) considered the joint tracking control problem for a space manipulator using the feedback-error-learning technique. However, tip position tracking cannot be guaranteed, especially for high-speed desired trajectories.
Finally, Talebi et al. (1997) used a linear model of the system to define a new output and design a PD controller. The feedback-error-learning method was used to develop a neural network controller to learn the inverse dynamics of the flexible-link system.
The outline of this paper is as follows. In Section 2, the nonlinear model of the manipulator is described. In Section 3, the output re-definition approach is discussed. In Section 4, the four proposed neural network based schemes are described. The first two structures use the feedback-error-learning scheme to learn the inverse dynamics corresponding to the new output defined in Section 3. The third structure is designed to drive the hub position to track a desired trajectory and to control the elastic deflections at the tip. The fourth neural network structure consists of two neural networks. One network is trained as an online feedback controller, and the other is trained to determine an appropriate output for feedback (in the sense of ensuring minimum phase behavior of the system). The third and fourth schemes do not require any a priori information about the linear dynamics of the system. Simulation results for a two-link planar manipulator are also included. Finally, in Section 6 experimental results are presented for a highly flexible link manipulator that utilizes the four proposed neural network based controllers.
2. Manipulator model
Modeling of a flexible-link manipulator is difficult due to the distributed parameter nature of the system. This means that in general a large number of states is needed to obtain an accurate model of the system. The assumed modes and finite element methods are two common approaches used to obtain an approximate model of a flexible-link manipulator. The schematic of a single flexible-link manipulator is shown in Fig. 1. The manipulator is fixed at one end (hub) and is driven by a torque $\tau$. The other end is free to flex in a horizontal plane, and has a mass $M_l$ as a payload.
Fig. 1. Schematic of the one link flexible arm.
The model in this paper is derived by using the Recursive Lagrangian approach (Book, 1984). The dynamic equations of motion for a flexible-link manipulator may be expressed as follows:
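$$M(\theta, \delta)\begin{bmatrix} \ddot{\theta} \\ \ddot{\delta} \end{bmatrix} + \begin{bmatrix} f_1(\theta, \dot{\theta}) + h_1(\theta, \dot{\theta}, \delta, \dot{\delta}) + F_1\dot{\theta} + f_c \\ f_2(\theta, \dot{\theta}) + h_2(\theta, \dot{\theta}, \delta, \dot{\delta}) + K\delta + F_2\dot{\delta} \end{bmatrix} = \begin{bmatrix} u \\ 0 \end{bmatrix} \tag{1}$$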
where $\theta$ is the $n \times 1$ vector of joint variables, $\delta$ is the $m \times 1$ vector of deflection variables and $f_1$, $f_2$, $h_1$ and $h_2$ are the terms due to gravity ($f_1$ only), Coriolis, and centripetal forces; $M$ is the positive-definite mass matrix, $K$ is the positive-definite diagonal stiffness matrix, $F_1$ and $F_2$ are the viscous damping at the hub and the positive-definite diagonal matrix of structural damping, respectively, $u$ is the input torque, and $f_c$ is the Coulomb friction at the hub. If a finite number of flexible modes $m_i$ is considered for the $i$th link, then we have $\delta = [\delta_1 \cdots \delta_n]^T$, $\delta_i = [\delta_{i1} \cdots \delta_{im_i}]$, $i = 1, \ldots, n$, and $m = \sum_{i=1}^{n} m_i$.
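As pointed out by Wang & Vidyasagar (1989), the tip position can be obtained from $y_{ti} = \theta_i + W_i(l_i, t)/l_i$, where $W_i(l_i, t)$ is the elastic deflection at the tip and $l_i$ is the length of the $i$th link. When $m_i$ modes are considered, $W_i(l_i, t)$ may be expressed as $W_i(l_i, t) = \sum_{j=1}^{m_i} \phi_{ij}(l_i)\,\delta_{ij}(t)$, where $\phi_{ij}$ is the $j$th eigenfunction of the $i$th link and $\delta_{ij}$ is the $j$th mode of the $i$th link. Thus, the tip position vector can be expressed as
$$y_t = \theta + w_{n \times m}\,\delta$$
where
$$w_{n \times m} = \begin{bmatrix} v_1^T & 0 & \cdots & 0 \\ 0 & v_2^T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & v_n^T \end{bmatrix}, \qquad \theta = [\theta_1 \cdots \theta_n]^T, \tag{2}$$
$$v_i^T = \frac{1}{l_i}\,[\phi_{i1e} \cdots \phi_{im_ie}], \quad i = 1, \ldots, n; \qquad \phi_{ije} = \phi_{ij}(l_i), \quad i = 1, \ldots, n; \; j = 1, \ldots, m_i; \qquad y_t^T = [y_{t1} \cdots y_{tn}].$$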
3. Re-definition of the output


It is well known that the zero dynamics of a flexible-link manipulator associated with the tip position are unstable. In other words, the system is non-minimum phase and is very difficult to control using tip position output for feedback. Wang & Vidyasagar (1989) proposed the reflected tip position (RTP), $y_{ri} = \theta_i - W_i(l_i, t)/l_i$, and showed that the zero dynamics related to this output are stable (i.e. the system is minimum phase) and, consequently, that the system can be stabilized by a PD-type control law. The main advantage of using the reflected tip position over the joint position for control is that with the latter the vibrations of the system cannot be controlled accurately and, consequently, the only damping experienced by the system is its natural damping. Therefore, the vibrations of the elastic modes take a long time to die out, resulting in considerable oscillations at the tip. Note that as the speed of the reference tip trajectory is increased, the unmodeled high frequency flexible modes become further excited. Given that $y_{ti} = \theta_i + W_i(l_i, t)/l_i$ and $y_{ri} = \theta_i - W_i(l_i, t)/l_i$, the difference between the reflected tip position (RTP) and the actual tip position becomes significant for high-speed reference trajectories as well as for highly flexible-link manipulators. Hence, acceptable actual tip tracking performance cannot be ensured by only controlling the RTP. Consequently, for a very flexible manipulator, a different output should be defined instead of the RTP for designing the control law.
Alternatively, Madhavan & Singh (1991) proposed to choose the sum of the joint angle and a scaling of the tip elastic deformation as the output for control, namely, $y_{ai} = \theta_i + \alpha_i W_i(l_i, t)/l_i$, where $-1 < \alpha_i < 1$. For the choice of $\alpha_i = 1$, the output becomes the tip angular position; for $\alpha_i = 0$ the output becomes the joint angle; and for $\alpha_i = -1$ the output becomes the RTP.
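The family of outputs described above can be sketched in a few lines of Python. This is only an illustration for a single link: the function name and the numerical values are hypothetical, and the closed interval accepted for the scaling parameter is an assumption made so that the three special cases can be evaluated.

```python
def redefined_output(theta, tip_deflection, link_length, alpha):
    """Return theta + alpha * W(l, t) / l for one link (all scalars)."""
    # Assumption: the end points -1 and 1 are accepted so that the
    # RTP and tip-angle special cases can be evaluated.
    if not -1.0 <= alpha <= 1.0:
        raise ValueError("alpha is expected to lie in [-1, 1]")
    return theta + alpha * tip_deflection / link_length


# Illustrative values only: joint angle (rad), tip deflection (m), length (m).
theta, W, l = 0.50, 0.06, 1.2
tip_angle = redefined_output(theta, W, l, 1.0)     # tip angular position
joint_angle = redefined_output(theta, W, l, 0.0)   # joint angle
rtp = redefined_output(theta, W, l, -1.0)          # reflected tip position
```

The joint angle sits midway between the tip angle and the RTP, which is why intermediate values of the scaling parameter move the controlled point along the link.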
It was shown (Madhavan & Singh, 1991) that a critical value $\alpha_i^*$, $0 < \alpha_i^* < 1$, exists such that the zero dynamics related to the new output $y_a$ are unstable for all $\alpha_i > \alpha_i^*$ and are stable for $-1 < \alpha_i < \alpha_i^*$. Hence, an inverse dynamics controller can be designed to control the system output for $-1 < \alpha_i < \alpha_i^*$. It was also observed that for $\alpha_i$ negative the response is more oscillatory than for $\alpha_i$ positive. Our objective in this section is to show that by using the new output $y_a$, the dynamics of the flexible-link manipulator may be expressed in such a way that the feedback-error-learning method is applicable for controlling the system.
Consider the dynamics of the manipulator given by Eq. (1) and define
$$H(\theta, \delta) = M^{-1}(\theta, \delta) = \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix}.$$
Then Eq. (1) can be re-written as
$$\begin{bmatrix} \ddot{\theta} \\ \ddot{\delta} \end{bmatrix} = H(\theta, \delta)\begin{bmatrix} u - f_1(\theta, \dot{\theta}) - h_1(\theta, \dot{\theta}, \delta, \dot{\delta}) - F_1\dot{\theta} - f_c \\ -f_2(\theta, \dot{\theta}) - h_2(\theta, \dot{\theta}, \delta, \dot{\delta}) - K\delta - F_2\dot{\delta} \end{bmatrix} \tag{3}$$
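Defining $W_{n \times m} = U w_{n \times m}$, where $U = \mathrm{diag}\{\alpha_1, \ldots, \alpha_n\}$ and $w_{n \times m}$ is given by Eq. (2), the new output can be expressed as $y_a = \theta + W_{n \times m}\delta$. Now, consider system (1) with the output defined above. To find the external dynamics related to this new output, successive time differentiation of $y_a$ has to be performed until the input appears explicitly, namely we have
$$\ddot{y}_a = \ddot{\theta} + W_{n \times m}\ddot{\delta} \tag{4}$$
Using Eq. (3) and Eq. (4), it follows that
$$\ddot{y}_a = A(\theta, \dot{\theta}, \delta, \dot{\delta}) + B(\theta, \delta)\,u \tag{5}$$
where
$$B(\theta, \delta) = H_{11} + W_{n \times m}H_{21}$$
and
$$A(\theta, \dot{\theta}, \delta, \dot{\delta}) = -(H_{11} + W_{n \times m}H_{21})(f_1 + h_1 + F_1\dot{\theta} + f_c) - (H_{12} + W_{n \times m}H_{22})(f_2 + h_2 + K\delta + F_2\dot{\delta}).$$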
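The external dynamics related to the new output can be written in general as
$$u = f(\theta, \dot{\theta}, \delta, \dot{\delta}, \ddot{y}_a) \tag{6}$$
Zero dynamics, by definition (Isidori & Moog, 1987), are the dynamics which are left in the system once the input is chosen in such a way that it constrains the output to remain identically at zero. This input can be obtained from Eq. (5) as
$$u = B^{-1}(\theta, \delta)\,[-A(\theta, \dot{\theta}, \delta, \dot{\delta})].$$
The zero dynamics of the system may now be expressed as
$$\ddot{\delta} = -P\,[f_2(w_1, w_2) + h_2(w_1, w_2, w_3, w_4) + K\delta + F_2\dot{\delta}] \tag{7}$$
where
$$w_1 = -W_{n \times m}\delta, \quad w_2 = -W_{n \times m}\dot{\delta}, \quad w_3 = \delta, \quad w_4 = \dot{\delta}$$
and $P$ is given by
$$P = \left[H_{22} - H_{21}(H_{11} + W_{n \times m}H_{21})^{-1}(H_{12} + W_{n \times m}H_{22})\right]\Big|_{(w_1, w_3)} \tag{8}$$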
At this stage, in principle, by linearizing the zero dynamics around the equilibrium point, one can find the value of $\alpha^*$. However, since the mass matrix $M$ (and hence $H$ and $P$) depends on the payload mass $M_l$, $\alpha^*$ also depends on the payload mass. Consequently, to obtain the exact value of $\alpha^*$, the value of $M_l$ should be known a priori. However, the control schemes proposed in this paper assume no a priori knowledge about the payload mass $M_l$. In Section 3.1, it will be shown that the dependence of $\alpha^*$ on the payload mass $M_l$ is such that $\alpha^*$ takes its lowest (conservative) value when $M_l$ is zero. In other words, the value of $\alpha^*$ obtained for zero payload mass guarantees stability of the zero dynamics as $M_l$ increases. This choice of $\alpha^*$ is conservative, since as $M_l$ increases, larger values of $\alpha^*$ can be used.
Now, by neglecting the payload (for the purpose of specifying the output re-definition only), and by linearizing Eq. (7), Eqs. (7) and (8) above become
$$\ddot{\delta} = -P_0\,[K\delta + F_2\dot{\delta}], \qquad P_0 = \left[H_{22} - H_{21}(H_{11} + W_{n \times m}H_{21})^{-1}(H_{12} + W_{n \times m}H_{22})\right]\Big|_{(0, 0)}, \tag{9}$$
where $P_0$ is the linearized $P$ evaluated at $M_l = 0$. Suppose that the vector $\alpha$ and the matrices $H$, $K$ and $F_2$ are such that
$$A(\alpha) = \begin{bmatrix} 0 & I \\ -P_0K & -P_0F_2 \end{bmatrix}$$
is Hurwitz. Then the origin of Eq. (9), and hence of Eq. (7), is locally exponentially stable and the original nonlinear system Eq. (3) is locally minimum phase (Slotine & Li, 1991). Consequently, provided that the linearized mass matrix $M$ (with zero payload), the stiffness matrix $K$, and the structural damping matrix $F_2$ are known, a proper output may be specified by obtaining $\alpha$ such that $A(\alpha)$ is guaranteed to be Hurwitz.
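As a rough illustration of the Hurwitz test above, the following Python sketch treats the simplest case of one link and one flexible mode, so that every block of the inverse mass matrix is a scalar and the linearized zero-dynamics matrix is 2 x 2. The function names and all numbers are hypothetical (they are not the parameters of the paper), and the scalar `w` stands for the product of the output scaling parameter and the normalized tip eigenfunction value.

```python
def p0_single_mode(m11, m12, m22, w):
    """Scalar version of P0 = H22 - H21 (H11 + w H21)^-1 (H12 + w H22)."""
    det = m11 * m22 - m12 * m12          # determinant of the 2x2 mass matrix
    h11, h12 = m22 / det, -m12 / det     # entries of H = M^-1
    h21, h22 = -m12 / det, m11 / det
    return h22 - h21 * (h12 + w * h22) / (h11 + w * h21)


def is_hurwitz_2x2(p0, k, f2):
    """[[0, 1], [-p0*k, -p0*f2]] is Hurwitz iff its trace < 0 and det > 0."""
    trace = -p0 * f2
    det = p0 * k
    return trace < 0.0 and det > 0.0


# Illustrative numbers only.
m11, m12, m22, k, f2 = 2.0, 1.0, 1.0, 9.0, 0.4
phi_1e, length = 2.0, 1.2
for alpha in (-1.0, 0.0, 0.5, 0.9):
    w = alpha * phi_1e / length
    p0 = p0_single_mode(m11, m12, m22, w)
    print(alpha, is_hurwitz_2x2(p0, k, f2))
```

In this scalar case the expression for `p0` collapses to `1 / (m22 - w * m12)`, so the sign change of that denominator marks the critical value of the scaling parameter for the chosen numbers.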
3.1. Variation of $\alpha^*$ with the payload
In the previous section, an output re-definition approach for flexible-link manipulators was presented based on zero payload mass. In the following, it is shown that using the value of $\alpha^*$ obtained for $M_l = 0$ ensures stability of the zero dynamics as the payload mass is increased.
Towards this end, consider the dynamics of a single flexible-link manipulator when one flexible mode is considered. The dynamic equations of the manipulator are given by
$$\begin{aligned} m_{11}\ddot{\theta} + m_{12}\ddot{\delta} + F_1\dot{\theta} &= u \\ m_{12}\ddot{\theta} + m_{22}\ddot{\delta} + F_2\dot{\delta} + K\delta &= 0 \end{aligned} \tag{10}$$
It is assumed that there is an $\alpha^* = \alpha_0$ which ensures stability of the zero dynamics associated with the new output for $M_l = 0$. When a nonzero payload is considered in the model, $m_{ij}$, $i, j \in \{1, 2\}$, can be obtained (De Luca & Siciliano, 1989) as
$$m_{11} = J_0 + I_h + M_l l^2, \qquad m_{12} = \rho A\int_0^l \phi_1(x)\,x\,dx + M_l\,l\,\phi_{1e}, \qquad m_{22} = \rho A + M_l\phi_{1e}^2 \tag{11}$$
Now, defining the new output as
$$y_a = \theta + \alpha_0\frac{\phi_{1e}}{l}\,\delta,$$
where $\alpha_0$ is the value of $\alpha^*$ that has already been obtained for $M_l = 0$, the zero dynamics of the system can be expressed as
$$\left(m_{22} - \alpha_0\frac{\phi_{1e}}{l}\,m_{12}\right)\ddot{\delta} + F_2\dot{\delta} + K\delta = 0 \tag{12}$$
The stability of the zero dynamics Eq. (12) can be investigated by specifying the sign of the coefficient $C_{za} = m_{22} - \alpha_0(\phi_{1e}/l)\,m_{12}$. Using (11), $C_{za}$ can be expressed as
$$\begin{aligned} C_{za} &= \rho A + M_l\phi_{1e}^2 - \alpha_0\frac{\phi_{1e}}{l}\left(\rho A\int_0^l x\,\phi_1(x)\,dx + M_l\,l\,\phi_{1e}\right) \\ &= \rho A\left(1 - \alpha_0\frac{\phi_{1e}}{l}\int_0^l x\,\phi_1(x)\,dx\right) + M_l\phi_{1e}^2 - \alpha_0\frac{\phi_{1e}}{l}\,M_l\,l\,\phi_{1e}. \end{aligned} \tag{13}$$
Using Eq. (13), $C_{za}$ can now be written as $C_{za} = C_{za0} + C_{zam}$, where
$$C_{za0} = \rho A\left(1 - \alpha_0\frac{\phi_{1e}}{l}\int_0^l x\,\phi_1(x)\,dx\right)$$
represents $C_{za}$ when $M_l = 0$, and
$$C_{zam} = M_l\phi_{1e}^2 - \alpha_0\frac{\phi_{1e}}{l}\,M_l\,l\,\phi_{1e} \tag{14}$$
represents the terms that depend on the payload.
Since $\alpha_0$ is the value of $\alpha^*$ obtained for $M_l = 0$ that results in stability of the zero dynamics, we have $C_{za0} > 0$. Now, consider $C_{zam}$ in Eq. (14), which can be expressed as
$$C_{zam} = M_l\phi_{1e}^2 - \alpha_0 M_l\phi_{1e}^2 = M_l\phi_{1e}^2(1 - \alpha_0) \tag{15}$$
It can be concluded from Eq. (15) that $C_{zam} > 0$, since $|\alpha_0| < 1$ and $M_l$ is always positive. Consequently, $C_{za} > 0$ and stability of the zero dynamics for the output $y_a$ is ensured for all $M_l > 0$.
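The decomposition of the zero-dynamics coefficient into a payload-free part and a payload-dependent part can be checked numerically. The sketch below is a hedged illustration: the function names are hypothetical, `mode_moment` stands for the integral of x times the first eigenfunction over the link, and the numbers are made up rather than taken from the paper.

```python
def coefficient_parts(rho_a, mode_moment, phi_1e, length, alpha_0, payload):
    """Return the payload-free and payload-dependent parts of C_za."""
    c_za0 = rho_a * (1.0 - alpha_0 * phi_1e / length * mode_moment)
    c_zam = payload * phi_1e ** 2 * (1.0 - alpha_0)
    return c_za0, c_zam


def coefficient_direct(rho_a, mode_moment, phi_1e, length, alpha_0, payload):
    """C_za = m22 - alpha_0 * (phi_1e / l) * m12, from the mass-matrix entries."""
    m12 = rho_a * mode_moment + payload * length * phi_1e
    m22 = rho_a + payload * phi_1e ** 2
    return m22 - alpha_0 * phi_1e / length * m12


# Illustrative numbers only.
args = dict(rho_a=1.2, mode_moment=0.5, phi_1e=2.0, length=1.2, alpha_0=0.6)
c0, cm = coefficient_parts(payload=0.03, **args)
direct = coefficient_direct(payload=0.03, **args)
```

Because the payload-dependent part grows linearly with the payload and is non-negative whenever the scaling parameter is below one, adding payload can only increase the coefficient in this single-mode setting.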
For a multi-link manipulator with a higher number of modes, the above analysis becomes more complicated. Instead, one can use numerical techniques to study the above problem. For instance, we analyzed a two-link planar manipulator whose parameters are given in Section 5. Three flexible modes were considered in this study. First, the payload mass $M_l$ was included in the model. Using the Routh-Hurwitz criterion, the conditions under which the system is non-minimum phase were obtained by using MAPLE (Redfern, 1993). It was found that the system is non-minimum phase only for a negative payload mass $M_l$. Consequently, the value of $\alpha^*$ obtained for $M_l = 0$ ensures the stability of the zero dynamics when one includes $M_l > 0$ in the system. Hence, an output re-definition scheme can be used without any a priori knowledge about the payload mass $M_l$. This enables us to design controllers that remain robust to payload variations. This is the subject of the next section.
3.2. Uncertainty in structural damping
There has been some effort in the literature to model the structural damping matrix $F_2$ introduced in Eq. (1). For instance, Moallem et al. (1997a) used the Rayleigh damping method (Thomson, 1988). De Luca & Siciliano (1989) used $f_i = a k_i$, $a > 0$, where $f_i$ and $k_i$ are the diagonal elements of the matrices $F_2$ and $K$, respectively.
However, an exact model for $F_2$ is rarely known due to the uncertainty that is always present in the system. Therefore, some effort is required in selecting a value of $\alpha$ for re-definition of the output. By considering $F_2$ as an uncertain matrix, the variations of the roots of the characteristic polynomial of the matrix $A(\alpha)$ can be investigated for changes in $F_2$. Several methods, mostly inspired by Kharitonov's result (Kharitonov, 1978), have been developed in the literature for investigating the behavior of the roots of a polynomial with respect to parametric variations. One of the results most relevant to our work deals with coefficients which can either be independently perturbed or be perturbed multilinearly (Barmish, 1990; Bartlett et al., 1987). However, these results cannot be applied in a straightforward manner to our problem, since the coefficients of the characteristic polynomial of $A(\alpha)$ do not vary independently. In fact, the parameter $\alpha$ appears nonlinearly in the polynomial coefficients.
To address the above problems, let us analyze the single flexible-link arm whose parameters are given in Section 6 (Table 1). Two flexible modes are considered for this study. First, using the parameters given by Geniele (1994), namely $f_1 = 0.4$ and $f_2 = 4$, the value of $\alpha^*$ is found to be 0.75 so that the matrix $A(\alpha)$ is Hurwitz. However, using numerical simulations we have found that this choice of $\alpha^*$ results in an $A(\alpha)$ that is not robust to variations in $f_1$ and $f_2$ (i.e. for $f_1 = 0.4$ and $f_2 = 3.06$). Therefore, by considering $f_1$ and $f_2$ as unknown parameters, the characteristic polynomial of $A(\alpha)$ is computed using MAPLE (Redfern, 1993). The Routh-Hurwitz table is then constructed to determine the conditions under which the characteristic polynomial remains Hurwitz. We have found that a value of $\alpha^* = 0.6$ yields a matrix $A(\alpha)$ that is robust to variations of the parameters $f_1$ and $f_2$ ranging from $1 \times 10^{-8}$ to 10.
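With two flexible modes the linearized zero-dynamics matrix is 4 x 4, so its characteristic polynomial is a quartic and the Routh-Hurwitz conditions can be written in closed form. The sketch below is not the symbolic MAPLE computation used by the authors; it only shows the kind of test involved. It assumes, purely for illustration, two decoupled modes with unit gain, so that the quartic factors into two second-order terms, and it uses stiffness values of 9 and 361 as hypothetical stand-ins.

```python
def quartic_is_hurwitz(a3, a2, a1, a0):
    """Routh-Hurwitz test for s^4 + a3 s^3 + a2 s^2 + a1 s + a0."""
    if min(a3, a2, a1, a0) <= 0.0:
        return False
    return a3 * a2 > a1 and a3 * a2 * a1 > a1 ** 2 + a3 ** 2 * a0


def two_mode_coefficients(b1, c1, b2, c2):
    """Expand (s^2 + b1 s + c1)(s^2 + b2 s + c2) into (a3, a2, a1, a0)."""
    return (b1 + b2, c1 + c2 + b1 * b2, b1 * c2 + b2 * c1, c1 * c2)


# Assumption: decoupled modes; damping swept on a log grid from 1e-8 to 10.
dampings = [10.0 ** e for e in range(-8, 2)]
robust = all(
    quartic_is_hurwitz(*two_mode_coefficients(f1, 9.0, f2, 361.0))
    for f1 in dampings
    for f2 in dampings
)
```

In the coupled case treated in the text, the coefficients depend nonlinearly on the scaling parameter, so the same test would be applied to coefficients computed from the actual matrix rather than from this factored form.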
4. Control structure
In this section, four neural network based control structures are proposed for tip position tracking control of a single-link flexible manipulator using the re-defined output approach discussed in the previous section. In the first two schemes, hidden and output layer weights are adjusted using the feedback-error-learning strategy (described in the next section) using a priori knowledge about the linear model of the manipulator, whereas the third and fourth neural network structures are trained using the steepest descent and backpropagation algorithms since no a priori knowledge about the manipulator is assumed.
4.1. Feedback-error-learning strategy
Feedback-error-learning was first introduced by Miyamoto et al. (1988) and was later modified by Gomi & Kawato (1993). Two adaptive control schemes were proposed using feedback-error-learning to develop neural network controllers. In both schemes, the signal used for training the neural network is taken as the output of the feedback controller. The advantage of this learning scheme is that the target signal or the desired output signal for the neural network is not required. Also, back-propagation of the error (training) signal through the controlled system or through the model of the controlled system is not required. To emphasize the importance of using the feedback torque as the error signal, the learning scheme is referred to as feedback-error-learning.
The above idea has been applied to control rigid-link robot manipulators (Miyamoto et al., 1988; Gomi & Kawato, 1993) where the system has no zero dynamics and all the states of the system are assumed to be available for measurement. For a flexible-link manipulator, however, the zero dynamics related to the tip position are unstable and full state feedback is in general not readily available. As discussed in Section 3, a new output may be defined so that the corresponding zero dynamics are stable. This output may be defined as the joint variable, but that will not in general yield an acceptable tip response for a relatively flexible arm. In this section, it will be shown that by using the new output introduced in the previous section, the concept of feedback-error-learning may be invoked to control the tip position of a flexible-link manipulator.
Table 1
Link parameters for the experimental manipulator
$l$ = 1.2 m
$\gamma$ = 1.2 kg/m
$I_h$ = 0.3 kg m$^2$
$b$ = 0.59 N m/rad s$^{-1}$
$C_{coul}$ = 4.74 N m for $\dot{\theta} > 0$ and 4.77 N m for $\dot{\theta} < 0$
$EI$ = 1.94 N m$^2$
$\omega_1$ = 3 rad/s
$\omega_2$ = 19 rad/s
$c_1$ = 0.4
$c_2$ = 4.0
$M_l$ = 30 g
4.1.1. The first neural network based strategy: inverse dynamics model learning (IDML)
The structure of the first scheme is referred to as inverse dynamics model learning (IDML) and is shown in Fig. 2. The manipulator dynamics are assumed to be governed, as in Eq. (6), by
$$f(\theta, \dot{\theta}, \delta, \dot{\delta}, \ddot{y}_a) = u \tag{16}$$
In this scheme, a conventional feedback controller (CFC) and a neural network feedback controller are employed as depicted in Fig. 2. The CFC is used to guarantee asymptotic stability of the system during the learning process and also serves as a reference model for the response of the controlled system. For example, a linear controller is expressed as
$$u_c = K_2(\ddot{y}_r - \ddot{y}_a) + K_1(\dot{y}_r - \dot{y}_a) + K_0(y_r - y_a) \tag{17}$$
where $y_r$, $\dot{y}_r$ and $\ddot{y}_r$ denote the desired trajectories (i.e. position, velocity and acceleration, respectively). The goal of the neural network feedback controller is to ultimately represent the inverse dynamics model of the controlled system. The output of the CFC, i.e. $u_c$, is fed to the neural network as an error (training) signal. The neural network also receives $\theta$, $\dot{\theta}$, $\delta$, $\dot{\delta}$, and $\ddot{y}_a$ as ordinary inputs. The output of the neural network is represented by
$$u_n = \Phi(\theta, \dot{\theta}, \delta, \dot{\delta}, \ddot{y}_a, w)$$
where $w$ is the weight matrix of the network. The learning rule specified for the feedback-error-learning scheme is given by $\dot{w} = \eta\,(\partial\Phi/\partial w)\,u_c$, where $\eta$ is the learning rate.
Fig. 2. Structure of inverse dynamic model learning (IDML).
Fig. 3. Structure of nonlinear regulator learning (NRL).
Note that feedback-error-learning specifies the error to the network as $u_c$, and $\partial\Phi/\partial w$ is computed using the standard back-propagation algorithm. Once the neural network is trained, it should acquire an arbitrarily close model of the inverse dynamics of the controlled system, so that the response of the controlled system is governed by
$$K_2\ddot{e} + K_1\dot{e} + K_0 e = 0 \tag{18}$$
where $e := y_r - y_a$. That is to say, the output tracking error $e$ converges to zero in accordance with the above reference model.
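The feedback-error-learning rule can be made concrete with a deliberately small sketch. It assumes a scalar output and a network that is linear in its weights, so that the gradient of the network output with respect to the weights is simply the input vector; the paper's networks are multilayer and use back-propagation for that gradient. All function names and numbers below are hypothetical.

```python
def cfc_torque(k2, k1, k0, acc_err, vel_err, pos_err):
    """Conventional feedback controller: weighted sum of the tracking errors."""
    return k2 * acc_err + k1 * vel_err + k0 * pos_err


def network_output(weights, features):
    """Assumed linear-in-weights network: u_n = w . x."""
    return sum(w * x for w, x in zip(weights, features))


def fel_step(weights, features, u_c, eta, dt):
    """One Euler step of w_dot = eta * (d u_n / d w) * u_c, with d u_n / d w = x."""
    return [w + dt * eta * x * u_c for w, x in zip(weights, features)]


# Illustrative single update: the feedback torque acts as the training error.
u_c = cfc_torque(1.0, 2.0, 3.0, 0.1, 0.2, 0.3)
weights = fel_step([0.0, 0.0], [1.0, 2.0], u_c=0.5, eta=0.1, dt=1.0)
u_total = 0.5 + network_output(weights, [1.0, 2.0])   # u = u_c + u_n
```

As learning proceeds the network term is expected to take over from the feedback term, which is why no separate target torque has to be supplied.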
4.1.2. The second neural network based strategy: nonlinear
regulator learning (NRL)
The configuration of the second learning scheme, referred to as nonlinear regulator learning (NRL), is shown in Fig. 3. Consider the dynamics of the flexible-link system given by Eq. (5). Multiplying Eq. (5) by $B^{-1}(\theta, \delta)$, we get
$$R(\theta, \delta)\,\ddot{y}_a + N(\theta, \dot{\theta}, \delta, \dot{\delta}) = u \tag{19}$$
where $R(\theta, \delta) = B^{-1}(\theta, \delta)$ and $N(\theta, \dot{\theta}, \delta, \dot{\delta}) = -B^{-1}(\theta, \delta)\,A(\theta, \dot{\theta}, \delta, \dot{\delta})$. As with the IDML method, the CFC in Fig. 3 serves the same two purposes. In comparison with the IDML case, the actual acceleration is not used as an input to the neural network in the NRL scheme. Instead, the reference trajectories (i.e. position, velocity and acceleration) are fed to the neural network to generate the feedforward term, so as to obtain a better transient response at the early stage of learning when the tracking error, and hence $u_c$, is large. The output of the neural network may be expressed as
$$u_n = Q(\ddot{y}_r, \dot{y}_r, y_r, \dot{y}_r - \dot{y}_a, y_r - y_a, \theta, \dot{\theta}, \delta, \dot{\delta}, w) = \Phi(\ddot{y}_r, \dot{y}_r, y_r, \theta, \dot{\theta}, \delta, \dot{\delta}, w)$$
for some nonlinear functions $Q$ and $\Phi$. The weights of the network are updated according to $\dot{w} = \eta\,(\partial\Phi/\partial w)\,u_c$, where feedback-error-learning specifies the error to the network as $u_c$, and $\partial\Phi/\partial w$ is computed using the standard backpropagation algorithm. The closed-loop system is obtained by applying $u = u_c + u_n$ to Eq. (19), to yield
$$[R(\theta, \delta) + K_2](\ddot{y}_r - \ddot{y}_a) + K_1(\dot{y}_r - \dot{y}_a) + K_0(y_r - y_a) + \Phi - N(\theta, \dot{\theta}, \delta, \dot{\delta}) - R(\theta, \delta)\,\ddot{y}_r = 0.$$
If $\Phi$ can be made equivalent to $\Phi_d$ defined as
$$\Phi_d = N(\theta, \dot{\theta}, \delta, \dot{\delta}) + R(\theta, \delta)\,\ddot{y}_r + R(\theta, \delta)K_2^{-1}\,[K_1(\dot{y}_r - \dot{y}_a) + K_0(y_r - y_a)]$$
then the closed-loop dynamics may be expressed as
$$[R(\theta, \delta) + K_2](\ddot{y}_r - \ddot{y}_a) + K_1(\dot{y}_r - \dot{y}_a) + K_0(y_r - y_a) + R(\theta, \delta)K_2^{-1}\,[K_1(\dot{y}_r - \dot{y}_a) + K_0(y_r - y_a)] = 0.$$
This gives
$$[I + R(\theta, \delta)K_2^{-1}]\,[K_2(\ddot{y}_r - \ddot{y}_a) + K_1(\dot{y}_r - \dot{y}_a) + K_0(y_r - y_a)] = 0.$$
Consequently, provided that $I + R(\theta, \delta)K_2^{-1}$ is nonsingular within a region in the $(\theta, \delta)$ space, the tracking error dynamics become
$$K_2\ddot{e} + K_1\dot{e} + K_0 e = 0 \tag{20}$$
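The factorization that leads to the error dynamics above can be verified numerically in the scalar case. The sketch assumes that the desired network output is the sum of N, R times the reference acceleration, and R divided by K2 times the PD part of the error; this sign choice is an assumption, made because it is the one that allows the closed-loop expression to factor. Function names and numbers are hypothetical.

```python
def closed_loop_lhs(r, n, k2, k1, k0, e, e_dot, e_ddot, phi, y_r_ddot):
    """Left-hand side of the NRL closed-loop equation (scalar case)."""
    return (r + k2) * e_ddot + k1 * e_dot + k0 * e + phi - n - r * y_r_ddot


def phi_desired(r, n, k2, k1, k0, e, e_dot, y_r_ddot):
    """Assumed target for the network output that makes the equation factor."""
    return n + r * y_r_ddot + (r / k2) * (k1 * e_dot + k0 * e)


# Illustrative numbers only.
r, n, k2, k1, k0 = 0.8, 0.3, 2.0, 5.0, 6.0
e, e_dot, e_ddot, y_r_ddot = 0.1, -0.2, 0.05, 1.5
phi = phi_desired(r, n, k2, k1, k0, e, e_dot, y_r_ddot)
lhs = closed_loop_lhs(r, n, k2, k1, k0, e, e_dot, e_ddot, phi, y_r_ddot)
factored = (1.0 + r / k2) * (k2 * e_ddot + k1 * e_dot + k0 * e)
```

Since the first factor is nonzero for positive R and K2 in this scalar setting, a vanishing left-hand side forces the second factor, the reference-model error equation, to vanish.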
By finding a proper $\alpha$ that ensures stability of the zero dynamics corresponding to the new output and by using full state feedback, the above two feedback-error-learning schemes may be utilized. However, for a single flexible-link manipulator we require only measurements from the tip and the joint variables; therefore the output $y_a = \theta + \alpha W(l, t)/l$ and its first derivative should be constructed from the available measurements. Note that the tip position is given by $y_t = \theta + W(l, t)/l$; therefore $W(l, t)$ and $\dot{W}(l, t)$ may be obtained directly from $y_t$, $\theta$, $\dot{y}_t$ and $\dot{\theta}$. Furthermore, it may also be shown that the nonlinear terms in the mass matrix $M(\delta)$ and in the Coriolis and centrifugal forces $h_1(\dot{\theta}, \delta, \dot{\delta})$ and $h_2(\dot{\theta}, \delta)$ may be expressed as functions of $\dot{\theta}$, $W$ and $\dot{W}$. To see this, consider the terms in the dynamic equations of a single flexible-link manipulator (De Luca & Siciliano, 1989):
$$M(\delta) = \begin{bmatrix} M_{11(1 \times 1)}(\delta) & M_{12(1 \times n)} \\ M_{21(n \times 1)} & M_{22(n \times n)} \end{bmatrix}, \qquad M_{11}(\delta) = m_0 + M_l(\phi_e^T\delta)^2,$$
$$f_1 = 0, \quad f_2 = 0, \quad h_1(\dot{\theta}, \delta, \dot{\delta}) = 2M_l\dot{\theta}\,(\phi_e^T\delta)(\phi_e^T\dot{\delta}), \quad h_2(\dot{\theta}, \delta) = -M_l\dot{\theta}^2(\phi_e\phi_e^T)\,\delta.$$
Now, using the definition of $W(l, t)$, $M_{11}$ and $h(\theta, \dot{\theta}, \delta, \dot{\delta})$ may be expressed as
$$M_{11} = m_0 + M_l W(l, t)^2, \qquad h(\theta, \dot{\theta}, \delta, \dot{\delta}) = \begin{bmatrix} 2M_l\dot{\theta}\,W(l, t)\,\dot{W}(l, t) \\ -M_l\dot{\theta}^2\phi_e W(l, t) \end{bmatrix}.$$
Hence, except for the term $K\delta + F_2\dot{\delta}$ in Eq. (1), which is linear in $\delta$ and $\dot{\delta}$, all the nonlinearities in the model of the manipulator can be expressed as functions of $\dot{\theta}$, $W$ and $\dot{W}$. In other words, an approximate inverse dynamics representation can be achieved by providing $\dot{\theta}$, $W$, and $\dot{W}$ to the two proposed neural networks.
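The measurement argument above can be sketched as follows: the tip deflection and its rate are recovered from tip and joint measurements, and the payload-dependent nonlinear terms are then evaluated from those quantities alone. This is an illustrative scalar sketch with hypothetical function names and made-up numbers, covering only the inertia term and the first Coriolis term.

```python
def deflection_from_measurements(y_t, y_t_dot, theta, theta_dot, length):
    """Invert y_t = theta + W / l to recover W and its time derivative."""
    return length * (y_t - theta), length * (y_t_dot - theta_dot)


def redefined_output(theta, w_tip, length, alpha):
    """Re-defined output theta + alpha * W / l built from the same signals."""
    return theta + alpha * w_tip / length


def nonlinear_terms(theta_dot, w_tip, w_tip_dot, m0, payload):
    """M11 and h1 written in terms of W and its rate (payload-dependent parts)."""
    m11 = m0 + payload * w_tip ** 2
    h1 = 2.0 * payload * theta_dot * w_tip * w_tip_dot
    return m11, h1


# Illustrative measurements: tip angle, tip rate, joint angle, joint rate.
W, W_dot = deflection_from_measurements(0.55, 0.10, 0.50, 0.08, 1.2)
y_a = redefined_output(0.50, W, 1.2, 0.6)
m11, h1 = nonlinear_terms(2.0, W, W_dot, m0=0.5, payload=0.03)
```

No modal coordinates appear in these functions, which is the point of the argument: the networks can be fed the joint rate and the tip deflection signals instead of the full state.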
4.2. Deflection control for IDML and NRL schemes
One of the main limitations of the output re-definition strategy is that the solution for $\alpha^*$ may be much smaller than 1. In other words, the point (output) under control may be too far away from the tip and too close to the hub. In this case, controlling the new output does not necessarily guarantee a satisfactory response for the tip position, because there is no direct way to effectively damp out the elastic vibrations of the flexible modes at the tip. Madhavan & Singh (1991) employed a linear stabilizer that uses full state feedback and furthermore assumes linearity of the system dynamics close to the terminal phase of the desired trajectory. The approach we propose attempts to overcome the above difficulty. Towards this end, let us explicitly include the tip deflection in the objective function of the neural network. In other words, the error to the network is now modified from $u_c$ to $U_c$, where $U_c$ augments $u_c$, given by Eq. (17), with the tip-deflection term $K_3 W(l, t)$. This amounts to modifying the objective function of the neural network from $J = \frac{1}{2}[e^T K_0 e + \dot{e}^T K_1 \dot{e} + \ddot{e}^T K_2 \ddot{e}]$ to $J = \frac{1}{2}[e^T K_0 e + \dot{e}^T K_1 \dot{e} + \ddot{e}^T K_2 \ddot{e} + W(l, t)^T K_3 W(l, t)]$. Consequently, it is now possible to have direct control over the elastic vibrations of the flexible modes through $K_3$. Experimental results given in Section 6 reveal that good control of the tip position can indeed be obtained even when $\alpha_i^*$ is very small (i.e. close to and even equal to zero) and even when the link is very flexible (for details see Section 6).
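The effect of the extra deflection weight on the objective function can be seen in a small scalar sketch. The function name and the numbers are hypothetical, and scalar gains are assumed in place of the weighting matrices.

```python
def objective(e, e_dot, e_ddot, w_tip, k0, k1, k2, k3):
    """J = 0.5 * (k0 e^2 + k1 e_dot^2 + k2 e_ddot^2 + k3 W^2), scalar case."""
    return 0.5 * (k0 * e ** 2 + k1 * e_dot ** 2 + k2 * e_ddot ** 2
                  + k3 * w_tip ** 2)


# Same tracking error, with and without the deflection penalty.
j_without = objective(0.1, 0.0, 0.0, 0.2, k0=1.0, k1=1.0, k2=1.0, k3=0.0)
j_with = objective(0.1, 0.0, 0.0, 0.2, k0=1.0, k1=1.0, k2=1.0, k3=10.0)
```

Setting the deflection gain to zero recovers the original objective, while a positive gain penalizes residual tip vibration even when the re-defined output tracks well.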
4.3. The third neural network based strategy: joint-based
control (JBC)
In this section, the control structures developed in the
previous sections will be generalized by relaxing the a priori
knowledge about the linear model of the exible manipula-
tor. Since a linear model of the system is not always readily
available, a PD-type control law cannot be designed to
stabilize the system and furthermore, a suitable new output
for feedback cannot be determined explicitly as before to
directly apply the feedback-error-learning scheme. To over-
come these difculties, we adopt the general structure of an
adaptive neural network. The proposed control structure is
shown in Fig. 4.
The key to designing this controller is to define the joint position as the output for control. This ensures the minimum phase property of the input-output map due to the colocated nature of the actuator and sensor. To damp out the tip elastic deformation, the term $W(l,t)$ is explicitly included in the cost function of the neural network. Consequently, the objective function for training the neural network is taken as $J = \frac{1}{2}[e^T K_1 e + \dot{e}^T K_2 \dot{e} + W(l,t)^T K_3 W(l,t)]$, where $e := y_r - \theta$. The objective function $J$ is a weighted function of $e$, $\dot{e}$ and $W(l,t)$ with the corresponding weights given by $K_1$, $K_2$ and $K_3$, respectively. Note that, in principle, higher-order derivatives of the error can also be included in the objective function. The inputs to the network are $\theta$, $\dot{\theta}$ and $W(l,t)$, and the output of the network is the control signal $u$. The weight adjustment mechanism is based on the steepest descent method, namely
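$$\dot{w} = -\eta \left(\frac{\partial J}{\partial w}\right)^T$$

where $w$ is the vector of the weights of the network and $\eta$ is the learning rate. Note that $\partial J/\partial w$ is computed according to

$$\frac{\partial J}{\partial w} = \frac{\partial J}{\partial e}\frac{\partial e}{\partial w} + \frac{\partial J}{\partial \dot{e}}\frac{\partial \dot{e}}{\partial w} + \frac{\partial J}{\partial W}\frac{\partial W}{\partial w} = e^T K_1 \frac{\partial e}{\partial w} + \dot{e}^T K_2 \frac{\partial \dot{e}}{\partial w} + W^T K_3 \frac{\partial W}{\partial w}.$$

Since $e = y_r - \theta$, we get

$$\frac{\partial J}{\partial w} = -e^T K_1 \frac{\partial \theta}{\partial w} - \dot{e}^T K_2 \frac{\partial \dot{\theta}}{\partial w} + W^T K_3 \frac{\partial W}{\partial w}$$

and by using $\frac{\partial \theta}{\partial w} = \frac{\partial \theta}{\partial u}\frac{\partial u}{\partial w}$, $\frac{\partial \dot{\theta}}{\partial w} = \frac{\partial \dot{\theta}}{\partial u}\frac{\partial u}{\partial w}$, $\frac{\partial W}{\partial w} = \frac{\partial W}{\partial u}\frac{\partial u}{\partial w}$ and $u = F(y_a, \dot{y}_a, W, w)$, we may write $\partial J/\partial w$ as

$$\frac{\partial J}{\partial w} = \left[-e^T K_1 \frac{\partial \theta}{\partial u} - \dot{e}^T K_2 \frac{\partial \dot{\theta}}{\partial u} + W^T K_3 \frac{\partial W}{\partial u}\right]\frac{\partial F}{\partial w}$$

where $\partial F/\partial w$ can be computed using the backpropagation method and $\partial \theta/\partial u$, $\partial \dot{\theta}/\partial u$, $\partial W/\partial u$ are computed as suggested in Lightbody et al. (1990) by using the sign of the gradient instead of its real value for training the neural controller.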
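The bracketed term of the gradient and one steepest-descent step can be sketched for the scalar (single-link) case as follows. This is a minimal illustration, not the authors' implementation: the function names, the Euler discretization with step `dt`, and the default choice of +1 for all three plant-gradient signs are assumptions made here for the example.

```python
def bracket_term(e, e_dot, W, K1, K2, K3, s_theta=1.0, s_theta_dot=1.0, s_W=1.0):
    """Scalar version of [-e K1 dtheta/du - e_dot K2 dtheta_dot/du + W K3 dW/du].

    The plant gradients are replaced by their signs (s_theta, s_theta_dot, s_W).
    The default value +1 for each sign is an assumption of this sketch.
    """
    return -e * K1 * s_theta - e_dot * K2 * s_theta_dot + W * K3 * s_W


def descent_step(w, dF_dw, e, e_dot, W, K1, K2, K3, eta, dt):
    """One Euler step of w_dot = -eta * (dJ/dw)^T.

    w      : list of network weights
    dF_dw  : list of partial derivatives of the network output with respect
             to each weight (obtained by backpropagation in practice)
    """
    g = bracket_term(e, e_dot, W, K1, K2, K3)
    return [wi - eta * dt * g * dfi for wi, dfi in zip(w, dF_dw)]


# Example: a positive tracking error pushes a weight with positive dF/dw upward.
new_w = descent_step([0.0, 0.0], [1.0, -1.0], e=0.1, e_dot=0.0, W=0.0,
                     K1=2.0, K2=1.0, K3=1.0, eta=0.5, dt=1.0)
```

With a positive error and a weight whose increase raises the control signal, the step increases that weight, which is the expected direction when a larger torque increases the joint angle.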
4.4. The fourth neural network based strategy: output re-definition through online learning (ORTOL)
In this section, the assumption of a priori knowledge about the linear model of the system is relaxed through online learning. The proposed control structure is shown in Fig. 5. In this structure, two neural networks are employed. The first neural network (NN1) is trained to function as a feedback controller and the second neural network (NN2) is trained to provide a proper output for feedback. In other words, NN1 is trained to produce a control action so that the error between the output defined by NN2 and the desired reference trajectory is minimized. The objective function that is used for training NN1 is chosen as $J_1 = \frac{1}{2}[e^T K_1 e + \dot{e}^T K_2 \dot{e} + W(l,t)^T K_3 W(l,t)]$, where $e := y_r - y_a$, and $y_a$ is constructed by first measuring $\theta$ and then adding the output of NN2 to it.

Fig. 4. Joint-based controller (JBC).

Note that $W(l,t)$ is introduced in $J_1$ to reduce the vibrations of the flexible modes of the system in the output. The inputs to the NN1 network are $e$, $\dot{e}$ and $W(l,t)$, and the output of the network is the control signal $u$. The weight adjustment mechanism is based on the steepest descent method, namely
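$$\dot{w} = -\eta_1 \left(\frac{\partial J_1}{\partial w}\right)^T$$

where

$$\frac{\partial J_1}{\partial w} = \frac{\partial J_1}{\partial e}\frac{\partial e}{\partial w} + \frac{\partial J_1}{\partial \dot{e}}\frac{\partial \dot{e}}{\partial w} + \frac{\partial J_1}{\partial W}\frac{\partial W}{\partial w} = e^T K_1 \frac{\partial e}{\partial w} + \dot{e}^T K_2 \frac{\partial \dot{e}}{\partial w} + W^T K_3 \frac{\partial W}{\partial w}.$$

Using $e = y_r - y_a$, we get

$$\frac{\partial J_1}{\partial w} = -e^T K_1 \frac{\partial y_a}{\partial w} - \dot{e}^T K_2 \frac{\partial \dot{y}_a}{\partial w} + W^T K_3 \frac{\partial W}{\partial w}$$

and using $\frac{\partial y_a}{\partial w} = \frac{\partial y_a}{\partial u}\frac{\partial u}{\partial w}$, $\frac{\partial \dot{y}_a}{\partial w} = \frac{\partial \dot{y}_a}{\partial u}\frac{\partial u}{\partial w}$, $\frac{\partial W}{\partial w} = \frac{\partial W}{\partial u}\frac{\partial u}{\partial w}$ and $u = F(y_a, \dot{y}_a, W, w)$, we may write $\partial J_1/\partial w$ as

$$\frac{\partial J_1}{\partial w} = \left[-e^T K_1 \frac{\partial y_a}{\partial u} - \dot{e}^T K_2 \frac{\partial \dot{y}_a}{\partial u} + W^T K_3 \frac{\partial W}{\partial u}\right]\frac{\partial F}{\partial w}$$

where $\partial F/\partial w$ can be computed using the backpropagation method and $\partial y_a/\partial u$, $\partial \dot{y}_a/\partial u$, $\partial W/\partial u$ are computed as suggested in Lightbody et al. (1990) by using the sign of the gradient instead of its real value for training the neural controller. An approximation to the gradient was also suggested by Psaltis et al. (1988) as

$$\frac{\partial y}{\partial u} \approx \frac{y(u + \delta u) - y(u)}{\delta u}. \qquad (21)$$

This approximate derivative can be determined by changing each input to the plant slightly at the operating point and measuring the changes.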
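The perturbation approximation of Eq. (21), and the sign-only variant used for training, can be illustrated with a short sketch. The static map `plant`, the perturbation size `du`, and the function names are hypothetical choices for this example; on a real manipulator the two output values would come from measurements rather than from a function call.

```python
def plant_gradient(plant, u, du=1e-3):
    """Forward-difference estimate of dy/du at the operating point u (Eq. 21)."""
    return (plant(u + du) - plant(u)) / du


def gradient_sign(plant, u, du=1e-3):
    """Sign of the estimated gradient: -1, 0 or 1."""
    g = plant_gradient(plant, u, du)
    return (g > 0) - (g < 0)


# Example with a hypothetical static plant y = 3u + 1.
slope = plant_gradient(lambda u: 3.0 * u + 1.0, 0.5)
direction = gradient_sign(lambda u: -2.0 * u, 0.0)
```

Using only the sign discards the magnitude of the plant gradient, so the effective step size is set by the learning rate alone.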
The objective of the NN2 network is to generate an output of the form $y_{ai} = \theta_i + \alpha_i W_i(l_i,t)/l_i$. Since $\theta_i$ can be measured and $W_i(l_i,t)$ can be computed from $y_{ti}$ and $\theta_i$, the neural network is trained to obtain an appropriate estimate for $\alpha_i$. Using a network whose weights are limited to the interval $[-1, 1]$, the objective function to minimize is selected as $J_2 = \frac{1}{2}(e^T e)$. This leads to the following adjustment law

$$\dot{\alpha}_i = -\eta_2 \frac{\partial J_2}{\partial \alpha_i} = -\eta_2 e_i \frac{\partial e_i}{\partial \alpha_i} = \eta_2 e_i \frac{\partial y_{ai}}{\partial \alpha_i} = \eta_2 e_i W_i(l_i,t).$$

Strictly, $\partial y_{ai}/\partial \alpha_i = W_i(l_i,t)/l_i$; the constant factor $1/l_i$ can be absorbed into the learning rate $\eta_2$. The input to the network is $W(l,t)$ and the output vector elements are computed as $\alpha_i W_i(l_i,t)$. The new output that is used for feedback is now constructed as $y_{ai} = \theta_i + \alpha_i W_i(l_i,t)/l_i$.
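The NN2 adjustment law and the bounded weight can be sketched for a single link as below. This is only an illustration under stated assumptions: the Euler step `dt`, the hard clipping used to keep the weight in [-1, 1], and the function names are choices made here, since the text does not specify how the bound is enforced.

```python
def update_alpha(alpha, e_i, W_tip, eta2, dt):
    """One Euler step of alpha_dot = eta2 * e_i * W_i(l_i, t), clipped to [-1, 1].

    Hard clipping is an assumption of this sketch.
    """
    alpha_new = alpha + eta2 * dt * e_i * W_tip
    return max(-1.0, min(1.0, alpha_new))


def redefined_output(theta, alpha, W_tip, length):
    """New output y_a = theta + alpha * W(l, t) / l used for feedback."""
    return theta + alpha * W_tip / length


# Example: a large update is limited by the bound on the weight.
alpha_limited = update_alpha(0.9, e_i=1.0, W_tip=1.0, eta2=1.0, dt=0.5)
y_a = redefined_output(theta=0.2, alpha=0.5, W_tip=0.1, length=1.0)
```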
5. Simulation results for a two-link planar manipulator
In this section, simulation results for the proposed neural network controllers are presented. A planar two-link manipulator is considered in the simulations. The following results are provided to demonstrate the applicability and potential of the proposed neural network algorithms for a system that is highly nonlinear with strong coupling effects between the joints. In the next section, we show actual experimental results obtained using a single-link manipulator test-bed available in our lab. The two-link manipulator consists of one rigid arm (first link) and one flexible arm (second link) with the following numerical data
$l_1 = 20$ cm, $l_2 = 60$ cm, $A_1 = 5 \times 0.9$ mm, $A_2 = 3.14 \times 1.3$ cm,
$\rho_1 = 2700$ kg/m$^3$ (6061 Aluminum), $\rho_2 = 7981$ kg/m$^3$ (stainless steel),
$M_1 = 1$ kg, $M_l = 0.251$ kg, $m_1 = 0.236$ kg, $m_2 = 0.216$ kg,
$E = 194 \times 10^9$ N/m$^2$, $J_1 = 0.11 \times 10^{-3}$ kg m$^2$, $J_l = 0.11 \times 10^{-4}$, $J_h = 3.8 \times 10^{-5}$
where $l_1$ and $l_2$ are link lengths, $A_1$ and $A_2$ are cross-sectional areas, $E$ and $\rho$ are modulus of elasticity and mass density, $J_h$ is the hub inertia and $M_1$, $M_l$, $J_1$ and $J_l$ are masses and mass moments of inertia at the end points of the two links. The first two natural frequencies of the second link are 5.6 and 27.6 Hz.
First, based on the procedure given in Section 3, the value of $\alpha^*$ is found to be $\alpha^* = [1 \;\; 0.6]^T$. Consequently, a value of $\alpha_2 = 0.5$ is used in the simulations. Figure 6(a–j) shows the system responses to a $\sin(t)$ reference trajectory for both links. These responses are obtained by using different control strategies. As can be seen from Fig. 6(a,b), controlling the system using PD control ($\alpha_2 = 0.5$) yields considerable tracking errors in the responses of $\theta_1$ and $y_{t2}$. For comparison, the responses of the system to the same reference trajectory obtained by using the proposed four neural network controllers are shown in Fig. 6(c,d) (IDML scheme, $\alpha_2 = 0.5$), Fig. 6(e,f) (NRL scheme, $\alpha_2 = 0.5$), Fig. 6(g,h) (joint-based control scheme), and Fig. 6(i,j) (ORTOL scheme).

Fig. 5. Structure of the controller using output re-definition through online learning (ORTOL). The block EC performs a linear combination of $e$, $\dot{e}$ and $W(l,t)$ for the specific learning schemes used in NN1 and NN2.
For implementing the IDML scheme, a three-layer neural network was used with eight neurons in the input layer, 10 neurons in the hidden layer, and two neurons in the output layer. The inputs to the network are $\theta$, $\dot{\theta}$, $W$, $\dot{W}$ and $y_a$. The conventional controller is given by Eq. (17). The NRL scheme was implemented by a three-layer neural network with 16 neurons in the input layer, 10 neurons in the hidden layer, and two neurons in the output layer. The inputs to the network are $\theta$, $\dot{\theta}$, $W$, $\dot{W}$, $e$, $\dot{e}$, $y_r$, $\dot{y}_r$ and $\ddot{y}_r$, where $e = y_r - y_a$. The conventional controller is also given by Eq. (17). In the joint-based control scheme, a three-layer neural network with five input neurons, 20 hidden neurons and two output neurons was used. The inputs to the network are $e$, $\dot{e}$ and $W(l,t)$, where $e = y_r - \theta$. For implementing the ORTOL scheme, a three-layer neural network was employed for NN1 with five input neurons, 20 hidden neurons and two output neurons. For all of the networks, the hidden layer neurons have sigmoidal transfer functions and the output neurons use linear activation functions. Note that all network weights are initialized to very small random values.

Fig. 6. Actual tip responses to a $\sin(t)$ reference trajectory for both links: (a,b) PD control, (c,d) IDML scheme, (e,f) NRL scheme, (g,h) JBC scheme, and (i,j) ORTOL scheme (dashed lines correspond to the reference trajectories).
The above figures do indeed demonstrate significant improvement in the responses of the system ($\theta_1$, $y_{t2}$) when one utilizes the proposed neural network controllers. As Fig. 6(c–f) shows, the IDML and the NRL schemes yield similar results for this model since both schemes use the same learning rule, that is, feedback-error-learning. Compared to the results obtained by using the joint-based control and the ORTOL schemes [see Fig. 6(g–j)], more accurate results can be obtained by using the IDML and the NRL schemes. This is due to the fact that the IDML and the NRL schemes use some a priori knowledge about the system dynamics (a linear model of the system).
Having no a priori knowledge about the system dynamics also leads, in general, to an increase in the sizes of the neural networks. For instance, the IDML and the NRL schemes are able to obtain good tracking performance with 10 hidden neurons, while good results for the joint-based control and the ORTOL schemes are obtained by using 20 hidden neurons.
6. Experimental results
This section presents the experimental results obtained using a single flexible-link manipulator test-bed constructed in our Robotics and Control Laboratory (Geniele, 1994). Friction and stiction at the hub, high flexibility, and payload variations make the control of this experimental set-up a challenging and nontrivial problem.
6.1. The test-bed
The experimental system consists of a highly flexible link whose parameters are shown in Table 1. In this table, $l$ is the length of the link, $\gamma$ is the mass per unit length, $I_h$ is the hub inertia, $b$ is the viscous friction at the hub, $E$ is Young's modulus, $I$ is the beam area moment of inertia, $\omega_j$ is the $j$th resonance frequency of the beam, and $M_l$ is the payload mass. A schematic of the experimental test-bed is shown in Fig. 7 and is described in detail in Geniele (1994).
The beam consists of a central stainless steel tube with annular surface corrugations. Aluminum blocks are bolted to the tube and two thin parallel spring steel strips slide within slots cut into the blocks. The high performance drive consists of a pulse width modulated amplifier that operates in current feedback mode, a DC servo motor with an optical encoder and a harmonic drive speed reducer. An infrared emitting diode is used to sense the position of the tip. The detector consists of a UDT camera consisting of a lens and an infrared-sensitive planar diode, and is mounted at the link's hub. The digital controller consists of a Spectrum C30 system card, based on the Texas Instruments TMS320C30 digital signal processing chip. Two channels of 16 bit A/D and D/A are also provided. An interface system connects the Spectrum card to the current amplifier, infrared emitting diode, optical encoder and infrared detector. The maximum torque range generated by the motor is 0.705 N m. The speed reducer amplifies the motor torque by a factor of 50 and yields an output torque with a range of 35.25 N m. The maximum tip deflection of 0.25 m can be measured using the infrared emitting diode, the UDT camera and the UDT amplifier.

Fig. 7. Block diagram of the experimental test bed.
6.2. Selection of the output
In earlier experiments on this test-bed (Geniele, 1994; Geniele et al., 1997) the authors designed a linear controller based on transmission zero assignment to control the flexible-link system. The required parameters for the linear model of the manipulator have been taken from Geniele (1994). Based on the reported values of the structural damping parameters, i.e. $f_1 = 0.4$ and $f_2 = 4$, the value of $\alpha^*$ is found to be $\alpha^* = 0.75$. However, experimental results show that controlling the system using this output leads to instability. The source of this problem is the uncertainty in the coefficients $f_1$ and $f_2$. As we discussed previously (see Section 3), a value of $\alpha^* = 0.6$ is robust to variations in the coefficients $f_1$ and $f_2$. However, in controlling the system using this output the flexible modes of the system actually vibrate with such high amplitudes that in some situations, the tip deflections exceed the feasible range of the sensor measurements. Consequently, a value of $\alpha = 0.48$ is used to ensure that the deflections in the arm are within the range of the sensor.
6.3. Estimation of the higher-order derivatives of the output
As explained previously, with the available sensors we can only measure the joint position and the tip deflection while our proposed control schemes require the first and the second order derivatives of the output as well. Towards this end, an observer has been designed to estimate the higher-order derivatives of the output $y_a$ (Tornambe, 1992) according to

$$\dot{y}_1 = y_2 + \frac{l_{n-1}}{\epsilon}(y_a - y_1),$$
$$\vdots$$
$$\dot{y}_{n-1} = y_n + \frac{l_1}{\epsilon^{n-1}}(y_a - y_1),$$
$$\dot{y}_n = \frac{l_0}{\epsilon^n}(y_a - y_1) \qquad (22)$$

where $l_i$, $i = 0, 1, \ldots, n-1$ are chosen such that the polynomial $H(s) = s^n + l_{n-1}s^{n-1} + \cdots + l_1 s + l_0$ is a Hurwitz polynomial, and $\epsilon$ is a sufficiently small positive number less than 1. The states $y_1, \ldots, y_n$ asymptotically estimate $y_a$ and its higher-order derivatives up to order $n-1$. For details refer to Tornambe (1992).

Fig. 8. Experimental system responses (actual tip position $y_t$ and tip deflection $W$) to a 0.2 rad step input using a PD controller ($\alpha = 0.48$): (a,b) $K_p = 100$, $K_v = 100$, $M_l = 30$ g, (c,d) $K_p = 200$, $K_v = 100$, $M_l = 30$ g, and (e,f) $K_p = 200$, $K_v = 100$, $M_l = 850$ g.
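The observer structure of Eq. (22) can be tried out numerically with a short sketch. The forward-Euler integration, the step size, the choice $n = 2$ with $H(s) = s^2 + 2s + 1$, and the ramp test signal are all assumptions made for this example and are not taken from the experimental implementation.

```python
def observer_step(y_hat, y_a, coeffs, eps, dt):
    """One forward-Euler step of the observer in Eq. (22).

    y_hat  : list [y_1, ..., y_n] of observer states
    y_a    : current measured output
    coeffs : list [l_0, l_1, ..., l_{n-1}] of the Hurwitz polynomial H(s)
    eps    : small positive observer parameter
    """
    n = len(y_hat)
    innovation = y_a - y_hat[0]
    updated = []
    for i in range(n):
        # State i+1 uses the gain l_{n-1-i} / eps^(i+1).
        gain = coeffs[n - 1 - i] / eps ** (i + 1)
        next_state = y_hat[i + 1] if i + 1 < n else 0.0
        updated.append(y_hat[i] + dt * (next_state + gain * innovation))
    return updated


def estimate_ramp_slope(slope=1.0, eps=0.05, dt=1e-3, steps=2000):
    """Run a second-order observer on y_a(t) = slope * t and return y_2."""
    y_hat = [0.0, 0.0]
    coeffs = [1.0, 2.0]  # l_0 = 1, l_1 = 2, i.e. H(s) = s^2 + 2s + 1
    for k in range(steps):
        y_hat = observer_step(y_hat, slope * k * dt, coeffs, eps, dt)
    return y_hat[1]


slope_estimate = estimate_ramp_slope()
```

For this ramp input the second state converges to the slope of the signal; a smaller `eps` speeds up convergence but also amplifies measurement noise, which is the usual trade-off for observers of this high-gain type.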
6.4. Discussion of the results
6.4.1. Conventional PD control
In the first experiment using the new output $y_a$ ($\alpha = 0.48$), a PD control with relatively large gains ($K_p = 100$, $K_v = 100$) was implemented. The responses of the actual tip position and the tip deflection are shown in Fig. 8(a,b). As can be seen, there is a considerable amount of steady-state error in tracking the tip position. The error is mainly caused by the presence of friction and stiction at the hub whose amplitudes vary with the hub position. The only way that PD control can overcome the effects of the friction and stiction at the hub is to increase its gains. Increasing the velocity gain $K_v$ leads to instability of the system. As the position gain $K_p$ is increased, smaller steady-state errors are obtained, but at the expense of a considerable oscillatory transient behavior of the tip. These results are shown in Fig. 8(c,d). Note that the use of high gains may also lead to large deflections of the arm and instability of the closed-loop system caused by saturation of the amplifiers and sensors. For example, by using the gains $K_p = 200$ and $K_v = 100$, the closed-loop system becomes unstable for a step input that is greater than 0.2 rad.

Fig. 9. Experimental system responses (actual tip position $y_t$ and tip deflection $W$) to a 0.2 rad step and $0.2 + 0.1\sin(t)$ input using the IDML neural network controller ($\alpha = 0.48$): (a,b) no modification in the objective function, $M_l = 30$ g, (c,d) with modification in the objective function, $M_l = 30$ g, (e,f) same as (c,d), but with $M_l = 850$ g, and (g,h) same as (c,d), but with $0.2 + 0.1\sin(t)$ reference trajectory (dashed lines correspond to the reference trajectory).

In another experiment, $M_l$ was increased to 850 g from 30 g and the responses of the system to a 0.2 rad step input are shown in Fig. 8(e,f). As can be observed, the responses exhibit less oscillation, but greater steady-state error as compared to the case of $M_l = 30$ g.
In the following sections our proposed neural network controllers are applied to the flexible system. Note that the structures of the neural networks (the number of inputs, outputs, and hidden neurons) used in the following experiments (single-link manipulator) are different from those used in the simulation section (two-link manipulator).
6.4.2. The IDML scheme
The IDML scheme was implemented based on the new output $y_a$ ($\alpha = 0.48$) and using a three-layer neural network with four neurons in the input layer, five neurons in the hidden layer, and one neuron in the output layer. No off-line training is required and the hidden and output layer weights are all initialized to zero. The inputs to the network are $\dot{\theta}$, $W$, $\dot{W}$ and $y_a$. The hidden layer neurons have sigmoidal transfer functions and the output neuron uses a linear activation function. The conventional controller is given by Eq. (17). The responses of the actual tip position and the tip deflection to a 0.2 rad step input for $M_l = 30$ g are shown in Fig. 9(a,b). The results clearly illustrate the improvements in the tracking performance when compared to those shown in Fig. 8(a–d).
Next, $W(l,t)$ was taken into consideration for modifying the learning rule of the neural network as discussed in Section 4.2 to further improve the transient response of the system. The responses of the system are now shown in Fig. 9(c,d). The results clearly show the further improvement of the performance of the closed-loop system.
In another experiment, the robustness of the IDML scheme was examined by increasing the payload mass $M_l$ from 30 to 850 g. The responses of the system are shown in Fig. 9(e,f). As Fig. 9(c–f) demonstrates, the neural network controller is robust to payload variations.
Fig. 9(g,h) shows the responses of the experimental system to a $0.2 + 0.1\sin(t)$ reference trajectory. The tip follows the desired trajectory with a small tracking error. Note that PD control with gains $K_p = 200$ and $K_v = 100$ leads to an unstable response for this desired trajectory.
6.4.3. The NRL scheme
By using the new output $y_a$ ($\alpha = 0.48$), the neural network scheme NRL was employed to control the system. A three-layer neural network was used with eight neurons in the input layer, five neurons in the hidden layer, and one neuron in the output layer. No off-line training is required and the hidden and output layer weights are all initialized to zero. The inputs to the network are $\dot{\theta}$, $W$, $\dot{W}$, $e$, $\dot{e}$, $y_r$, $\dot{y}_r$ and $\ddot{y}_r$, where $e = y_r - y_a$. The hidden layer neurons have sigmoidal activation functions and the output neuron uses a linear activation function. The conventional controller used is given by Eq. (17). The responses of the actual tip position and the tip deflection to a 0.2 rad step input for $M_l = 30$ g are shown in Fig. 10(a,b). The results show an improvement in tracking performance when compared to those in Fig. 8(a–d).
The effect of deflection control on the performance of the NRL method was also investigated and the responses of the system are shown in Fig. 10(c,d). The robustness of the NRL scheme to the payload variation was examined by changing $M_l$ to 850 g. The responses of the system in this case are depicted in Fig. 10(e,f). As can be seen, the NRL scheme is also robust to payload variations.
The responses of the system to a $0.2 + 0.1\sin(t)$ reference trajectory are shown in Fig. 10(g,h). As in previous cases, the tip follows the desired trajectory with a small tracking error.
6.4.4. The joint-based control scheme
The joint-based control scheme was implemented by using a three-layer neural network with three neurons in the input layer, five neurons in the hidden layer, and one neuron in the output layer. No off-line training is required and the hidden and output layer weights are all initialized to zero. The inputs to the network are $e$, $\dot{e}$ and $W$, where $e = y_r - \theta$. The hidden layer neurons have sigmoidal activation functions and the output neuron uses a linear activation function.
The responses of the experimental system to a 0.2 rad step input are shown in Fig. 11 when $W(l,t)$ is not included in the cost function of the network [Fig. 11(a,b)] and when it is included [Fig. 11(c,d)]. As can be seen, the tip response is significantly improved by adding $W(l,t)$ to the cost function of the neural network.
Next, $M_l$ was increased to 850 g and the performance of the joint-based control scheme was evaluated under this parametric variation. The responses of the system are shown in Fig. 11(e,f), which demonstrate the robustness of the joint-based control scheme to payload variations. Fig. 11(g,h) shows the responses of the system to a $0.2 + 0.1\sin(t)$ reference trajectory. It can be observed that the tip tracks the desired trajectory with small tracking error.
6.4.5. The ORTOL scheme
The final set of experiments was performed using the fourth scheme in which a priori knowledge about the system dynamics has been relaxed through online learning. A three-layer neural network was employed for NN1 with three input neurons, five hidden neurons and one output neuron. No off-line training is required and the hidden and output layer weights are all initialized to zero. The activation function used for the input and hidden layers is the tan-sigmoid function and for the output layer is a linear function. A single neuron was employed for NN2 whose weight is limited to the range $[-1, 1]$.
Fig. 12(a,b) shows the system responses to a 0.2 rad step input for $M_l = 30$ g. The responses of the system for $M_l = 850$ g are shown in Fig. 12(c,d). Finally, in Fig. 12(e,f), the responses of the system to a $0.2 + 0.1\sin(t)$ reference trajectory are given. From the above figures, we can conclude that good tracking performance can be obtained experimentally even when no a priori knowledge about the system dynamics is assumed.
6.4.5.1. Remark 1. Zero initial weights with small biases were used for all the above neural network controllers, unlike the simulation results where small random initial weights were utilized. This is because of the limitations in the range of the actuator torques and sensor deflection measurements. In other words, random initial weights could sometimes lead to saturation of the actuator and the sensor during early stages of learning and cause closed-loop system instability. This is of particular concern in the JBC and ORTOL schemes because of the lack of PD control. By selecting zero initial weights, even though the learning process is slowed in the beginning, the probability of instability is also reduced significantly. Through extended online learning, no adverse stability effects have been observed even after 10 min of continuous experimental runs (as compared to the 10 s of real-time experimental results shown in Sections 6.4.2 to 6.4.5) for all the proposed neural networks.

Fig. 10. Experimental system responses (actual tip position $y_t$ and tip deflection $W$) to a 0.2 rad step and $0.2 + 0.1\sin(t)$ input using the NRL neural network controller ($\alpha = 0.48$): (a,b) no modification in the objective function, $M_l = 30$ g, (c,d) with modification in the objective function, $M_l = 30$ g, (e,f) same as (c,d), but with $M_l = 850$ g, and (g,h) same as (c,d), but with $0.2 + 0.1\sin(t)$ reference trajectory (dashed lines correspond to the reference trajectory).
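The effect of this initialization can be seen in a small forward-pass sketch of a three-layer network with three inputs, five hidden neurons and one linear output. The bias value 0.01, the use of `tanh` as the sigmoidal function, and the function names are assumptions made for illustration only.

```python
import math


def zero_init(n_in, n_hidden, bias=0.01):
    """Zero weights with small biases (the bias value is a hypothetical choice)."""
    W1 = [[0.0] * n_in for _ in range(n_hidden)]  # input-to-hidden weights
    b1 = [bias] * n_hidden                        # hidden biases
    W2 = [0.0] * n_hidden                         # hidden-to-output weights
    b2 = bias                                     # output bias
    return W1, b1, W2, b2


def forward(x, W1, b1, W2, b2):
    """Sigmoidal (tanh) hidden layer followed by a linear output neuron."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2


# Initial control signal for inputs (e, e_dot, W): independent of the inputs.
params = zero_init(n_in=3, n_hidden=5)
u0 = forward([0.2, 0.0, 0.05], *params)
```

With this initialization the first control signal equals the small output bias regardless of the inputs, so the actuator cannot saturate at start-up. The nonzero hidden biases also keep the hidden outputs away from zero, so the output-layer weights receive a nonzero gradient from the first step.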
6.4.5.2. Remark 2. Computational load for training the
neural networks introduces delays in applying the control
torque to the system. This delay is proportionally increased
when the size of the network is increased. On the other hand,
the sampling rate cannot be decreased below a certain value
for stability reasons. Consequently, because of the
computational constraints in our testbed setup, the number
of hidden layer neurons used in the JBC and ORTOL
schemes was selected to be identical to that in the IDML
and NRL schemes. In an ideal situation one would, in
general, require more hidden layer neurons for the JBC
and ORTOL schemes as pointed out in Section 5.
6.5. Other properties of the neural network controllers
The following results are intended to show other characteristics of the neural network controllers such as dynamic range and robustness to noise and payload variations. First, two experiments were performed with two values of $\alpha$ for the output, namely $\alpha = 0.48$ and $\alpha = 0$, using the modified learning rule. The results for the tip position to a 0.6 rad step input for both cases are shown in Fig. 13 (top figure). The 0.6 rad step input was chosen to demonstrate that the dynamic range of the neural network controller is much larger than that of the PD controller. Specifically, the PD controller yields an unstable system for this reference trajectory. For comparison, the tip response for the $\alpha = 0$ case without any modification to the objective function is also shown (bottom figure). As can be seen, even for the $\alpha = 0$ case, the tip response is significantly better than that of the case with no modification in the sense that the vibrations of the flexible modes are damped out very quickly.

Fig. 11. Experimental system responses (actual tip position $y_t$ and tip deflection $W$) to a 0.2 rad step and $0.2 + 0.1\sin(t)$ input using the joint-based neural network controller: (a,b) no modification in the objective function, $M_l = 30$ g, (c,d) with modification in the objective function, $M_l = 30$ g, (e,f) same as (c,d), but with $M_l = 850$ g, and (g,h) same as (c,d), but with $0.2 + 0.1\sin(t)$ reference trajectory (dashed lines correspond to the reference trajectory).
To show the robustness of the neural network controllers to disturbances, another experiment was performed and the results are shown in Fig. 14. First, a 0.3 rad step input was applied and the NRL neural controller was employed for controlling the system. After the system reached its steady-state value, disturbances were applied as unexpected tip deflections [see Fig. 14(d)]. As these figures show, the neural network controller can maintain a stable closed-loop system even when the magnitude of the disturbance reaches 0.3 m.
Also of interest are the applied torque profiles of the neural network controllers. The applied torques for a 0.2 rad step input (the corresponding manipulator responses are shown in Fig. 8(c), Fig. 9(c), Fig. 10(c), Fig. 11(c) and Fig. 12(a)) are shown in Fig. 15(a) for the PD controller, Fig. 15(b) ($u$) and Fig. 15(c) ($u_n$) for the IDML scheme, Fig. 15(d) ($u$) and Fig. 15(e) ($u_n$) for the NRL scheme, Fig. 15(f) for the JBC scheme, and Fig. 15(g) for the ORTOL scheme. It can be seen that the magnitudes of the torques for all the neural network controllers are much smaller than for the PD controller. This is one reason why the neural network controllers have a larger dynamic range as compared to the PD controller.

Fig. 12. Experimental system responses (actual tip position $y_t$ and tip deflection $W$) to a 0.2 rad step and $0.2 + 0.1\sin(t)$ input using the ORTOL neural network controller: (a,b) step response with $M_l = 30$ g, (c,d) step response with $M_l = 850$ g, and (e,f) $0.2 + 0.1\sin(t)$ reference trajectory (dashed lines correspond to the reference trajectory).

Finally, in our last experiment, we attempt to verify the claim stated in Section 3 that the region of minimum phase behavior increases as payload mass increases. As mentioned in Section 6.2, with $M_l = 30$ g it is not possible to control the system by using the new output $y_a$ with $\alpha = 0.6$. Figure 16 shows the responses of the system to a $0.2 + 0.1\sin(t)$ reference trajectory for $M_l = 850$ g obtained by using the NRL scheme ($\alpha = 0.65$). It can be observed that a stable closed-loop system is indeed obtained with a small tip position tracking error. This confirms our statement that the value of $\alpha^*$ increases as the payload mass $M_l$ increases.

Fig. 13. Actual tip responses to a 0.6 rad step input using the NRL neural network controller for different outputs: top, $\alpha = 0.48$ (solid line) and $\alpha = 0$ (dashed line) with modified learning rule; bottom, $\alpha = 0$, with no modifications.

Fig. 14. System responses to a 0.3 rad step input using the NRL neural network controller: (a) actual tip position, (b) redefined output, (c) hub position, and (d) tip deflection.
increases.
6.6. Summary
As the experimental results in previous sections demon-
strate, the performance of all four proposed neural network
schemes is superior to that of their conventional PD
controller counterparts. This is partly due to the fact that
to overcome the effects of friction when using PD
controllers, the PD gains have to be set very high, which
considerably affects the transient response, dynamic range,
and robustness of the closed-loop system.
Among the neural network controllers, however, the IDML and the NRL schemes yield similar results (Figs. 9 and 10) since both schemes assume a priori knowledge about the linear model of the system and use the same learning rule, that is, feedback-error-learning. However, the responses obtained by using the NRL scheme are more accurate and smoother than those obtained by the IDML scheme. The reason is that in the NRL scheme, the reference trajectories (i.e. position, velocity, and acceleration) are fed directly to the neural network. Consequently, generating the feedforward term explicitly results in a smoother transient response.

Fig. 15. Applied torques for a 0.2 rad step input: (a) PD controller, (b,c) the IDML scheme, (d,e) the NRL scheme, (f) the JBC scheme, and (g) the ORTOL scheme.
When a priori knowledge about the system linear dynamics is assumed, in general, a smoother transient response is obtained. For instance, the step responses of the system with $M_l = 850$ g obtained by using the ORTOL scheme (Fig. 12) exhibit more oscillations than those obtained by using the IDML and the NRL schemes (Figs. 9 and 10, respectively). Furthermore, the responses of the system to sinusoidal reference trajectories reveal that the second flexible mode of the system (19 rad/s) vibrates with higher magnitudes for the joint-based controller (Fig. 11) and the ORTOL scheme (Fig. 12) as compared to the IDML and the NRL schemes (Figs. 9 and 10).
Table 2 compares the performance of the conventional PD controller to those of the neural network based controllers. In this table, $M_l$ is the payload mass, DC represents deflection control, $T_s$ is the settling time, ESS is the steady-state error, PO is the percentage overshoot, and PU is the percentage undershoot. The results are obtained for a 0.1 rad step input. The results given in this table lead to the conclusion that in general the neural network controllers are more robust to payload variations than the PD controller.

Fig. 16. System responses to a $0.2 + 0.1\sin(t)$ input using the NRL neural network controller for $M_l = 850$ g ($\alpha = 0.65$): (a) actual tip position, (b) redefined output, (c) control torque, and (d) tip deflection (dashed lines correspond to the desired trajectories).
Table 2
Summary of the results
Scheme; M_l (g); setting; T_s (s); ESS (%); PO (%); PU (%)
PD; 30; K_p = 100, K_v = 100; 33
PD; 30; K_p = 200, K_v = 100; 6.69; 0; 34.5; 48.2
PD; 850; K_p = 200, K_v = 100; 3.93; 10; 15; 7.5
IDML; 30; Without DC; 5.56; 0; 8.2; 12
IDML; 30; With DC; 2.61; 0; 3.7
IDML; 850; With DC; 2.96; 0; 2.0
NRL; 30; Without DC; 3.92; 0; 6.3; 2.5
NRL; 30; With DC; 3.00; 0; 2.1
NRL; 850; With DC; 2.95; 0; 1.9
JBC; 30; Without DC; 5.46; 0; 12.6; 36.2
JBC; 30; With DC; 3.48; 0; 3.6
JBC; 850; With DC; 3.28; 0; 3.1
ORTOL; 30; With DC; 4.55; 0; 2.3
ORTOL; 850; With DC; 3.45; 0; 13; 7.1
The other conclusion that can be drawn is that deflection control significantly improves the performance of the system in the sense that in general it reduces the overshoot, undershoot, and the settling time.
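The four quantities reported in Table 2 can be computed from a sampled step response along the following lines. The paper does not state the exact definitions used, so the 2% settling band, the use of the last sample as the steady-state value, the measurement of undershoot relative to the initial value, and the function name are all assumptions of this sketch (it also assumes a positive step).

```python
def step_metrics(t, y, ref, band=0.02):
    """Settling time, steady-state error, overshoot and undershoot of a step response.

    t, y : lists of sample times and tip positions
    ref  : step amplitude (assumed positive)
    band : settling band as a fraction of ref (2% is an assumed convention)
    """
    y_final = y[-1]
    ess = abs(ref - y_final) / abs(ref) * 100.0
    po = max(0.0, max(y) - ref) / abs(ref) * 100.0
    pu = max(0.0, y[0] - min(y)) / abs(ref) * 100.0
    # Settling time: time of the last sample lying outside the band
    # around the final value.
    ts = t[0]
    for ti, yi in zip(t, y):
        if abs(yi - y_final) > band * abs(ref):
            ts = ti
    return {"Ts": ts, "ESS": ess, "PO": po, "PU": pu}


# Example with a made-up response to a 0.1 rad step.
metrics = step_metrics([0, 1, 2, 3, 4], [0.0, -0.01, 0.12, 0.1, 0.1], ref=0.1)
```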
7. Conclusions
This paper has addressed the tip position control of a flexible-link manipulator using four different neural network based schemes. Only partial knowledge about the system dynamics is required in developing the first two neural network controllers, and only the sign of the output signals in developing the third and the fourth neural network controllers. An output re-definition method which requires no a priori knowledge about payload mass was used to overcome the problems caused by the non-minimum phase characteristic of the flexible-link manipulator. The proposed neural network based controllers were developed and their performance was evaluated by simulation and experiments. No off-line training is needed for designing the proposed controllers. Experimental results revealed the superiority of the proposed neural network controllers over a model-based PD controller in the presence of unmodeled dynamics and nonlinearities such as hub friction and stiction and payload variations.
Acknowledgements
This research was supported in part by Fonds pour la Formation de Chercheurs et l'Aide à la Recherche (FCAR) of the Province of Quebec under Grant ER-1042.
References
Barmish, B.R. (1990). A generalization of Kharitonov's four-polynomial concept for robust stability problems with linearly dependent coefficient perturbations. IEEE Transactions on Automatic Control, 34, 157–164.
Bartlett, A.C., Hollot, C.V., & Lin, H. (1987). Root locations of an entire polytope of polynomials: it suffices to check the edges. Mathematics of Control, Signals and Systems, 1, 61–71.
Bayo, E., & Moulin, H. (1989). An efficient computation of the inverse dynamics of flexible manipulators in the time domain. Proceedings of the IEEE International Conference on Robotics and Automation (pp. 710–715).
Book, W.J. (1984). Recursive Lagrangian dynamics of flexible manipulator arms. International Journal of Robotics Research, 3 (4), 87–101.
Cheng, W., & Wen, J.T. (1993). A neural controller for the tracking control of flexible arms. Proceedings of the IEEE International Conference on Neural Networks (pp. 749–754).
De Luca, A., & Siciliano, B. (1989). Trajectory control of a nonlinear one-link flexible arm. International Journal of Control, 50, 1699–1715.
Donne, J.D., & Ozguner, U. (1994). Neural control of a flexible-link manipulator. Proceedings of the IEEE International Conference on Neural Networks (pp. 2327–2332).
Geniele, H. (1994). Control of a flexible-link manipulator. Master's thesis, Concordia University, Montreal, Canada.
Geniele, H., Patel, R.V., & Khorasani, K. (1992). Control of a flexible-link manipulator. Proceedings of the Fourth ASME International Symposium on Robotics and Manufacturing (pp. 567–572).
Geniele, H., Patel, R.V., & Khorasani, K. (1997). End-point control of a flexible-link manipulator: theory and experiments. IEEE Transactions on Control Systems Technology, 5 (6), 556–570.
Gomi, H., & Kawato, M. (1993). Neural network control for a closed-loop system using feedback-error-learning. Neural Networks, 6, 933–946.
Hashtrudi-Zaad, K., & Khorasani, K. (1996). Control of nonminimum phase singularly perturbed systems with applications to flexible link manipulators. International Journal of Control, 63, 679–701.
Isidori, A., & Moog, C. (1987). On the nonlinear equivalent of the notion of transmission zeros. In C.I. Byrnes & K.H. Kurszanski (Eds.), Modeling and Adaptive Control. Berlin: Springer.
Kharitonov V.L. (1978). Asymptotic stability of an equilibrium position of
a family of systems of linear differential equations. Differentialnye
Uraveniya, 14, 14831485.
Kwon, O.S. & Book, W.J. (1990). An inverse dynamic method yielding
exible manipulator state trajectories. Proceedings of the American
Control Conference (pp. 186193).
Lightbody, G., Wu, Q.H. & Irwin, G.W (1990). Control application for
feedforward networks. In T.W Miller et al. (Eds.), Neural Networks
for Control (pp. 5171). Cambridge, MA: MIT Press.
Madhavan S.K., & Singh S.N. (1991). Inverse trajectory control and zero
dynamic sensitivity of an elastic manipulator. International Journal of
Robotics and Automation, 6 (4), 179191.
Miyamoto H., Kawato M., Setoyarna T., & Suzuki R. (1988). Feedback-
error-learning neural network for trajectory control of a robotic
manipulator. Neural Networks, 1, 251265.
Moallem M., Patel R.V., & Khorasani K. (1997a). An inverse dynamics
control strategy for tip-position tracking of exible multi-link
manipulators. Journal of Robotic Systems, 14, 649658.
Moallem M., Khorasani K., & Patel R.V. (1997b). An integral
manifold approach to tip position tracking of exible multi-link
manipulators. IEEE Transactions on Robotics and Automation, 13 (6),
823837.
Newton R.T., & Xu Y. (1993). Neural network control of a space
manipulator. IEEE Control Systems Magazine, 12, 1422.
Psaltis D., Sideris A., & Yarnamura A.A. (1988). A multilayered neural
network controller. IEEE Control Systems Magazine, 4, 1721.
Redfern, D. (1993). The MAPLE Handbook. New York: Springer.
Schoenwald, D. A. & Ozguner, U. (1990). On combining slewing and
vibration control in exible manipulator via singular perturbations.
Proceedings of the 29th IEEE Conference on Decision and Control
(pp. 533538).
Siciliano B., & Book W.J. (1988). A singular perturbation approach to
control of lightweight exible manipulators. International Journal of
Robotics Research, 7 (4), 7990.
Slotine, J.E. & Li, W. (1991). Applied Nonlinear Control. Englewood
Cliffs, NJ: Prentice-Hall.
Talebi, H.A., Khorasani, K. & Patel, R.V. (1997). Experimental evaluation
of neural networkbased controllers for tip position tracking of a exible-
link manipulator. Proceedings of the IEEE International Conference on
Robotics and Automation (pp. 33003305).
Thomson, W.T. (1988). Theory of Vibrations with Applications. Englewood
Cliffs, N.J.: Prentice-Hall.
Tornambe A. (1992). Output feedback stabilization of a class of non-
minimum phase nonlinear systems. Systems and Control Letters, 19,
193204.
Wang, D. & Vidyasagar, M. (1989). Transfer functions for a single exible
link. Proceedings of the IEEE International Conference on Robotics
and Automation (pp. 10421047).