On the inverse shortest path problem
Didier Burton
In spite of the fact that their motivation and development originally occurred at different times, graph theory and optimization are fields of mathematics which nowadays have many connections. Early on, the use of graphs suggested intuitive approaches to both pure and applied problems. Optimization, and more precisely mathematical programming, has steadily grown with the size and diversity of the problems considered. Available computer hardware, and hence computer science, certainly contributed to many of these developments. Optimization techniques therefore often supplied a suitable algorithmic framework for solving problems arising from graph theory.
This doctoral thesis is about such a connection between graph theory and optimization.
My purpose in this work is to analyse the inverse shortest path problem. I was introduced to this problem during a two-year research period (supported by the Région Wallonne), whose aim was to model the behaviour of road network users, particularly in urban centres. Traffic modelling revealed the importance of accurate estimates of perceived travel costs in a road network. This experience motivated the present research.
I am very grateful to my advisor, Professor Philippe Toint. His invaluable guidance, availability and judicious advice were much appreciated. In addition, I enjoyed the opportunities he gave me to interact with other professors and researchers abroad.
I especially want to thank Bill Pulleyblank (IBM T.J. Watson Research Center, Yorktown Heights, USA) for his collaboration during my visit to Yorktown Heights. I am also grateful to Laurence Wolsey (CORE, Louvain-la-Neuve, Belgium), Michel Minoux (Université Pierre et Marie Curie, Paris, France), Tijmen Jan Moser (Rijksuniversiteit, Utrecht, The Netherlands) and Annick Sartenaer, Michel Bierlaire and Daniel Goeleven from the Department of Mathematics (FUNDP, Namur) for very interesting discussions and suggestions for this work.
I wish to express my thanks to the members of my advisory board who kindly agreed to examine this work: F. Callier, J.-J. Strodiot (both from FUNDP, Namur), L. Wolsey (CORE, Louvain-la-Neuve) and M. Minoux (Université Pierre et Marie Curie, Paris, France). I am also indebted to S. Vavasis (Cornell University, USA) and anonymous referees who contributed to improving parts of my thesis.
Michel Vause (GRT, FUNDP, Namur) has supplied useful tools and hints for writing and
illustrating this text.
The Department of Mathematics of the Facultés Universitaires Notre-Dame de la Paix (Namur) hosted me during my thesis work, and partly supported participation in scientific meetings in London (UK) and Chicago (USA). The Transportation Research Group (FUNDP, Namur) provided the computer hardware used for the numerical experiments, and contributed to the expenses of several trips abroad. The Communauté Française de Belgique also gave financial support for my mission to London (UK).
Finally and most importantly, the Belgian National Fund for Scientific Research supported me during the preparation of this thesis.
Preface
1 Introduction
  1.1 The graph theory context
  1.2 Motivating examples
    1.2.1 Traffic modelling
    1.2.2 Seismic tomography
  1.3 The inverse shortest path problem
  1.4 Solving the problem
    1.4.1 A shortest path method
    1.4.2 An optimization framework
    1.4.3 Solving an instance of inverse shortest path problems
2 The shortest path problem
  2.1 Terminology and notations
  2.2 A specific shortest path problem
    2.2.1 The problem type
    2.2.2 The graph type
    2.2.3 The strategy type
  2.3 Shortest path tree algorithms
    2.3.1 Shortest path trees
    2.3.2 Bellman's equations
    2.3.3 Label-setting and label-correcting principles
    2.3.4 Search strategies
    2.3.5 Search strategies for label-setting and label-correcting methods
  2.4 Label-correcting algorithms
    2.4.1 L-queue algorithm
    2.4.2 L-deque algorithm
    2.4.3 L-threshold
  2.5 Label-setting algorithms
    2.5.1 Dijkstra's algorithm
    2.5.2 Dial's algorithm
1
Introduction
This chapter introduces the inverse shortest path problem. We first present the problem's context along with its motivating applications. We then state the formal problem and specify the underlying mathematical tools that we have exploited. These tools will be analysed in the chapters that follow.
Many algorithms have been proposed during the last three decades to solve the shortest path problem (see [5, 8, 34, 43, 67, 85]).
However, models based on shortest paths do not always reflect observations accurately. These inaccuracies are often caused by an inadequate knowledge of the arc weights or lengths used in the shortest path calculations. One way to overcome this difficulty and to improve one's knowledge of the arc weights is to consider the inverse problem. Solving an inverse shortest path problem consists of finding weights associated with the arcs of a network that are as close as possible to a priori estimated values, and that are compatible with the observations of some shortest paths in the network.
More precisely, we wish to modify these costs as little as possible to ensure, on the one hand, that some given paths in the graph are shortest paths between their origin and destination, and on the other hand, that the total cost of shortest paths between given origins and destinations is bounded by given values.
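Schematically, the requirements just described can be gathered into a single program. The sketch below is only a summary for orientation, not the thesis's numbered formulation: \(\bar{w}\) denotes the vector of a priori weight estimates, and the counts \(n_E\) and \(n_I\) of explicit path constraints and of bounded origin-destination pairs are notation introduced formally later in the text.

```latex
\min_{w \in \mathbb{R}^m} \; \| w - \bar{w} \|^2
\quad \text{subject to} \quad
\begin{cases}
w \ge 0, \\[2pt]
p_j \ \text{is a shortest path in } G(w), & j = 1, \dots, n_E, \\[2pt]
l_j \le \displaystyle\sum_{a \in p^1_j(w)} w_a \le u_j, & j = 1, \dots, n_I,
\end{cases}
```

where \(p^1_j(w)\) is a shortest path (under the weights \(w\)) between the \(j\)-th origin-destination pair.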
Our inverse problem is to reconstruct the arc weights subject to the constraints described above. It is readily observed that, as is the case in many inverse problems, the constraints do not uniquely determine the arc weights: the reconstruction problem is underdetermined. Fortunately, it often happens in applications that some additional a priori knowledge of the expected arc weights is available. This additional information then provides stability and uniqueness of the inversion (see [105], for instance). This a priori information may be obtained either from "direct" models, for which there is no problem of uniqueness, or from the solution of a previous inverse problem with different data. In order to ensure the uniqueness of the solution of our inverse shortest path problem, we force w to be as close as possible to the a priori information contained in w̄.
As far as we are aware, the inverse shortest path problem has never been formally stated or studied in the scientific literature.
One could also modify the problem by introducing other functions of the {w_i} to minimize. These objective functions may be linear, quadratic or generally nonlinear. Investigation of these alternatives is beyond the scope of this thesis.
As a consequence, we can write the objective function (1.3) of our problem as
\[
\min_{w \in \mathbb{R}^m} \; \frac{1}{2} \sum_{i=1}^{m} (w_i - \bar{w}_i)^2, \tag{1.9}
\]
where the factor 1/2 is chosen for convenience.
Solving (1.9) subject to (1.4)–(1.6) requires the algorithmic framework of quadratic programming (QP). Chapter 3 is devoted to the analysis of a particular QP method.
This chapter considers methods for solving the shortest path problem. It does not cover the matter completely and thoroughly, but presents a few algorithms with variants that are applicable in the context of inverse shortest path applications. A particular method will be preferred for its appealing computational complexity.
Finding a shortest path in G between two vertices of V then consists of finding a path that minimizes \(\sum_{(i,j) \in p} w_{ij}\), where p is any path between the two vertices.
The complexity of an algorithm refers to the amount of resources required by its running computation. The worst-case complexity of an algorithm is the lowest upper bound on its complexity. We shall use Landau's symbols to characterize algorithms' computational complexities: a function g(n) is O(f(n)) if there exist a scalar c and an index n_0 such that
\[
|g(n)| \le c f(n) \quad \text{for all } n > n_0. \tag{2.1}
\]
The most in-depth classification of shortest path problems is agreed to be Deo and Pang's taxonomy [30]. Other general surveys can be found in [37, 95].
The next three sections aim at locating our shortest path problem within Deo and Pang's classification.
breadth-first search technique is recently due to Goldfarb, Hao, and Kai [54].
Summing up the above characteristics, the graphs we are interested in for determining shortest paths are large, sparse, oriented and non-negatively weighted.
in our case, we will not investigate such methods. Section 2.7 of this chapter discusses the advantages of a particular updating technique that uses the forward star representation of a graph.
We finally point out a strategy that has recently been studied by Bertsekas in [7, 8]: the auction strategy. An auction algorithm for finding shortest paths seems to be relatively efficient in many cases. Discussion of the computational results obtained by this technique can be found in Section 2.6.
Let us now examine some properties of algorithms solving the shortest path problem in sparse graphs using the above-mentioned techniques.
Figure 2.2: A weighted graph, a shortest path tree and a shortest spanning tree.
impossible since T is a tree. Then T \ {a_ij} consists of two trees since there are no weak cycles in T. Both trees have fewer than n vertices each, and therefore, by the induction hypothesis, each contains one less arc than the number of vertices in it. Thus, T \ {a_ij} consists of n vertices and n − 2 arcs, and hence T has exactly n − 1 arcs. □
As a consequence, being an arborescence, a shortest path tree can be stored in an n-array of vertex numbers, the i-th component containing the vertex number j such that the arc (j, i) belongs to the shortest path tree. By convention, the component corresponding to the source vertex is set to 0 (remember that the source vertex has a zero in-degree). One also says that the n-array contains the predecessor of each vertex (different from the root) in the arborescence.
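The predecessor-array representation can be sketched as follows; this is an illustrative fragment (the function name and the dictionary-based storage are choices made here, not the thesis's own code), showing how a path from the root to any vertex is recovered from the array.

```python
def path_to(pred, v):
    """Recover the path from the root to vertex v in an arborescence
    stored as a predecessor array (the root's entry is 0, following
    the convention that the source has zero in-degree)."""
    path = [v]
    while pred[v] != 0:          # 0 marks the root
        v = pred[v]
        path.append(v)
    return list(reversed(path))

# Tiny arborescence rooted at vertex 1, with arcs (1,2), (1,3), (3,4).
pred = {1: 0, 2: 1, 3: 1, 4: 3}
print(path_to(pred, 4))          # [1, 3, 4]
```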
of p is bounded below by
\[
s(v_2) - s(src) + \left[ \sum_{k=2}^{l(p)-1} \big( s(v_{k+1}) - s(v_k) \big) \right] + s(j) - s(v_{l(p)}) = s(j). \tag{2.5}
\]
Hence SPT is a shortest path tree. □
Note that the label vector s in Theorem 2.2 does not necessarily equal the vector s_c defined by Bellman's equations. The value of s(i) will match that of s_c(i) if and only if s(i) + w_ij = s(j) for every arc a_ij in SPT (see [99]).
The equations (2.4) already suggest a basic algorithm for calculating shortest path costs³, which is commonly viewed as a "prototype" shortest path tree algorithm [49]:
Algorithm 2.1
Step 1. Initialize a tree SPT rooted at src and, for each v ∈ V, set s_c(v) to the cost of the path from src to v in SPT;
Step 2. Let a_ij ∈ A be an arc for which s_c(i) + w_ij − s_c(j) < 0; then adjust the vector s_c by setting s_c(j) = s_c(i) + w_ij, and update the tree SPT by replacing the current arc incident into vertex j by the new arc a_ij;
Step 3. Repeat Step 2 until the optimality conditions (2.4), which may be rewritten as
\[
s_c(i) + w_{ij} \ge s_c(j) \quad \text{for all } (i, j) \in A, \tag{2.6}
\]
are satisfied.
In the course of the calculation, the value of s_c(v) (v ∈ V) is greater than or equal to the cost of the path from src to v in the current tree. One usually calls s_c(v) the label of vertex v.
Algorithms that label the reached vertices with their shortest path costs from the source are called labelling algorithms. There are two conventional ways of classifying labelling algorithms: authors like Steenbrink [102], Dial et al. [33], and Deo and Pang [30] distinguish between label-correcting and label-setting methods; more recently, Gallo and Pallottino [48, 49] rather discern different search strategies (breadth-first search, depth-first search and best-first search) employing precise data structures analysed by Aho et al. in [1], Tarjan in [106] and Pallottino in [93]. In order to clarify both approaches, let us touch upon Step 2 of the above prototype algorithm. Considering the forward star representation of sparse graphs, one realizes that it is worth selecting vertices rather than arcs: once a vertex i has been chosen, the operations of Step 2 are performed on all arcs a_ij with vertex j ∈ S(i). We suppose that vertex i is selected from a set of candidate vertices Q. With this point of view, search strategies refer to the way i is chosen from Q in relation to the underlying data structure of Q, and label-setting or label-correcting methods rather refer to some properties of the vertices that are to be selected in Q.
These precisions allow us to present a variant of the prototype Algorithm 2.1 that includes the updating of the shortest path information given by the vector pred.
³ Bellman's equations do not supply information about shortest paths themselves.
Algorithm 2.2
Step 1: Initializations.
Set s_c(src) ← 0 and pred(src) ← 0.
For each vertex i ∈ V \ {src}, set s_c(i) ← +∞ and pred(i) ← i.
Finally, set Q ← {src}.
Step 2: Selecting and updating.
Select a vertex i in Q, and set Q ← Q \ {i}.
For each vertex j ∈ S(i) such that s_c(i) + w_ij < s_c(j), do:
set s_c(j) ← s_c(i) + w_ij and pred(j) ← i;
if j ∉ Q, then set Q ← Q ∪ {j} and update Q.
Step 3: Loop or exit.
If Q ≠ ∅, then go to Step 2.
Else exit: s_c and pred contain the costs and the description of the shortest path tree rooted at src, respectively.
The updating of the set Q in Step 2 technically depends on the data structure used to store the candidate vertices.
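Algorithm 2.2 with Q managed as a FIFO queue (the breadth-first, L-queue strategy discussed below) can be sketched in Python as follows. The graph representation (a forward-star dictionary `succ` and an arc-weight dictionary), the function name and the small example are illustrative choices, not the thesis's own code.

```python
from collections import deque

def shortest_path_tree(succ, weights, src):
    """Generic labelling method (Algorithm 2.2), Q as a FIFO queue.
    succ[i]       : list of successors S(i) of vertex i (forward star)
    weights[i, j] : weight w_ij of arc (i, j)
    Returns the label vector sc and the predecessor array pred."""
    sc = {v: float('inf') for v in succ}
    pred = {v: v for v in succ}
    sc[src] = 0
    pred[src] = 0                      # convention: the root points to 0
    Q = deque([src])
    in_Q = {v: False for v in succ}
    in_Q[src] = True
    while Q:                           # Step 3: loop while Q is nonempty
        i = Q.popleft()                # Step 2: select a vertex in Q
        in_Q[i] = False
        for j in succ[i]:
            if sc[i] + weights[i, j] < sc[j]:
                sc[j] = sc[i] + weights[i, j]
                pred[j] = i
                if not in_Q[j]:        # add j to Q only if not already there
                    Q.append(j)
                    in_Q[j] = True
    return sc, pred

succ = {1: [2, 3], 2: [4], 3: [2, 4], 4: []}
weights = {(1, 2): 5, (1, 3): 1, (2, 4): 1, (3, 2): 2, (3, 4): 6}
sc, pred = shortest_path_tree(succ, weights, 1)
print(sc)    # {1: 0, 2: 3, 3: 1, 4: 4}
```

Note how the membership flags `in_Q` implement the test j ∉ Q of Step 2 in constant time.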
Labelling methods
  Label-correcting: queue, stack, deque
  Label-setting: linked list, buckets, heap
  Auction
Table 2.1: Search strategies for labelling methods.
Breadth-first searches (with queues or deques) often organize the selection in Q for label-correcting algorithms (see [48], for instance). This is due to the fact that a breadth-first search visits the vertices in concentric zones starting from the source. With the same idea, Gallo and Pallottino [48] remark that depth-first searches (with stacks) are odd in shortest path tree algorithms, since the first updated vertex, which is in S(src), will be selected last.
Using a best-first strategy with the shortest path costs (s_c) as labels, the selection of Step 2 in Algorithm 2.2 yields the vertex i of Q that is at the shortest distance from src. Once the forward
star S(i) is updated, i does not need to be updated any more until the end of the algorithm. Shortest path tree algorithms using a search strategy derived from the best-first one are then label-setting algorithms, and are also called shortest-first algorithms. In particular, Dijkstra [34] originally did not use any list in his shortest-first algorithm; Yen [117] exploits linked lists; Denardo and Fox in [28] and Dial et al. in [33] manipulate buckets, and Johnson [67] makes use of heaps. Buckets and heaps are structures for ordering vertices with respect to their label. They will be described in Section 2.5 with the algorithms employing them.
Finally, as mentioned before, the auction technique used by Bertsekas [8] is of a special nature. This technique is defined and analysed in Section 2.6.
The next sections are devoted to reviewing some shortest path tree algorithms that use the above search strategies. Computational complexities are mentioned to allow comparisons between these algorithms.
2.4.3 L-threshold
The partitioning of Q explains the efficiency of L-deque. With the same idea, Glover et al. [60] organized the list Q as two separate queues Q′ and Q″ using a threshold parameter s. The queue Q′ is dedicated to vertices whose label falls below the threshold parameter s. The algorithm typically proceeds as follows: at each iteration, a vertex is selected and removed from Q′, and any vertex j to be added to the candidate list is inserted in Q″. When Q′ is empty, the threshold s is adjusted and Q is repartitioned according to the new threshold value. The procedure then goes on until exhaustion of the candidate list Q.
⁴ That is, drawable in 2 dimensions without arc intersections.
This method becomes efficient once suitable values are chosen for s. As noticed by Bertsekas [9], if s is taken to be equal to the current minimum label, the method behaves like Dijkstra's algorithm, which is presented in the next section; if s exceeds all vertex labels, then Q″ is empty and the algorithm then reduces to the generic label-correcting method. Appropriate threshold values have been proposed by Glover et al. in [60], and by Gallo and Pallottino in [49]. When applied to graphs with nonnegative arc weights, the worst-case computational complexity is O(mn). Although this theoretical performance equals that of other label-correcting algorithms, the threshold algorithm allows better practical performance than the other label-correcting algorithms. The storage requirement for this method is 5n + 2m.
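The two-queue mechanism can be sketched as follows. This is only an illustration of the partitioning idea: the threshold-update rule used here (the average of the labels still in Q″) is a simple placeholder assumption, not one of the tuned rules of Glover et al. or Gallo and Pallottino.

```python
from collections import deque

def threshold_sptree(succ, w, src):
    """Threshold method sketch: the candidate list is split into Q1
    (selected from) and Q2 (inserted into); when Q1 empties, the
    threshold s is recomputed and Q2 is repartitioned."""
    sc = {v: float('inf') for v in succ}
    pred = {v: v for v in succ}
    sc[src], pred[src] = 0, 0
    Q1, Q2 = deque([src]), deque()
    queued = {src}
    while Q1 or Q2:
        if not Q1:
            # Adjust s (placeholder rule) and repartition Q2.
            s = sum(sc[v] for v in Q2) / len(Q2)
            Q1 = deque(v for v in Q2 if sc[v] <= s)
            Q2 = deque(v for v in Q2 if sc[v] > s)
        i = Q1.popleft()
        queued.discard(i)
        for j in succ[i]:
            if sc[i] + w[i, j] < sc[j]:
                sc[j] = sc[i] + w[i, j]
                pred[j] = i
                if j not in queued:    # new candidates always enter Q2
                    Q2.append(j)
                    queued.add(j)
    return sc, pred

succ = {1: [2, 3], 2: [4], 3: [2, 4], 4: []}
w = {(1, 2): 5, (1, 3): 1, (2, 4): 1, (3, 2): 2, (3, 4): 6}
sc, pred = threshold_sptree(succ, w, 1)
print(sc)    # {1: 0, 2: 3, 3: 1, 4: 4}
```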
The proof that this algorithm is correct can be found in [57, 89]. The complexity of the algorithm is at most O(n²) since each arc is examined only once, and its space requirement is 4n + 2m. For complete⁵ graphs, the complexity reduces from O(n³) (for L-queue) down to O(n²) (for Dijkstra's method): the n factor is the price to pay for both considering general arc weights and detecting negative cycles. Note that, according to Johnson [66], a label-correcting variant of Dijkstra's algorithm is able to take into account negative arc weights (provided that no negative cycles occur in the graph).
One can easily observe that Dijkstra's algorithm allows a relative run-time reduction when solving a one-pair (o, d) shortest path problem, the reduction depending on how far d is located from o. Indeed, one shortest path is found at each iteration of the algorithm; as a consequence, the algorithm may halt once the destination vertex d has been permanently labelled. Considering the actual graph sparsity should also reduce the run-time of the algorithm.
The critical operation in Dijkstra's algorithm is that of finding the vertex with the smallest label in Q. Maintaining Q ordered then appears to be a reasonable approach. Yen suggested a variant of Dijkstra's algorithm using an ordered linked list for sequencing vertices. The computational complexity of that variant remains O(n²) on sparse graphs. As noticed in [14], ordered linked lists do not speed up the original algorithm significantly, since inserting a new element in Q or modifying a vertex label requires the complete scanning of Q.
Figure 2.3: A binary heap.
The binary heap Q then has ⌈log(K + 1)⌉ levels in its arborescence, where log is the logarithm in base 2, and ⌈x⌉ is the smallest integer greater than or equal to x. As a consequence, the operations of removing the minimum label item, inserting a new item, and correcting a label have a computational complexity of O(log K). Indeed, each heap manipulation concerning one item consists of exchanging that item either with its ascendant (one level up) or with one of its descendants (one level down). The procedure of recovering the heap properties after some change about a vertex or a label will be referred to as "order the heap". The heap Q can be implemented by means of two n-vectors: one for the arborescence, and one for keeping track of the items' positions in the heap.
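The two-array idea (heap order plus a position index, so a label correction can locate its item in O(1) and re-order in O(log K)) can be illustrated as follows; this is a sketch in modern idiom, not the thesis's implementation, and the class and method names are choices made here.

```python
class IndexedMinHeap:
    """Binary heap keyed by labels, kept as two structures mirroring
    the text: `tree` holds items in heap order, `pos` tracks each
    item's position so correcting a label costs O(log K)."""
    def __init__(self):
        self.tree, self.pos, self.key = [], {}, {}

    def _swap(self, a, b):
        self.tree[a], self.tree[b] = self.tree[b], self.tree[a]
        self.pos[self.tree[a]], self.pos[self.tree[b]] = a, b

    def _up(self, k):                  # exchange with ascendant (one level up)
        while k > 0 and self.key[self.tree[k]] < self.key[self.tree[(k - 1) // 2]]:
            self._swap(k, (k - 1) // 2)
            k = (k - 1) // 2

    def _down(self, k):                # exchange with smaller descendant
        n = len(self.tree)
        while True:
            c = 2 * k + 1
            if c + 1 < n and self.key[self.tree[c + 1]] < self.key[self.tree[c]]:
                c += 1
            if c >= n or self.key[self.tree[k]] <= self.key[self.tree[c]]:
                break
            self._swap(k, c)
            k = c

    def insert(self, v, label):
        self.key[v] = label
        self.tree.append(v)
        self.pos[v] = len(self.tree) - 1
        self._up(self.pos[v])

    def decrease(self, v, label):      # "order the heap" after a label correction
        self.key[v] = label
        self._up(self.pos[v])

    def pop_min(self):                 # remove the minimum-label item
        v = self.tree[0]
        self._swap(0, len(self.tree) - 1)
        self.tree.pop()
        del self.pos[v]
        self._down(0)
        return v
```

For instance, inserting vertices with labels 5, 3 and 4, then correcting the first label down to 1, makes that vertex the next one extracted.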
We now describe a shortest path tree algorithm using a binary heap to manage the list of candidate vertices, which is still denoted by Q. The heap will contain at most n items or vertices. As for previous algorithms, Q_h (= Q_1) and Q_t denote the head and the tail of the heap, respectively. Note that index t ≤ n. For technical details concerning the heap updating, see [49, 107]. Algorithms using related techniques have been developed by D.B. Johnson [67] and E.L. Johnson [68].
Algorithm 2.7
Step 1: Initializations.
Set s_c(src) ← 0 and pred(src) ← 0.
For each vertex i ∈ V \ {src}, set s_c(i) ← +∞ and pred(i) ← i.
Finally, set Q ← {src}.
Step 2: Selecting and updating.
Set i ← Q_h.
Replace Q_h by Q_t and order the heap Q.
For each vertex j ∈ S(i) such that s_c(i) + w_ij < s_c(j), do:
set s_c(j) ← s_c(i) + w_ij and pred(j) ← i;
if j ∉ Q, then insert j as Q_t and order the heap Q.
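A heap-based label-setting method in the spirit of Algorithm 2.7 can be sketched with Python's standard `heapq` module. One deliberate deviation, flagged here as an implementation choice rather than the thesis's scheme: instead of correcting a label in place and re-ordering the heap, stale entries are simply re-inserted and skipped on extraction ("lazy deletion"). The optional early exit for a one-pair problem reflects the remark made earlier about halting once the destination is permanently labelled.

```python
import heapq

def dijkstra(succ, w, src, dst=None):
    """Heap-based label-setting sketch (lazy re-insertion variant)."""
    sc = {v: float('inf') for v in succ}
    pred = {v: v for v in succ}
    sc[src], pred[src] = 0, 0
    heap = [(0, src)]
    done = set()
    while heap:
        d, i = heapq.heappop(heap)
        if i in done:
            continue                    # stale entry: label already permanent
        done.add(i)
        if i == dst:                    # one-pair problem: halt early
            break
        for j in succ[i]:
            if d + w[i, j] < sc[j]:
                sc[j] = d + w[i, j]
                pred[j] = i
                heapq.heappush(heap, (sc[j], j))
    return sc, pred

succ = {1: [2, 3], 2: [4], 3: [2, 4], 4: []}
w = {(1, 2): 5, (1, 3): 1, (2, 4): 1, (3, 2): 2, (3, 4): 6}
sc, pred = dijkstra(succ, w, 1)
print(sc)    # {1: 0, 2: 3, 3: 1, 4: 4}
```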
backward along the current path; each time a backtracking occurs, the person evaluates and keeps track of the "price" or the "desirability" of revisiting and advancing from the left position.
An iteration consists of updating the pair (P, π) so that it satisfies the complementarity slackness condition (CS):
\[
\pi_i \le w_{ij} + \pi_j \quad \text{for all } (i, j) \in A. \tag{2.9a}
\]
Moreover, note that the shortest path cost to a vertex is found the first time the vertex becomes the terminal vertex of the path P, and is then equal to π_src − π_i; also, the vertices become terminal for the first time in the order of their proximity to the origin.
Some properties allow us to state relationships between the prices π_i and the shortest path costs s_c(i); remembering the convention that s_c(src) = 0, Properties 2 and 3 imply that
\[
s_c(j) \ge \pi_{src} - \pi_j \quad \text{for all } j \in V \tag{2.13}
\]
and
\[
s_c(i) = \pi_{src} - \pi_i \quad \text{for all } i \in P \tag{2.14}
\]
if the CS conditions hold. Then, we can write the following:
\[
s_c(i) + \pi_i - \pi_{dst} \le s_c(j) + \pi_j - \pi_{dst} \quad \text{for all } i \in P \text{ and } j \in V. \tag{2.15}
\]
The price π_i being an estimate of the shortest path cost from i to dst (see Property 1), the quantity s_c(j) + π_j − π_dst is an estimate of the shortest path cost from src to dst using only paths passing through j. It thus makes sense to consider vertex j as "most desirable" for inclusion in the algorithm's path if s_c(j) + π_j − π_dst is minimal.
[Figure: a small graph from src to dst containing a cycle of cost 3 followed by an arc of large weight W.]
then by increments of 3 (the cost of the cycle) as many times as necessary for the price π_3 to reach or exceed W. If this situation is unlikely to arise in randomly generated problems, it is not the case for urban networks (for instance): just think of a roundabout followed by a long road. On the other hand, Johnson's algorithm behaves the same whatever the value of W, since it will terminate in n − 1 iterations.
The storage requirement is 3n + 2m, or at most 5n + 2m depending on whether a data structure is used to retrieve the minimum value min_{(i,j)∈A} {w_ij + π_j} with i the terminal vertex of P. It seems difficult to make use of a heap in the auction algorithm, since at each iteration this minimum value may be calculated on a different set of arcs. The computational effort of building a new heap "from scratch" each time P's terminal vertex changes would degrade the algorithm's performance.
[Figure: run-time comparison between Dijkstra's and Johnson's algorithms.]
Bertsekas's, although very powerful in many cases, is relatively sensitive to the presence of cycles with small length, and is not best suited for sparse graphs. Finally, Florian's allows better run-times when one can profit from an already calculated shortest path tree whose root is close to the new one.
The general algorithm that is best suited to inverse shortest path applications then seems to be Dijkstra's method implemented with a binary heap. It has an attractive practical performance, requires little storage, and takes advantage of the sparsity in a very efficient way. If available, Florian's algorithm may be helpful when combined with this choice.
3
Quadratic Programming
In Chapter 1, we introduced the inverse shortest path problem and decided on a particular algorithmic framework for solving the problem by choosing the ℓ₂ or least squares norm. The resultant problem formulation determined a quadratic programming problem, or QP for short. This chapter deals with the theoretical background of such programs, and discusses the selection of a method well suited for treating our inverse problem.
is called the feasible region. The feasible region may be empty. As defined in [11], a constraint E_q(x) is trivial or redundant if, and only if, there exists no vector x ∈ R^m such that E_i(x) ≥ 0 (i ∈ {1, …, h} \ {q}) and E_q(x) < 0. Note that this definition pertains when there is no vector x such that E_i(x) ≥ 0 (i ∈ {1, …, h} \ {q}); the feasible region is then empty even without the q-th constraint, which cannot limit the feasible region any more.
We will say that degeneracy occurs when the solution to the QP problem is the same whether some binding inequality constraint is imposed or not. It means that we can disregard the (in)equality, solve the problem, and find a solution which exactly satisfies the (in)equality (see [12] for further details).
3.1.2 Convexity
The concept of convexity concerns both the feasible region F and the objective function f(x).
The feasible region F of our quadratic program, determined by (3.2), is characterized by the following theorem, which can easily be proved, since each of the h constraints (3.2) limits the feasible region to a halfspace.
Theorem 3.1 The feasible region F of a quadratic program is convex, i.e.,
\[
x, y \in F \;\Rightarrow\; \lambda x + (1 - \lambda) y \in F, \tag{3.3}
\]
for all 0 ≤ λ ≤ 1.
An extreme point of the convex set F is a point that does not lie strictly within the line segment connecting two other points of the set. More formally,
Definition 1 A point x in a convex set F is said to be an extreme point of F if there are no two distinct points x₁ and x₂ in F such that
\[
x = \lambda x_1 + (1 - \lambda) x_2 \tag{3.4}
\]
for some 0 < λ < 1.
Note that a singleton and the entire space R^m are both convex; the first one is bounded and the second one is unbounded.
Definition 2 A function f : D(⊆ R^m) → R is convex if the set D is convex, and if the following inequality holds for any pair of points x₁, x₂ ∈ D and any real number 0 ≤ λ ≤ 1:
\[
f((1 - \lambda) x_1 + \lambda x_2) \le (1 - \lambda) f(x_1) + \lambda f(x_2). \tag{3.5}
\]
If the ≤ sign in (3.5) is replaced by < and λ ≠ 0, 1, the function f is said to be strictly convex.
Geometrically, if f is strictly convex and continuously differentiable, a line segment drawn between any two points on its graph falls entirely above the graph. Such a function increases more rapidly (or decreases less rapidly) than a straight line:
\[
f(x_2) > f(x_1) + \nabla f(x_1)^T (x_2 - x_1) \quad \text{for all } x_1, x_2 \in D, \tag{3.6}
\]
where ∇ designates the gradient (or first derivative) of the next argument.
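Inequality (3.6) can be checked numerically on a small strictly convex quadratic. The matrix, vector and sample points below are hypothetical values chosen for illustration (G is symmetric positive definite, which is what makes the quadratic strictly convex, as stated further on).

```python
# Hypothetical strictly convex quadratic in two variables,
# f(x) = a^T x + (1/2) x^T G x, with G symmetric positive definite.
G = [[2.0, 0.5], [0.5, 1.0]]
a = [1.0, -1.0]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, x):
    return [dot(row, x) for row in M]

def f(x):
    return dot(a, x) + 0.5 * dot(x, matvec(G, x))

def grad(x):                      # gradient of f: a + G x
    return [ai + gi for ai, gi in zip(a, matvec(G, x))]

x1, x2 = [0.0, 0.0], [1.0, 2.0]
lhs = f(x2)
rhs = f(x1) + dot(grad(x1), [u - v for u, v in zip(x2, x1)])
print(lhs > rhs)                  # True: the graph lies above its tangent
```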
Now let us come back to the objective function f of our QP. Clearly, f is twice continuously differentiable, and the second derivative of f, its Hessian, is G, an m × m matrix whose components are second partial derivatives. Remember that G is symmetric.
Definition 3 An m × m matrix G is said to be positive semi-definite if, for all x ∈ R^m, x^T G x ≥ 0.
Definition 4 An m × m matrix G is said to be positive definite if, for all x ≠ 0 ∈ R^m, x^T G x > 0.
These definitions are significant in quadratic programming for their relationship with the convexity of f, and hence with the characterization of a solution to the QP problem. The two following theorems state these connections.
Theorem 3.2 Let G be an m × m symmetric matrix. The quadratic function f(x) = a^T x + ½ x^T G x is convex on R^m if and only if G is positive semi-definite.
A proof of this theorem can be found in [113].
If G is positive definite, then the function f(x) = a^T x + ½ x^T G x is strictly convex on R^m.
These properties may affect the quality of a minimizer of the QP problem (3.1)–(3.2):
Theorem 3.3 Assume that the feasible region F determined by (3.2) is not empty. If G is positive semi-definite, a solution x* of the QP problem (3.1)–(3.2) is a global solution, that is,
\[
f(x^*) \le f(x) \quad \text{for all } x \in F. \tag{3.7}
\]
Moreover, if G is positive definite, then x* is also unique, that is,
\[
f(x^*) < f(x) \quad \text{for all } x \ne x^* \in F. \tag{3.8}
\]
See [12] for a proof of this theorem.
When the Hessian G is indefinite, local solutions which are not global can occur². The problem of minimizing a convex quadratic function f(x) on a convex feasible domain F ⊆ R^m is called convex quadratic programming.
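Whether a given symmetric G is positive definite, and hence whether Theorem 3.3 guarantees a unique global solution, can be tested by attempting a Cholesky factorization, which succeeds exactly when G is positive definite. This is a standard numerical criterion, offered here only as an aside; the function name and test matrices are illustrative.

```python
def is_positive_definite(G, tol=1e-12):
    """Test positive definiteness of a symmetric matrix G by trying
    a Cholesky factorization G = L L^T: the factorization exists
    with strictly positive diagonal iff G is positive definite."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = G[i][i] - s            # pivot must stay positive
                if d <= tol:
                    return False
                L[i][i] = d ** 0.5
            else:
                L[i][j] = (G[i][j] - s) / L[j][j]
    return True

print(is_positive_definite([[2.0, 0.5], [0.5, 1.0]]))   # True
print(is_positive_definite([[1.0, 2.0], [2.0, 1.0]]))   # False (indefinite)
```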
Explicit constraints
An explicit shortest path constraint has been stated in Chapter 1 as follows:
p_j is a shortest path in G, j = 1, …, n_E, (3.10)
where G is an oriented weighted graph (V, A, w), consisting of a set V of n vertices, a set A of m arcs and an m-vector w of weights associated with the arcs. The paths p_j (j = 1, …, n_E) are defined as an explicit succession of consecutive arcs in A. Note that we are using the terminology and definitions presented in Chapter 2.
The formulation (3.10) asks the cost of p_j not to exceed that of any path with the same origin and destination as p_j. This may be expressed as a (possibly large) set of vectorial⁴ linear inequality constraints of the type
\[
\sum_{k \mid a_k \in p'_j} w_k \;\ge\; \sum_{k \mid a_k \in p_j} w_k, \tag{3.11}
\]
where p'_j is any path with the same origin and destination as p_j. As a consequence, the set of feasible weights determined by (3.10) is convex, as it is the intersection of a collection of halfspaces. The problem of minimizing (1.9) subject to (3.11) for j = 1, …, n_E, and to the non-negativity constraints (1.4), is then a classical QP problem. This QP is however quite special because its constraint set is (potentially) very large, very structured, and possibly involves a non-negligible amount of redundancy. Indeed, the number of linear constraints of the form (3.11)
³ We refer here to algorithm complexities in terms of polynomial or non-polynomial run-times. These notions will be introduced later.
⁴ That is, 0 belongs to the subset of points verifying the inequality as an equality.
is dependent on the number of possible paths between two vertices in the graph, which typically grows exponentially with the density of the graph m/n. As all paths are taken into account between an origin and a destination, many of the constraints (3.11) are trivial once only a few are suitably considered; indeed, at most m constraints are not trivial, since they are vectorial and w ≥ 0. There exist procedures that eliminate such trivial or redundant constraints, allowing one to start the problem's resolution with fewer constraints (see Boot's procedure in [11, 12], for instance). However, these checks for triviality require the enumeration of all constraints. In our case, enumerating an exponential number of constraints is of course out of the question⁵, and we will have to use a "separation procedure" to determine which of these constraints are violated for a given value of the arc weights. This separation is naturally based on the computation of the shortest paths within the graph.
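The separation idea can be sketched as follows: for each path p_j that must be shortest, one shortest path computation under the current weights either certifies that no constraint of type (3.11) is violated, or exhibits a cheaper path, i.e. a violated constraint. Everything below (the representation, names and example) is an illustrative reconstruction of this principle, not the thesis's procedure.

```python
import heapq

def sp_cost_and_path(succ, w, o, d):
    """Shortest path from o to d (nonnegative weights) and its cost."""
    sc, pred = {o: 0.0}, {o: None}
    done = set()
    heap = [(0.0, o)]
    while heap:
        c, i = heapq.heappop(heap)
        if i in done:
            continue
        done.add(i)
        if i == d:
            break
        for j in succ.get(i, []):
            nc = c + w[i, j]
            if nc < sc.get(j, float('inf')):
                sc[j], pred[j] = nc, i
                heapq.heappush(heap, (nc, j))
    path, v = [], d
    while pred[v] is not None:        # rebuild the arc sequence o -> d
        path.append((pred[v], v))
        v = pred[v]
    return sc[d], list(reversed(path))

def separate(succ, w, prescribed):
    """For each prescribed path (a list of arcs), detect a violated
    constraint of type (3.11) by one shortest path computation,
    instead of enumerating the exponential constraint family."""
    violated = []
    for p in prescribed:
        o, d = p[0][0], p[-1][1]
        cost_p = sum(w[a] for a in p)
        best, p_star = sp_cost_and_path(succ, w, o, d)
        if best < cost_p:             # p is not shortest under w: cut found
            violated.append((p, p_star))
    return violated

succ = {1: [2, 3], 2: [4], 3: [4], 4: []}
w = {(1, 2): 1, (1, 3): 1, (2, 4): 1, (3, 4): 5}
# Require the path 1 -> 3 -> 4 to be shortest; under w it is not.
print(separate(succ, w, [[(1, 3), (3, 4)]]))
```

The returned pair (prescribed path, cheaper path) corresponds to one violated inequality (3.11) that could be added to the working constraint set.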
Implicit constraints
An implicit shortest path constraint restricts some attribute of a shortest path between an origin and a destination without suggesting the path that has to be taken from the origin to the destination. We therefore say that the shortest path is implicitly determined by its origin, its destination, the oriented graph and the current value of the arc weights w. Typically, the restricted attribute is the cost of the shortest path, since our variables are the arc weights. We consider bound constraints on this shortest cost. We assume that n_I origin-destination pairs (o_j, d_j), j = 1, …, n_I, are concerned with this constraint type, and we formulate the corresponding implicit constraints as follows:
\[
0 \le l_j \le \sum_{a \in p^1_j(w)} w_a \le u_j, \qquad j = 1, \dots, n_I, \tag{3.12}
\]
where p1j (w) is a shortest path (with respect to the weights w) from oj to dj . The values of lj
and uj are lower and upper bounds on the cost of the shortest path from oj to dj , respectively.
For consistency, we impose that lj uj (j = 1; : : :; nI ) and we allow lj to be chosen as zero and
uj to be innite.
One constraint (3.12) actually consists of two inequality constraints: one for the lower bound
part and one for the upper bound part. Because of the very meaning of shortest paths, lower
and upper bounds on shortest path costs have very different interpretations.
Let us consider the j-th origin-destination pair. Since p1j(w) is a path of minimum cost
between oj and dj, imposing that Σ_{a ∈ p1j(w)} wa ≥ lj means that all paths from oj to dj must
have a cost of at least lj, that is,

    Σ_{a ∈ p0j} wa ≥ lj,   (3.13)

the path p0j being any path from oj to dj. These constraints are linear and affine⁶, and their
number is exponential with the density of the graph m/n, as for the description of an explicit
5 This procedure would not be polynomially bounded.
6 That is, 0 does not belong to the manifold verifying (3.13) as an equality, when lj ≠ 0.
shortest path constraint. The feasible region delimited by lower bounds on shortest path costs
is then convex. Again, much redundancy can be expected in the set of constraints (3.13). As
a consequence, these constraints can be part of our inverse problem without affecting the global
nature of a solution to this problem.
The underlying interpretation of an upper bound on the shortest path cost is fundamentally
different: the j-th upper bound constraint defined by

    Σ_{a ∈ p1j(w)} wa ≤ uj   (3.14)

does not compel all paths from oj to dj to have a cost under uj, but merely imposes that there
exists one path from oj to dj whose cost does not exceed the upper bound uj. The (a priori)
unknown path that must satisfy this condition has to be picked among the exponential number of
paths starting from oj and arriving at dj. A shortest path procedure will determine an appropriate
path, which completes the constraint definition, in order to check whether the constraint is
violated or not. However, the path that is to be selected for evaluating the constraint violation
may vary with the arc weights w. The path p1j(w) thus remains explicitly unidentified.
Consequently the constraint (3.14) cannot be expressed as one or more linear constraints, and
hence cannot fit into the classical QP framework defined by (3.1)-(3.2).
Moreover, the feasible region determined by constraints of the type (3.14) is non-convex.
Let us show it on a small example: consider the graph composed of 3 vertices and 3 arcs (m = 3)
shown in Figure 3.1, where arcs a1 and a2 form a two-arc path from the origin o to the
destination d, and arc a3 links o to d directly. Consider the constraint

    Σ_{a ∈ p1(w)} wa ≤ 5,   (3.15)

where p1(w) is the shortest path (with respect to the weights w) from vertex o to vertex d. It
is easy to see that w1 = (2 2 10)ᵀ and w2 = (10 10 4)ᵀ are feasible weight vectors, while
½(w1 + w2) = (6 6 7)ᵀ is infeasible. The feasible region is therefore non-convex.
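This non-convexity can be checked numerically. The sketch below uses our own encoding of the three-arc graph (the middle vertex is named v) and evaluates constraint (3.15) at w1, w2 and at their midpoint.

```python
import heapq

def shortest_cost(adj, src, dst):
    """Dijkstra: cost of the shortest path from src to dst."""
    dist, heap = {src: 0.0}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return float("inf")

def feasible(w, bound=5.0):
    """Constraint (3.15): the shortest o-d cost must not exceed the bound."""
    w1, w2, w3 = w
    adj = {"o": [("v", w1), ("d", w3)], "v": [("d", w2)]}
    return shortest_cost(adj, "o", "d") <= bound

# w1 and w2 are feasible, but their midpoint is not: the region is non-convex.
print(feasible((2, 2, 10)), feasible((10, 10, 4)), feasible((6, 6, 7)))
# True True False
```

At w1 the shortest cost is 4 (via a1, a2), at w2 it is 4 (via a3), but at the midpoint it is min(12, 7) = 7 > 5.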
variable:

    f(w) = Σ_{i=1}^{m} fi(wi),   (3.16)

where fi(wi) = ½ (wi − w̄i)², (i = 1, …, m). This property is sometimes efficiently used to speed
up the solving procedure.
One common characteristic that is shared by all the constraints (1.4)-(1.6) is that they are
sparse. The constraints involve the weights of only very few arcs, since they translate either the
non-negativity of an arc weight, or properties of a (shortest) path, which is not Eulerian⁷ in large
graphs.
Computing the function F requires a standard model of computation, given by the Turing
machine. We do not want to go into details, which can be found in [50, 113]. We just specify that
the action of a Turing machine is deterministic, that is, it cannot make choices. A Turing machine
is said to compute the function F : C → B if, given an instance x of C, it eventually yields F(x)
and halts.
2. For every x such that F(x) = no and for every candidate certificate, the checking machine answers no.

A precision is usually added, indicating that, in Item 1, the length of the certificate is bounded
by the value of p(length of x). This definition means that every yes-instance has a certificate of
polynomial length, and that there exists a Turing machine that can check the certificates in
polynomial time. In the optimization framework, a problem is in NP if one can verify in
polynomial time whether a given point is a solution of that problem or not. Many combinatorial
problems are in NP, since such verification is usually simple for that kind of problem.
As a direct consequence, remark that P ⊆ NP.
Primal methods
A primal method works on the original problem directly by searching through the feasible region
for the optimal solution. Each point in the process is (primal) feasible and the value of the
objective function constantly decreases. Such methods benefit from the following advantages:

• Most primal methods do not rely on special problem structure, such as convexity, and hence
apply to a wide-ranging class of problems.
• Since each generated point is feasible, if the process halts before reaching the solution, the
final point is feasible and may represent an acceptable approximation to the solution of the
original problem.
These methods however present major drawbacks. They require an initial procedure to find
a feasible starting point. Difficulties also come from the need to maintain this feasibility
throughout the process. As noticed by Luenberger in [81], some methods can fail to converge for
problems with inequality constraints unless elaborate precautions are taken, but they generally
have good convergence rates, particularly with linear constraints.
A primal method solving our QP problem (3.1)-(3.2) gives a point x* that satisfies the
following conditions, called the Kuhn-Tucker conditions [73]: there exist real numbers ui ≥ 0,
(i = 1, …, h) such that

    ∇f(x*) − Σ_{i=1}^{h} ui ∇Ei(x*) = 0   (3.17a)

and

    ui Ei(x*) = 0  for all i = 1, …, h.   (3.17b)

The vector u is called the vector of Kuhn and Tucker multipliers or Lagrange multipliers. The
conditions (3.17a)-(3.17b) geometrically mean that the gradient of f at x* lies in the normal
cone defined at x*, that is, it can be expressed as a linear combination of the inward normals to
the binding (or active) constraints at x*. Note that in case of non-degeneracy⁹, the Kuhn and
Tucker multipliers are unique.
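As a small numerical illustration of (3.17a)-(3.17b), consider the one-dimensional instance below (the instance is ours, not from the text): minimize ½(x − 3)² subject to x ≤ 2, written in the form nᵀx − b ≥ 0.

```python
import numpy as np

# Tiny instance of (3.1)-(3.2): minimize f(x) = a^T x + 1/2 x^T G x (+ const),
# i.e. f(x) = 1/2 (x - 3)^2, subject to E(x) = n^T x - b = -x + 2 >= 0.
G, a = np.array([[1.0]]), np.array([-3.0])
n, b = np.array([-1.0]), -2.0

x_star = np.array([2.0])                       # the constrained minimizer
u_star = 1.0                                   # its Kuhn-Tucker multiplier

grad_f = G @ x_star + a                        # gradient of f at x*
stationarity = grad_f - u_star * n             # (3.17a): should vanish
complementarity = u_star * (n @ x_star - b)    # (3.17b): should vanish

print(stationarity, complementarity)           # both zero, and u* >= 0
```

The gradient of f at x* = 2 is −1, which is exactly u* = 1 times the inward normal n = −1 of the active constraint, as the normal cone interpretation requires.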
Just for reference, famous primal methods are the gradient projection methods and the reduced
gradient method. Both basic methods can be viewed as the method of steepest descent
applied on the manifold defined by the binding or active constraints. In linear programming,
the simplex method is well known for travelling through extremal points of the convex feasible
region.
Dual methods
A dual method does not tackle the original constrained problem directly but instead considers
an alternative problem, the dual problem, whose unknowns are the Lagrange multipliers of the
first problem. Lagrange multipliers, for convenience denoted by u ∈ Rʰ, in a sense measure the
sensitivity of the constraints as they appear in the following function, from which the dual
objective function is derived, and which is usually called the Lagrange function or Lagrangian:

    L(x, u) ≝ f(x) − Σ_{i=1}^{h} ui Ei(x),   (3.18)

where L(x, u) is the Lagrange function of the QP problem defined by (3.1)-(3.2). For a problem
with m variables and h constraints, dual methods thus work in the h-dimensional space of the
9 When the ∇Ei(x*), with i such that Ei(x*) = 0, are linearly independent.
Lagrange multipliers u, and solve the dual problem, which is formulated as follows:

    max_{u ∈ Rʰ} d(u)   (3.19)

subject to

    u ≥ 0,   (3.20)

where

    d(u) = min_{x ∈ Rᵐ} L(x, u).   (3.21)
For our convex QP problem, it can be shown that d(u) is concave (any local maximum is then
global), since

    d(u) = −½ uᵀ(NᵀG⁻¹N)u + uᵀ(b + NᵀG⁻¹a) − ½ aᵀG⁻¹a,

where N is the m × h matrix whose i-th column is ni. The non-negativity of the Lagrange
multipliers (3.20) is due to the inequality constraints (3.2) of the original or primal problem;
equality constraints would leave their Lagrange multipliers unconstrained in sign. Once these
multipliers are known, one must determine the solution point in the space of primal variables x,
such that Gx = Nu − a, in order to supply the desired solution of the QP problem. A method
solving the above dual problem actually searches for a saddle-point (x*, u*) of the Lagrange
function L(x, u), that is, a point such that

    L(x*, u*) ≤ L(x, u*)  for all x ∈ Rᵐ   (3.22a)

and

    L(x*, u) ≤ L(x*, u*)  for all u ≥ 0.   (3.22b)

If such a point is found, then x* is a global optimum of the primal problem.
Dual methods offer the following attractive features:

• Lagrange multipliers have meaningful intuitive interpretations as prices associated with the
constraints, in the context of practical applications.

• Dual methods do not require starting from an initial primal feasible point.

• Global convergence of dual methods is often guaranteed.

The efficiency of dual methods however heavily relies on the convexity of the problem. Dual
procedures also have the disadvantage of supplying a primal feasible solution only when they
terminate.
Primal-dual relation
For primal and dual feasible solutions x and u, we have that d(u) ≤ f(x). Under
differentiability conditions, optimal points of the primal and dual problems yield the same primal
and dual objective function values:

    f(x*) = d(u*).   (3.23)

In our case, the objective function f(x) and the constraints Ei(x) are convex and differentiable.
Then x* is a global minimum if and only if the Kuhn and Tucker conditions are satisfied. This
result is stated in the theorem that follows.
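Strong duality (3.23) can be illustrated on a tiny instance (the instance is ours) using the closed form of d(u) quoted above:

```python
import numpy as np

# Tiny convex QP: minimize f(x) = a^T x + 1/2 x^T G x
# subject to one inequality E(x) = n^T x - b >= 0.
G = np.eye(2)
a = np.zeros(2)
n = np.array([1.0, 0.0])
b = 1.0
Ginv = np.linalg.inv(G)

def d(u):
    """Dual function, using the closed form quoted in the text."""
    return (-0.5 * u * (n @ Ginv @ n) * u
            + u * (b + n @ Ginv @ a)
            - 0.5 * a @ Ginv @ a)

# Maximize the concave quadratic d over u >= 0 (scalar case).
u_star = max(0.0, (b + n @ Ginv @ a) / (n @ Ginv @ n))
# Recover the primal point from G x = N u - a.
x_star = Ginv @ (n * u_star - a)

f_star = a @ x_star + 0.5 * x_star @ G @ x_star
print(f_star, d(u_star))   # 0.5 0.5: strong duality (3.23) holds
```

Here x* = (1, 0) with u* = 1, and both objective values equal ½.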
Theorem 3.5 Provided that there exists x ∈ Rᵐ such that Ei(x) > 0 for i = 1, …, h (the Slater
condition¹⁰), the Kuhn and Tucker conditions are necessary and sufficient conditions for a
global optimum x* of the QP problem defined by (3.1) and (3.2), where f(x) is convex: there
exists u ≥ 0 verifying

    ∇x L(x*, u) = 0   (3.24a)

and

    ui Ei(x*) = 0,  for all i = 1, …, h.   (3.24b)

The notation ∇x indicates the partial derivative with respect to the variable x.
A proof of this theorem can be found in [82].
Simplex-type methods
Quadratic programming methods that wear the "simplex label" have one or more of the following
properties: they use simplex tableaux; they perform Gauss-elimination-type pivots on basis
matrices that are derived from the Kuhn-Tucker optimality conditions (3.24a)-(3.24b); they
finally may reduce to the simplex method for linear programming in the degenerate case G = 0.
They roughly consist of a generalization of the simplex method to quadratic programming.
These methods are inappropriate for our inverse problem, since the pivot operations are
performed on matrices of row size (m + h). The number of our constraints h = nE being typically
exponential, simplex-type methods would require too large matrices to operate correctly.
See [27] for further details concerning such methods.
active constraints as equalities, that is nᵢᵀx(k) = bᵢ for i ∈ A. Moreover, apart from degenerate
cases, nᵢᵀx(k) > bᵢ for i ∉ A. Each iteration thus attempts to locate the solution x(k) of an
equality constrained problem in which only the active constraints occur. To the solution x(k)
corresponds a vector of Lagrange multipliers u(k) for the active constraints in A. The vectors
x(k) and u(k) then verify the Kuhn and Tucker conditions, possibly without primal or dual
feasibility, depending upon whether the active set technique is combined with a dual or primal
method, respectively. Some constraints thus need to be added and/or removed from the active
set, and a new subproblem then comes under consideration. This is repeated until the
Kuhn-Tucker conditions for the original problem are satisfied. Technical details about active sets
can be found notably in [41, Chapter 10].
Goldfarb showed in [53] that Beale's method, which was developed as an extension of linear
programming, can be viewed as an active set method. This is corroborated in [75], where Lemke
finds his method very close to that of Beale. Fletcher [40] writes similar comments about Dantzig's
method, and Van de Panne and Whinston [110] showed that Beale's method and Dantzig's method
generate the same sequence of points if they both start from the same initial point. This brings
forward the fact that QP methods apparently originate from the same basic idea, and that they
differ by the relative point of view with which they have been developed. The relative merits of
QP methods then show up through extensive computational experience. Indeed, remember that
storage considerations invited us to prefer active set methods over simplex-type ones. For
experimental results about the above algorithms, see those obtained by Fletcher [40] and
Goldfarb [53].
Later, Goncalves [58] proposed a primal-dual algorithm based on simplex techniques, and
Stoer [103] developed a method for constrained least-squares. Again, both approaches carry out
Gauss elimination pivots on potentially large matrices. Gill and Murray [52] set up a primal
method using an active set and QR factorizations of the matrix formed by the normals to the
active constraints. Their method applies to indefinite QP problems, and has the advantage of
being numerically stable, but it needs an initial primal feasible point. The search for an initial
feasible point can be avoided by an alternative approach proposed by Conn [22], minimizing a
penalty function. This modification allows the iterates to be infeasible. According to Gill and
Murray [52], results produced by Conn and Sinclair [23] do not allow firm conclusions.
A more recent method for convex QP is suggested by Goldfarb and Idnani [55]. It can be
viewed as a dual active set method. They use the idea of Theil and Van de Panne of starting from
the unconstrained minimizer of the quadratic function, and factorize the matrix of the normals
to the active constraints by techniques similar to those employed by Gill and Murray. The
factorization techniques bring numerical stability to the procedure. The Goldfarb and Idnani (GI)
method thus seems to gather the advantages of prior methods that are appropriate for the
solving of large convex QP problems involving redundancy. Indeed, following Fletcher's
appreciation [41], the method is most effective when there are only few active constraints at the
solution, and is also able to take advantage of a good estimate of the active set A at the solution.
This latter advantage makes the GI method suitable for sequential quadratic programming
methods for nonlinearly constrained optimization calculations. Powell [96] analyzed the Goldfarb
and Idnani method in the special case where the Hessian G is ill-conditioned due to a tiny
eigenvalue. Powell's conclusions cast some doubt on the numerical stability of Goldfarb and
Idnani's implementation. Powell proposed in [97] a stable implementation of the GI method that
circumvents these difficulties. Note that our Hessian G = I, the identity matrix, does not enter
the scope of Powell's improvements.
The GI method suitably meets the requirements for solving our convex QP problem. The
next section is devoted to its analysis.
Step e. If all constraints are satisfied, then exit: x is the optimal solution to CQP.
Else, go to Step b.

Note that, in Step b, the index q belongs to {1, …, h} \ A.
We need some more notation to describe the algorithm more formally. Let |A| be the number
of constraints in A, and N be the m × |A| matrix whose columns are the normals ni of the
constraints in the active set A. The algorithm will use two additional matrices when the columns
of N are linearly independent:

    N* ≝ (NᵀG⁻¹N)⁻¹NᵀG⁻¹,   (3.25)

which is the Moore-Penrose generalized inverse of N in the space of variables under the
transformation y = G^{1/2}x, and, if I is the m × m identity matrix,

    H ≝ G⁻¹(I − NN*),   (3.26)

which is the reduced inverse Hessian of the quadratic objective function in the subspace of points
satisfying the active constraints. Indeed, since NN* is the operator of the projection along the
subspace of points satisfying the active constraints, (I − NN*) is a (generally non-orthogonal)
projection onto the manifold verifying the active constraints. Note that H is symmetric. The
operators N* and H satisfy the following properties.

Property 4 Hw = 0 ⟺ w = Nλ, with λ ∈ R^{|A|}.

Proof. If w = Nλ, the left-hand side equals G⁻¹(Nλ − NN*Nλ) = 0, where N*N equals the identity matrix. □
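A quick numerical check of (3.25), (3.26) and Property 4 on a random instance (the dimensions and data below are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 5, 2                       # m variables, |A| = k active constraints
M = rng.standard_normal((m, m))
G = M @ M.T + m * np.eye(m)       # symmetric positive definite Hessian
N = rng.standard_normal((m, k))   # normals of the active constraints (full rank)

Ginv = np.linalg.inv(G)
Nstar = np.linalg.inv(N.T @ Ginv @ N) @ N.T @ Ginv   # (3.25)
H = Ginv @ (np.eye(m) - N @ Nstar)                   # (3.26)

# N* N = I and H is symmetric, as stated in the text.
assert np.allclose(Nstar @ N, np.eye(k))
assert np.allclose(H, H.T)

# Property 4: H w = 0 whenever w = N lambda ...
lam = rng.standard_normal(k)
assert np.allclose(H @ (N @ lam), 0.0)

# ... and H w != 0 for a w outside the range of N.
w = rng.standard_normal(m)
w = w - N @ (np.linalg.pinv(N) @ w)   # remove the component in range(N)
assert np.linalg.norm(H @ w) > 1e-8
```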
Let us denote the feasible region of P(A) by F(A), where

    F(A) ≝ {x ∈ Rᵐ | nᵢᵀx = bᵢ, i ∈ A},   (3.28)

and the gradient of the objective f at x by g(x) ≝ ∇f(x) = Gx + a. Then we can state the
following theorem.
is the desired solution. Otherwise, a violated constraint is chosen, that is, an index q is
selected in {1, …, h} such that Eq(x) < 0. Also set

    u+ ← (uᵀ, 0)ᵀ  if |A| > 0;   u+ ← (0)  if |A| = 0.   (3.36)
step 2: Compute the primal and dual step directions.
These directions are computed by the relations

    s = H nq   (3.37)

and, if |A| > 0,

    r = N* nq.   (3.38)

step 3: Determine the maximum steplength to preserve dual feasibility.
Define

    S = {j ∈ {1, …, |A|} | rj > 0}.   (3.39)

The maximal steplength that will preserve dual feasibility is then given by

    tf = u+ℓ / rℓ = min_{j ∈ S} u+j / rj   if S ≠ ∅;   tf = +∞ otherwise.   (3.40)

step 4: Determine the steplength to satisfy the q-th constraint.
This steplength is only defined when s ≠ 0, and is then given by

    tc = −Eq(x) / sᵀnq.   (3.41)
    f ← f + t (½ t + u+_{|A|+1}) sᵀnq   (3.45)

and

    u+ ← u+ + t (−rᵀ, 1)ᵀ  if |A| > 0;   u+ ← u+ + t  if |A| = 0.   (3.46)

If t = tc, then set u ← u+, add constraint q, that is A ← A ∪ {q}, and go back to step 1
after updating H and N*. If, on the other hand, t = tf, drop the ℓ-th constraint, that is
A ← A \ {ℓ}, and go back to step 2 after updating H and N*.
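Steps 3 and 4 can be sketched as follows. The iteration state below is hypothetical and only serves to exercise formulas (3.39)-(3.41):

```python
import numpy as np

def steplengths(u_plus, r, s, n_q, E_q):
    """Steps 3-4 of the dual iteration: the maximum steplength t_f that
    preserves dual feasibility (3.40), the steplength t_c that makes the
    q-th constraint binding (3.41), and the index of the blocking constraint."""
    S = np.where(r > 0)[0]                     # (3.39)
    if S.size > 0:
        ratios = u_plus[S] / r[S]
        ell = S[np.argmin(ratios)]             # blocking constraint index
        t_f = ratios.min()
    else:
        ell, t_f = None, np.inf
    t_c = -E_q / (s @ n_q) if s @ n_q != 0 else np.inf
    return t_f, t_c, ell

# A hypothetical iteration state, just to exercise the formulas.
u_plus = np.array([2.0, 1.0])
r = np.array([0.5, 4.0])
s = np.array([1.0, 0.0])
n_q = np.array([2.0, 0.0])
t_f, t_c, ell = steplengths(u_plus, r, s, n_q, E_q=-4.0)
print(t_f, t_c, ell)   # 0.25 2.0 1
```

Here t = min{tc, tf} = 0.25 = tf, so the algorithm would drop the blocking constraint (index 1) and return to step 2.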
Note that u, the vector of Lagrange multipliers, has a dimension equal to the number of active
constraints.
We observe that the GI algorithm involves three types of possible iterations.

1. The first is when the new violated constraint is linearly independent from those already in
the active set, and all the active constraints remain active at the new solution of the QP
subject to the augmented set of constraints. This occurs when t = tc.

2. The second is when the new violated constraint is linearly dependent on those already in
the active set. This occurs when s = 0, or, equivalently, when Nr = nq. In order to preserve
independence of the active set (that is, linear independence of the columns of N), an old
constraint (the ℓ-th) is dropped from the set before incorporating the new one. As a result,
N is always of full column rank.

3. The third is when the solution of the QP subject to the augmented set of constraints is
such that one of these constraints is not binding. This occurs when t = tf, in which case
the ℓ-th constraint ceases to be binding. As one wishes to keep only binding constraints in
the active set, this constraint is dropped.
An efficient implementation of this algorithm does not need to explicitly compute and store
the operators N* and H that are used by the algorithm. One can store and update the matrices
J = QᵀL⁻¹ and R obtained from the Cholesky and QR factorizations G = LLᵀ and L⁻¹N =
Q [R; 0]. We will give more details about these factorizations when, in Chapter 4, we specialize
the GI method to our inverse shortest path problem.
Before examining the algorithm's properties, we need to introduce some more notation. The
set A+ denotes the set A ∪ {q}, where q ∈ {1, …, h} \ A is the index of the constraint that is to
be added to the active set. Similarly, A− refers to a subset of A containing one fewer element
than A. Accordingly, N+ and N− are the matrices of inward normals corresponding to A+ and
A−, respectively. The normal n+ indicates the normal vector nq added to N to give N+, and n−
is the column removed from N to give N−. In agreement, (N+)* and H+ denote the operators
defined in (3.25) and (3.26), respectively, with N+ instead of N. Finally, the m-vector ei is the
i-th column of the identity matrix I.
The next two sections discuss the cases where the columns of N+ are linearly independent or
not.
Proof. We first need to establish some results before proving the above equations:

    g(x̄) = G(x + ts) + a = g(x) + tGs;   (3.58)

from (3.52) and (3.56), we can write Gs as

    Gs = (I − NN*) n+ = n+ − Nr = N+ (−rᵀ, 1)ᵀ;   (3.59)

    H+ N+ = 0,   (3.60)

because (I − N+(N+)*) is a projection along the subspace spanned by the columns of N+;

    (N+)* N+ = I,   (3.61)

since nq is linearly independent from the ni indexed by A; finally,

    nᵢᵀ H n+ = (H nᵢ)ᵀ n+ = 0,  for all i ∈ A,   (3.62)

because (I − NN*) nᵢ = 0 for i ∈ A.
In order to show (3.53), we use successively (3.51), (3.58)-(3.60) and (3.49):

    H+ g(x̄) = H+ g(x) + t H+ Gs = t H+ N+ (−rᵀ, 1)ᵀ = 0,   (3.63)

where H+ g(x) = 0 and H+ N+ = 0 by (3.60).
Now, let us prove that x̄ belongs to F(A):

    Ei(x̄) = nᵢᵀ(x + ts) − bᵢ = Ei(x) + t nᵢᵀ s = 0  for all i ∈ A,   (3.64)

since Ei(x) = 0 by (3.48) and nᵢᵀ s = nᵢᵀ H n+ = 0 by (3.62).
The modification of the Lagrange multipliers is as follows:

    u+(x̄) = (N+)* g(x̄) = (N+)* g(x) + t (N+)* Gs          by (3.58)
           = (N+)* g(x) + t (N+)* N+ (−rᵀ, 1)ᵀ             by (3.59)
           = u+(x) + t (−rᵀ, 1)ᵀ                            by (3.50) and (3.61).   (3.65)

Finally, the evaluation (3.57) of Eq(x̄) is established by a development similar to that in (3.64). □
By Lemma 3.7 we can determine the point xc = x + tc s that minimizes f over F(A+): it is
the point such that Eq(xc) = 0, which implies that tc = −Eq(x)/sᵀnq (if sᵀnq ≠ 0). Moreover,
(xc, A+) will be an S-pair if u+(xc) ≥ 0. If not, then (3.55) allows a smaller value tf of the
steplength t, such that tf < tc and some u+i(xf) vanishes, where xf = x + tf s. The
constraint, say ℓ ∈ A, corresponding to that i-th component is dropped from the active set and
(xf, A−, q) satisfies the conditions to be a V-triple, where A− = A \ {ℓ}. This is formally stated
in the following theorem.
Theorem 3.8 Let (x, A, q) be a V-triple and x̄ be defined as in Lemma 3.7 with

    t = min{tc, tf},   (3.66)

where

    tc = −Eq(x) / sᵀn+   (3.67)

and

    tf = u+ℓ(x)/rℓ = min_{j ∈ S} u+j(x)/rj   if S ≠ ∅;   tf = +∞ otherwise,   (3.68)

where S = {j ∈ {1, …, |A|} | rj > 0}. The multipliers u+(x) and r are given by (3.55) and
(3.56) respectively. Then we have that

    Eq(x̄) ≥ Eq(x)   (3.69)

and we observe the following increase of the objective function f:

    f(x̄) − f(x) = t sᵀn+ (½ t + u+_{|A|+1}(x)) ≥ 0.   (3.70)

If t = tc, then (x̄, A ∪ {q}) is an S-pair, and if t = tf, then (x̄, A \ {ℓ}, q) is a V-triple.
In the definition (3.68) of tf, we abuse the notation of the index ℓ, which is actually the index
of the constraint as defined in CQP (1 ≤ ℓ ≤ h), and not its index j(ℓ) in the vector u+(x), where
1 ≤ j(ℓ) ≤ |A|. For the sake of simplicity, let ℓ refer to the dropped constraint in either case.
Let us now prove the above theorem.
Proof. Let us first note that s = G⁻¹(I − NN*)n+ is nonzero, since the linear independence of n+
from the columns of N implies that n+ does not belong to the null space of (I − NN*), and G⁻¹
is positive definite. Then, since (x, A, q) is a V-triple, we can write

    sᵀn+ = (n+)ᵀ H n+ = (n+)ᵀ H G H n+ = sᵀGs > 0,   (3.71)

where the second equality follows from Property 5 and the third from (3.52), because G is
positive definite and s ≠ 0. As a consequence, t ≥ 0 and, by (3.57), Eq(x̄) = Eq(x) + t sᵀn+ ≥
Eq(x). Remark that when t = tc, Eq(x̄) > Eq(x), since tc > 0 from (3.47) and (3.67).
On the other hand, using Taylor's formula on f with x̄ − x = ts, one has that

    f(x̄) − f(x) = t sᵀg(x) + ½ t² sᵀGs.   (3.72)

By Property 4, H+ g(x) = 0 implies that g(x) = N+ u+(x). It then follows that Hg(x) =
HN+ u+(x) = Hn+ u+_{|A|+1}(x), since H projects along the manifold spanned by the columns of
N (the non-zero contribution remains that of the (|A|+1)-th column of N+, which is n+).
Consequently,

    sᵀg(x) = (n+)ᵀ H g(x) = (n+)ᵀ H n+ u+_{|A|+1}(x) ≥ 0   (3.73)
by (3.71) and (3.55). Substituting (3.71) and (3.73) into (3.72) gives (3.70). Moreover, as long
as t > 0, f(x̄) > f(x).
Lemma 3.7 and the definition of t (3.66)-(3.68) ensure that x̄ is primal optimal for P(A+)
(H+ g(x̄) = 0), primal feasible for P(A) (Ei(x̄) = 0 for i ∈ A), and that u+(x̄) is dual feasible
(≥ 0). If t = tc, then Eq(x̄) = 0, x̄ is primal feasible for P(A+) and (x̄, A ∪ {q}) is an S-pair.
We then have performed a full step in the primal space. If t = tf < tc, then Eq(x̄) < 0 and
u+ℓ(x̄) = 0. Since H+ g(x̄) = 0, the latter equation implies that

    g(x̄) = N+ u+(x̄) = Σ_{i ∈ A ∪ {q} \ {ℓ}} u+_{j(i)}(x̄) nᵢ,   (3.74)

where i is the j(i)-th index in A+. Consequently, (x̄, A \ {ℓ}, q) is a V-triple, since the set of
normals {nᵢ | i ∈ A ∪ {q} \ {ℓ}} is of course linearly independent. We then have performed a
partial step in the primal space. □

The above theorem allows us to obtain an S-pair (x̄, Ā ∪ {q}) from a V-triple (x, A, q) with
Ā ⊆ A, such that f(x̄) > f(x). This is achieved after |A| − |Ā| partial steps (this number is ≤ |A|)
or one full step.
Note that in (3.81)-(3.82), we distinguish between the ℓ-th constraint and its index j(ℓ) in the
active set.
Now, if we define Â = A− ∪ {q}, then N̂ has full rank, that is,

    Σ_{i ∈ Â} λᵢ nᵢ = 0   (3.83)

implies λᵢ = 0 for i ∈ Â. Indeed, suppose that (3.83) holds. Then, by (3.76), we can write

    Σ_{i ∈ A−} λᵢ nᵢ + λq Σ_{i ∈ A} rᵢ nᵢ = 0,   (3.84)

that is,

    Σ_{i ∈ A−} (λᵢ + λq rᵢ) nᵢ + λq rℓ nℓ = 0.   (3.85)

Since {nᵢ | i ∈ A} is linearly independent, we deduce from (3.85) that λᵢ + λq rᵢ = 0 for all i ∈ A−,
as well as λq rℓ = 0. We know that rℓ > 0. Thus λq = 0, and hence λᵢ = 0 for i ∈ A−. The
matrix N̂ then has full rank.
It then follows from (3.82) and Property 4 that N̂ g(x) = 0 in the sense that Ĥ g(x) = 0, and

    û(x) ≝ N̂* g(x), with components u_{j(i)} − (uℓ/rℓ) r_{j(i)} ≥ 0 for i ∈ A−,
    and uℓ/rℓ ≥ 0 for the constraint q,   (3.86)

since N̂* nᵢ = eᵢ for i ∈ A−. This establishes that (x, A−, q) is a V-triple. We then have performed
a dual step. □
Note that the change that occurs to the active set A and to the dual variables in the partial
step described in Theorem 3.8 is the same as that performed in the dual step described just
above. The only difference is that x is not changed in the dual step (there is no step in the primal
space), while this primal modification generally occurs in a partial step. A dual step emphasizes
that, when degeneracy occurs, it is possible to take non-trivial steps in the space of dual variables
without changing x and f(x).
In this chapter, the basic inverse shortest path problem is considered, where the constraints are
given as a set of shortest paths and nonnegativity constraints on the weights. We introduce
the concept of "island" in order to characterize the violation of shortest path constraints. The
violation of an explicit shortest path constraint creates one or more islands: an island is made
of two "shores"; the first shore indicates a portion of the computed shortest path and the second
shore is the succession of arcs that the path should (but does not) follow; both shores have
common termination vertices. In order to follow the framework of Goldfarb and Idnani's method,
we establish specialized formulations of the primal and dual step directions, the update of the arc
weights, and the maximum steplength to preserve dual feasibility. These new formulations are
"island-oriented". We also provide a way to check whether the primal step direction s is zero
without computing s explicitly. A computational algorithm is then proposed. Our method is then
tested on practical large-scale problems with large numbers of constraints. These tests confirm
the efficiency of our method, since few constraints, or islands, that are not active at the solution
are added to the active set.
The content of this chapter has been published in [15].
where l(j) is the number of arcs in the j-th path (its length), and where

    t(j,i) = s(j,i+1)  for i = 1, …, l(j) − 1.   (4.2)

If we define w̄ as the vector in the nonnegative orthant of Rᵐ whose components are the given
initial arc weights {w̄i}, the problem is then to determine w, a new vector of arc weights, and
hence a new weighted graph G = (V, A, w), such that

    min_{w ∈ Rᵐ} ‖w − w̄‖   (4.3)

is achieved under the constraints that

    wi ≥ 0  (i = 1, …, m)   (4.4)

and that the paths {pj}_{j=1}^{nE} are shortest paths in G.
Remember that we decided to restrict ourselves to the ℓ2 norm to profit from the quadratic
programming framework. As a consequence, our inverse shortest path problem becomes

    min_w ½ Σ_{i=1}^{m} (wi − w̄i)²   (4.5)

subject to (4.4) and the nE shortest path constraints. In Chapter 3, we established that these
last constraints may be expressed as a (possibly large) set of linear constraints of the type

    Σ_{k | ak ∈ p0j} wk ≥ Σ_{k | ak ∈ pj} wk,  (j = 1, …, nE),   (4.6)
where p0j is any path with the same origin and destination as pj. As a consequence, the set of
feasible weights, F say, is convex as it is the intersection of a collection of half-spaces. The
problem of minimizing (4.5) subject to (4.4) and (4.6) is then a classical quadratic programming
(QP) problem. This QP is however quite special because its constraint set is (potentially) very
large¹, very structured, and possibly involves a non-negligible amount of redundancy. Also, the
problem of minimizing (4.5) on the set F of feasible weights may be considered as the computation
of a projection of the unconstrained minimum onto the convex set F. Again, the special structure
of F distinguishes this problem from a more general projection.
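For a single constraint of type (4.6), written as aᵀw ≥ b with a the difference of the two paths' incidence vectors, this projection has a simple closed form. A minimal sketch, on an instance of our own:

```python
import numpy as np

def project_halfspace(w_bar, a, b):
    """Euclidean projection of w_bar onto the half-space {w | a^T w >= b},
    a single constraint of type (4.6) written as a^T w - b >= 0."""
    violation = b - a @ w_bar
    if violation <= 0:                 # w_bar already feasible
        return w_bar.copy()
    return w_bar + (violation / (a @ a)) * a

# Example: the prescribed path uses arc 1, a competing path uses arcs 2 and 3,
# so (4.6) reads w2 + w3 - w1 >= 0, i.e. a = (-1, 1, 1), b = 0.
w_bar = np.array([5.0, 1.0, 1.0])      # initial weights violate the constraint
a = np.array([-1.0, 1.0, 1.0])
w = project_halfspace(w_bar, a, 0.0)
print(w)   # [4. 2. 2.]: the closest weights making the constraint binding
```

The GI method can be viewed as iterating this kind of projection over the (implicitly enumerated) active constraints.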
An active set of constraints is maintained by the procedure, that is, a set of constraints which
are binding at the current stage of the calculation. A new violated constraint is incorporated
into this set at every iteration of the procedure (some other constraint may be dropped from
it), and the objective function value monotonically increases towards the desired optimum. This
approach was chosen for two main reasons.

• Since the Goldfarb-Idnani (GI) algorithm is a dual method, it is extremely easy to
incorporate new constraints once a first solution has been computed. In our context, this means
that, if a new set of prescribed shortest paths is given, modest computational effort will be
required to update the solution of the problem.

• The GI method has an excellent reputation for efficiency, especially in the case where the
number of constraints is large and near-degeneracy very likely. In particular, the method
avoids slow progress along very close extremal points of the constraint set F.

Also, the GI method and its efficient implementation are discussed in the literature, by Goldfarb
and Idnani in their original paper, but also by Powell in [96] and [97], for example.
Because our method heavily relies on the GI algorithm, we now state this method in its full
generality. In this form, it is designed for solving the QP problem given by
minx f (x) = aT x + 12 xT Gx;
(4:7)
subject to Ei(x) def
= nTi x , bi 0 (i = 1; : : :; h);
where x, a and fni ghi=1 belong to Rm , G is a m m symmetric positive denite matrix, b is in
Rh and the superscript T denotes the transpose. As indicated above, the GI algorithm maintains
a set of currently active constraints, A say, and relies on the matrix N whose columns are the
normals ni of the constraints in the active set A. The matrix N is thus of dimension m jAj,
where jAj is the number of constraints in A. The algorithm also uses two additional matrices,
namely
    N* := (N^T G^{−1} N)^{−1} N^T G^{−1},    (4.8)

which is the Moore-Penrose generalized inverse of N in the space of variables under the transformation y = G^{1/2} x, and

    H := G^{−1} (I − N N*),    (4.9)

which is the reduced inverse Hessian of the quadratic objective function in the subspace of points satisfying the active constraints.
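As an illustration of the two operators (4.8) and (4.9), the following sketch builds N* and H for a small G and N and checks their defining properties. This is only a numerical check with numpy, not the implementation used in the thesis, and the data are illustrative.

```python
import numpy as np

# Sketch of the operators (4.8) and (4.9) used by the GI method:
# G symmetric positive definite, N of full column rank.
def gi_operators(G, N):
    Ginv = np.linalg.inv(G)
    Nstar = np.linalg.inv(N.T @ Ginv @ N) @ N.T @ Ginv   # (4.8)
    H = Ginv @ (np.eye(G.shape[0]) - N @ Nstar)          # (4.9)
    return Nstar, H

G = np.diag([1.0, 2.0, 4.0])                     # toy Hessian
N = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # two active normals
Nstar, H = gi_operators(G, N)

# N* is a generalized inverse of N, and H annihilates the active normals.
assert np.allclose(Nstar @ N, np.eye(2))
assert np.allclose(H @ N, 0.0)
assert np.allclose(H, H.T)    # the reduced inverse Hessian is symmetric
```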
We do not re-state here the GI algorithm which is given in detail in Chapter 3 (Algorithm 3.2).
We also refer the reader to Chapter 3 and [55] for further details on the general GI algorithm,
and in particular for the proof that it indeed solves the QP (4.7), provided a solution exists.
Our purpose, in the next paragraphs, is to specialize the GI algorithm to the inverse shortest
path problem given by (4.5), (4.4) and (4.6). We will therefore examine the successive stages of
the algorithm presented above, where the structure of the problem allows some refinement.
solving the inverse shortest path problem 58
[Figure 4.1: example graph with vertices v_1, …, v_8 and arcs a_1, …, a_13]
On the small example given in Figure 4.1, we assume that the weight vector w is given by the relation w_j = j (that is, the arc a_j has a weight of j), while the constraint paths are given by

    p_1 = (a_1, a_5, a_12, a_13)  and  p_2 = (a_11, a_12, a_10).    (4.11)

At this point, it is not difficult to verify that the shortest path from v_1 to v_8 is the path

    (a_1, a_2, a_3, a_7).    (4.12)

Hence a constraint related to the path p_1 is violated at the vertex v_8, because the predecessor of v_8 on its shortest path from v_1, that is v_4, is different from its predecessor on the constraint path, which is v_7. The vertex v above is then v_8, while inspection shows that the relevant vertex w is v_2. The corresponding violating island is then

    I = ((a_2, a_3, a_7), (a_5, a_12, a_13)),    (4.13)

where I^+ = (a_2, a_3, a_7) is its positive shore, I^− = (a_5, a_12, a_13) its negative shore, and whose associated excess E is (2 + 3 + 7) − (5 + 12 + 13) = −18. This violating island is not the only one for this example. A second one, related to the path p_2, is given for instance by

    I' = ((a_8, a_2, a_3), (a_11, a_12, a_10)),    (4.14)

whose excess E' is equal to −20.
A violated constraint of the type (4.6) therefore corresponds to a violating island in (V, A, w). When it is incorporated in the active set, the constraint is enforced as an equality and the costs of its negative and positive shores are exactly balanced (see Section 4.2.5). The corresponding island is then called active.
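The excess computation on which this test rests can be sketched directly, reproducing the numbers of the example above; the encoding of islands as tuples of arc indices is our own illustrative choice.

```python
# Excess of a violating island, following the example of Figure 4.1 (w_j = j).
def excess(w, pos_shore, neg_shore):
    """E = total weight of the positive shore minus that of the negative shore."""
    return sum(w[a] for a in pos_shore) - sum(w[a] for a in neg_shore)

w = {j: float(j) for j in range(1, 14)}   # weights w_j = j for arcs a_1..a_13
I1 = ((2, 3, 7), (5, 12, 13))             # island of (4.13)
I2 = ((8, 2, 3), (11, 12, 10))            # island of (4.14)

assert excess(w, *I1) == -18.0            # matches the excess E in the text
assert excess(w, *I2) == -20.0            # matches E'
```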
where R is an upper triangular matrix of dimension |A|. Since N is of full rank, this is equivalent to maintaining a QR factorization of N of the form

    N = (Q_1 Q_2) (R; 0) = Q U,    (4.18)

where (R; 0) denotes R stacked above a zero block, as is the case in the numerical solution of unconstrained linear least squares problems. Indeed, it is straightforward to verify that (4.16) is the solution of

    min_r ‖N r − n_q‖_2.    (4.19)
The second useful simplification due to the special structure of the problem arises in the computation of the product N^T n_q in (4.16). The resulting vector indeed contains in position i the inner product of the i-th active constraint normal with the normal to the q-th constraint. As both these constraints may be interpreted as islands, the question is then to compute the inner product of the new island, corresponding to the q-th constraint, with all already active islands. We then obtain the following simple result.

Lemma 4.1 The vector N^T n_q appearing in (4.16) is given componentwise by

    [N^T n_q]_i = |I_j^+ ∩ I_q^+| + |I_j^− ∩ I_q^−| − |I_j^+ ∩ I_q^−| − |I_j^− ∩ I_q^+|    (4.20)

for i = 1, …, |A| and j equal to the index of the i-th active island.
Proof. Since

    [N^T n_q]_i = n_i^T n_q,    (4.21)

it is useful to note that, because of (4.4) and (4.6),

              +1 if a_k ∈ I_ℓ^+,
    [n_ℓ]_k = −1 if a_k ∈ I_ℓ^−,    (4.22)
               0 otherwise,

for k = 1, …, m and ℓ ∈ A ∪ {q}. This equation holds for both types of islands (with or without a negative shore). Taking the inner product of two such vectors (for ℓ = j and ℓ = q) then yields (4.20). □
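Lemma 4.1 can be checked numerically on a toy pair of islands, comparing the intersection formula (4.20) with an explicit inner product of normals built according to (4.22); the islands below are illustrative, not taken from the text.

```python
# Checking Lemma 4.1: an entry of N^T n_q can be computed from shore
# intersections instead of an m-dimensional vector product.
m = 13

def normal(pos, neg):
    """Island normal per (4.22): +1 on the positive shore, -1 on the negative."""
    n = [0.0] * m
    for a in pos: n[a - 1] = +1.0
    for a in neg: n[a - 1] = -1.0
    return n

def inner_by_islands(Ij, Iq):
    """Right-hand side of (4.20), computed with set intersections."""
    jp, jn = map(set, Ij); qp, qn = map(set, Iq)
    return len(jp & qp) + len(jn & qn) - len(jp & qn) - len(jn & qp)

Ij = ((2, 3, 7), (5, 12, 13))
Iq = ((8, 2, 3), (11, 12, 10))
dot = sum(a * b for a, b in zip(normal(*Ij), normal(*Iq)))
assert dot == inner_by_islands(Ij, Iq)
```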
As a consequence, the practical computation of r may be organized as follows:
1. compute the vector y ∈ R^{|A|} whose i-th component is given by (4.20),
2. perform a forward triangular substitution to solve the equation

       R^T z = y    (4.23)

   for the vector z ∈ R^{|A|},
3. perform a backward triangular substitution to solve the equation

       R r = z    (4.24)

   for the desired vector r.
This calculation will be a very important part of the total computational effort per iteration in the algorithm.
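The three steps above can be sketched as follows, with numpy and a dense toy R (whereas the text stores R sparsely via linked lists); the matrices and right-hand side are illustrative.

```python
import numpy as np

# Two-substitution computation of r: solve R^T z = y, then R r = z,
# so that r solves (R^T R) r = y.
def forward_sub(L, b):
    """Solve L z = b for lower triangular L."""
    z = np.zeros_like(b)
    for i in range(len(b)):
        z[i] = (b[i] - L[i, :i] @ z[:i]) / L[i, i]
    return z

def backward_sub(U, b):
    """Solve U x = b for upper triangular U."""
    x = np.zeros_like(b)
    for i in reversed(range(len(b))):
        x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

R = np.array([[2.0, 1.0], [0.0, 3.0]])   # toy upper triangular factor
y = np.array([5.0, 7.0])
z = forward_sub(R.T, y)                  # R^T z = y   (4.23)
r = backward_sub(R, z)                   # R r = z     (4.24)
assert np.allclose(R.T @ (R @ r), y)     # r solves (R^T R) r = y
```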
Hence, I^+(i) (resp. I^−(i)) is the set of active islands of V such that the arc a_i belongs to its positive (resp. negative) shore.
We finally define the logical indicator function [·] by

    [condition] = 1 if the condition is true, 0 if the condition is false.    (4.29)
We can now state our lemma.
Lemma 4.2 Consider a dual feasible solution for the problem of minimizing (4.5) subject to the constraints given by an active set A = (V, Y). Assume furthermore that, among the Lagrange multipliers {u_k}_{k=1}^{|A|}, those associated with the active islands of V are known. Then the weight vector w corresponding to this dual solution is given by

    w_i = [i ∈ X ∪ Z] c_i + [i ∈ X] ( Σ_{k ∈ I^+(i)} u_k − Σ_{k ∈ I^−(i)} u_k )    (4.30)

for i = 1, …, m.
Proof. We first note that we can restrict our attention to the weights that are not at their bounds (i ∈ X ∪ Z), because we know, by definition, that w_i = 0 for i ∈ Y. Every active island in V thus corresponds to a constraint of the form

    Σ_{k | a_k ∈ I^+ ∧ k ∉ Y} w_k − Σ_{k | a_k ∈ I^− ∧ k ∉ Y} w_k = 0.    (4.31)

The desired expression for w_i (i ∈ X ∪ Z) immediately follows from the Lagrangian equation

    ∂L(w, u)/∂w_i = 0,    (4.32)

where the Lagrangian function for the problem is given by

    L(w, u) = (1/2) Σ_{i ∈ X ∪ Z} (w_i − c_i)^2 − Σ_{k=1}^{|A|} u_k [ Σ_{i | a_i ∈ I_k^+ ∧ i ∉ Y} w_i − Σ_{i | a_i ∈ I_k^− ∧ i ∉ Y} w_i ]
            = (1/2) Σ_{i ∈ X ∪ Z} (w_i − c_i)^2 − Σ_{i ∈ X} w_i ( Σ_{k ∈ I^+(i)} u_k − Σ_{k ∈ I^−(i)} u_k ),    (4.33)

where we restrict the last major sum to the set X because all other terms are zero. □
The lemma simply means that the i-th weight can be obtained from c_i by adding to it all the Lagrange multipliers corresponding to active islands such that a_i belongs to the positive shore of the island, and by subtracting all the multipliers of active islands such that a_i belongs to the negative shore.
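Formula (4.30) can be sketched as a small lookup-and-sum; all the data below (a priori costs, multipliers, shore memberships) are illustrative, not taken from the text.

```python
# Recovering a weight from the multipliers as in (4.30): start from c_i,
# add the multipliers of active islands whose positive shore contains a_i,
# subtract those whose negative shore contains it.
def weight(i, c, X, Y, Ipos, Ineg, u):
    if i in Y:                        # weight fixed at its lower bound
        return 0.0
    w = c[i]
    if i in X:                        # arc appears in some active island
        w += sum(u[k] for k in Ipos.get(i, ()))
        w -= sum(u[k] for k in Ineg.get(i, ()))
    return w

c = {1: 4.0, 2: 1.0, 3: 2.0}          # a priori weight values
u = {0: 0.5, 1: 1.25}                 # multipliers of two active islands
Ipos = {1: (0,), 2: (0, 1)}           # islands whose I^+ contains the arc
Ineg = {3: (1,)}                      # islands whose I^- contains the arc
X, Y = {1, 2, 3}, set()

assert weight(1, c, X, Y, Ipos, Ineg, u) == 4.5
assert weight(2, c, X, Y, Ipos, Ineg, u) == 2.75
assert weight(3, c, X, Y, Ipos, Ineg, u) == 0.75
```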
Consider now the computation of the primal step direction s and of the inner product s^T n_q. Note first that, when (3.41) is reached in the algorithm, the primal step direction s is nonzero and n_q is linearly independent from the columns of N. The value of s^T n_q is then given by the following result.
Lemma 4.3 Assume the GI algorithm is applied to the inverse shortest path problem under consideration, and that it has reached the point where equation (3.41) should be evaluated. Assume furthermore that A = (V, Y) is the active set at this stage of the calculation. Then the primal step direction s is given componentwise by

    s_i = [a_i ∈ I_q^+] − [a_i ∈ I_q^−] + [i ∈ X] ( Σ_{k ∈ I^−(i)} r_k − Σ_{k ∈ I^+(i)} r_k )    (4.34)

for i = 1, …, m. As a consequence,

    s^T n_q = 1 + Σ_{k ∈ I^−(q)} r_k − Σ_{k ∈ I^+(q)} r_k    (4.35)

in the case where the q-th constraint is the lower bound on the q-th weight, and

    s^T n_q = Σ_{i | a_i ∈ I_q^+} [ 1 + Σ_{k ∈ I^−(i)} r_k − Σ_{k ∈ I^+(i)} r_k ] + Σ_{i | a_i ∈ I_q^−} [ 1 + Σ_{k ∈ I^+(i)} r_k − Σ_{k ∈ I^−(i)} r_k ]    (4.36)

in the case where the q-th constraint is a violating island.
Proof. We first note that s, the change in the weight w corresponding to a unit step in the dual step direction, can be viewed as the sum of two different terms, s = n_q − N r. The first term corresponds to the incorporation of the q-th constraint in the active set, and its contribution to s_i is +1 if a_i belongs to the positive shore of the q-th island, and −1 if a_i belongs to its negative shore. This is because the (|A| + 1)-th component of the dual step direction, corresponding to the q-th constraint, is equal to +1. Hence we have that this first contribution is equal to

    [a_i ∈ I_q^+] − [a_i ∈ I_q^−]    (4.37)

for the i-th arc. Note that only one of the indicator functions can be nonzero in (4.37). The second contribution corresponds to the modifications to w_i caused by the fact that a_i may also belong to islands that are already active. In other words, the nonzero components of −r have to be taken into account. The equation (4.30) then implies that this second contribution from the Lagrange multipliers associated with all constraints already in the active set must be equal to

    [i ∈ X] ( Σ_{k ∈ I^−(i)} r_k − Σ_{k ∈ I^+(i)} r_k ).    (4.38)

Summing the contributions (4.37) and (4.38) gives (4.34).
Assume now that the q-th constraint is a lower bound. In this case, one has that n_q = e_q, the q-th vector of the canonical basis in R^m. Hence the product s^T n_q is equal to s_q. Equation (4.30), the nonnegativity of the {w_i}_{i=1}^m and the fact that w_q < 0 imply that q ∈ X, and (4.35) then follows from (4.34). On the other hand, if the q-th constraint is a violating island, the normal n_q is then given componentwise by (4.22) with ℓ = q. Hence we obtain (4.36) from (4.34). □
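The decomposition s = n_q − N r used in the proof can be checked numerically against the componentwise formula (4.34). In the toy data below no bound constraints are active, so the [i ∈ X] guard is vacuous; numpy is assumed and the islands are illustrative.

```python
import numpy as np

# Numerical check of (4.34): the primal step s = n_q - N r, computed with
# explicit matrices, agrees with the componentwise island formula.
m = 13

def normal(pos, neg):
    """Island normal per (4.22)."""
    n = np.zeros(m)
    n[[a - 1 for a in pos]] = +1.0
    n[[a - 1 for a in neg]] = -1.0
    return n

active = [((2, 3, 7), (5, 12, 13)), ((8, 2, 3), (11, 12, 10))]
Iq = ((1, 4), (6, 9))                     # entering (q-th) island
N = np.column_stack([normal(*I) for I in active])
r = np.array([0.5, -0.25])                # toy dual correction

s_matrix = normal(*Iq) - N @ r            # s = n_q - N r

# Componentwise formula (4.34): indicators plus corrections from active islands.
s_formula = np.zeros(m)
for i in range(1, m + 1):
    s_formula[i - 1] = (i in Iq[0]) - (i in Iq[1])
    for k, (pos, neg) in enumerate(active):
        if i in pos: s_formula[i - 1] -= r[k]   # a_i in I_k^+: subtract r_k
        if i in neg: s_formula[i - 1] += r[k]   # a_i in I_k^-: add r_k

assert np.allclose(s_matrix, s_formula)
```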
7a: Compute the steplength t as in (3.43), set c ← c + t s, revise f according to (3.45) and u using

        u ← u + t (−r, 1)  if |A| > 0,
        u ← u + t          if |A| = 0.    (4.46)

    If t = t_c, set A ← A ∪ {q}, |A| ← |A| + 1 and go to Step 1.
    Otherwise (that is, if t = t_f) set A ← A \ {ℓ}, |A| ← |A| − 1 and go to Step 3b.
7b: If t_f = +∞, then the problem is infeasible, and the algorithm stops with a suitable message.
    Otherwise, update the Lagrange multipliers according to (3.42). Set A ← A \ {ℓ}, |A| ← |A| − 1 and go to Step 3b.
Note that, in our current implementation of the algorithm's second step, we choose the current violated island as that whose excess is most negative. This technique appears to be quite efficient in practice.
4.2.9 Note
Similar implementation techniques have been used by Calamai and Conn for solving location
problems with a related structure (see [18, 19, 20]). Their technique is, however, different from ours, and a comparison of the two approaches will be examined in future work.
path tree rooted at the origin of the path (defining the constraint), proceeding backward from the destination to the origin of that path, since shortest path tree computations give the predecessor of each vertex in the tree. On the other hand, the sparsity of R has been taken into account by means of linked lists. The following operations on R then needed to be specialized: adding and deleting a column, and performing Givens plane rotations to restore the upper triangular form. Finally, storing a graph in a computer's memory naturally involves the representation of the arcs (our variables) by their terminal vertices. Care must therefore be exercised in handling vertex versus arc representations for the graph and the constraints, and in particular for nonoriented arcs.
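The column-deletion-plus-Givens update mentioned above can be sketched densely as follows; the thesis works with sparse linked lists instead, numpy is assumed, and `delete_column` is our own illustrative helper, not the author's routine.

```python
import numpy as np

# Restoring upper triangular form with Givens rotations after deleting a
# column of R (as needed when a constraint leaves the active set).
def givens(a, b):
    """Rotation [[c, s], [-s, c]] that zeroes b against a."""
    h = np.hypot(a, b)
    return (1.0, 0.0) if h == 0.0 else (a / h, b / h)

def delete_column(R, j):
    R = np.delete(R, j, axis=1)            # columns j+1.. shift left
    for k in range(j, R.shape[1]):         # chase the new subdiagonal entries
        c, s = givens(R[k, k], R[k + 1, k])
        G = np.array([[c, s], [-s, c]])
        R[k:k + 2, :] = G @ R[k:k + 2, :]
    return R[:-1, :]                       # last row is now zero

A = np.array([[4.0, 1.0, 2.0], [2.0, 3.0, 1.0], [1.0, 2.0, 5.0]])
R = np.linalg.qr(A)[1]
R2 = delete_column(R.copy(), 1)

# R2 is a valid triangular factor of A with its middle column removed.
A2 = np.delete(A, 1, axis=1)
assert np.allclose(R2.T @ R2, A2.T @ A2)
```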
We summarize the results of the tests in Table 4.2, where the following symbols are used:
iter. : the number of major iterations of the algorithm, that is, the number of full steps in the primal space (adding a constraint to the active set and requiring the calculation of the shortest paths and the choice of a new violated constraint),
drops : the number of islands dropped at Step 7 of the algorithm, that is, the number of minor iterations (partial and dual steps, involving only the computation of the step directions in the primal and dual spaces),
|A| : the number of active islands at the solution.
We note that the first of these numbers is always one larger than the sum of the other two, because one iteration is required for considering the empty active set.
The following figure illustrates results obtained by applying the inverse shortest path algorithm to a set of problems presented in Chapter 5 (Table 5.1, page 91). The left-hand histogram shows the total number of iterations partitioned into drops and major iterations, for problem sizes m = 24, 84, 220, 612, 1300 and 3280. The right-hand graphic shows the time spent in calculating shortest paths with respect to the overall algorithm run-time.

[histogram: iterations (drops vs. major iterations) per problem size m; bar chart: shortest paths time / overall time, in percent]
Figure 4.2: Iterations per problem size and shortest paths calculation
Despite the limited character of these experiments, one can nevertheless observe the following points.
The algorithm is relatively efficient, in the sense that it does not, at least in our examples, add many constraints that are not active at the solution, only to drop them at a later stage (see Figure 4.2).
One also observes in practice that a fairly substantial part of the total computational effort is spent in calculating the shortest paths necessary to detect constraint violation (Figure 4.2).
Choosing a set of constraint paths from a single tree induces significant savings in the determination of the most violated constraint, because only one shortest path tree is needed.
There are at most mn inequalities of type (4.47), n equalities of type (4.48) and n_P n^2 inequalities of type (4.49). Hence the total number of constraints in this formulation is polynomial. As a consequence, the problem is solvable in polynomial time by an interior point algorithm.
This interesting observation is clearly of theoretical importance, but the inclusion of n^2 additional variables could generate inefficiencies in practical implementations.
5 Handling correlations between arc weights
In many applications, modelling networks accurately requires dependences between arc weights. Consider, for instance, seismic waves propagating through the earth's crust: these waves have similar velocities as they propagate through media of similar densities. The motivation for this research also comes from applications in traffic modelling. This chapter considers the inverse shortest path problem where arc weights are subject to correlation constraints. A new method is proposed for solving this class of problems. It is constructed as a generalization of the algorithm presented in Chapter 4 for uncorrelated inverse shortest paths. In the uncorrelated case, the variables were the arc weights and there was no correlation between them. Now, we partition the arcs into cells or classes. The weights of the arcs located in the same cell are derived from the same value, called the "cell density". The variables of our new problem become these cell densities. The advantage of such a partition is that the number of variables decreases substantially. Moreover, the refinement of each cell may increase without affecting the number of our new variables. On the other hand, the correlations involve more restrictions, and hence more constraints. Note that shortest path constraints are not expressed in our new variables, but still involve arc weights. As a consequence, the concept of island, introduced to formalize the violation of shortest path constraints, has to be revised in this new context. This chapter will establish the results allowing our new algorithm to handle such constraints in the space of the cell densities, including implicit lower bound constraints on shortest path costs. Preliminary numerical experience with the new method is presented and discussed. In particular, we propose a computational comparison between the uncorrelated method (that of Chapter 4) and the correlated one. We also provide results obtained by using two possible strategies for handling constraints: the first considers the first violated constraint as the candidate to enter the active set of constraints, while the second privileges the most violated constraint.
The material of this chapter is to be published in [16].
5.1 Motivation
The technique proposed in Chapter 4 for solving an inverse shortest path problem is based on the solution of a particular instance of the problem's description, which is the problem of recovering arc weights in a weighted oriented graph, given a (usually incomplete) set of shortest paths in this graph. In this approach, the arc weights are assumed to be independent from each other.
handling correlations between arc weights 71
This last assumption, although reasonable in some applications, is not fulfilled in all cases of interest. Even in the areas mentioned above (transportation and tomography), interesting questions can be asked where the independence assumption is clearly violated. It is the purpose of the present chapter to propose an algorithmic approach to overcome this limitation.
We first illustrate the need for such an extension by an example drawn from transportation research. This example is presented in detail and subsequently used to motivate the specific concepts to be introduced. An additional case of interest in computerized tomography is also mentioned.
[road network diagram with numbered vertices 1–48]
Figure 5.1: The first example involving correlations between arc weights
delay for a turn depends on the global delay (the red light period, for instance) of the relevant junction. We may trivially extend this definition to

    w_i = α_i d_{ℓ(i)},    (5.2)

where we have defined α_i := 1 and ℓ(i) := i for all arcs not belonging to any junction.
We then face the problem of estimating the delays d_{ℓ(i)} subject to the constraint that a set of a priori known paths in the graph must be the shortest ones between their origin and their destination. As in Chapter 4, this problem is usually underdetermined, and a particular solution can be chosen that minimizes the difference between the computed delays and some a priori known values. We then have an inverse shortest path problem as defined in Chapter 4 whose variables are the delays (as opposed to the weights).
divided into 2 × 3 cells, we may then choose a simple cell model consisting of 6 arcs (a square with both diagonals), and then construct the resulting (undirected) network illustrated in Figure 5.2.

[Figure 5.2: a 2 × 3 arrangement of square cells with vertices 1–12, each cell containing its four sides and both diagonals]
We consider the following simple model to describe the travel time w_i of a compression wave along the i-th arc within the ℓ(i)-th cell:

    w_i = α_i d_{ℓ(i)},    (5.3)

where α_i is now proportional to the length of the i-th arc. In our example, the travel times associated with the arcs of the first cell in Figure 5.2 (whose sides are assumed to be of unit length) are given by

    w_i = d_1       for i = 1, …, 4,
    w_i = √2 d_1    for i = 5, 6.    (5.4)

As above, we now consider the question of estimating the cell densities d_{ℓ(i)} from the knowledge of the wave paths and arrival times. Because of Fermat's principle, stating that waves follow shortest paths in their propagation medium, this is again a variant of the inverse shortest path problem, where the variables are no longer the weights associated with the arcs, but some more aggregated quantities (the cell densities) which determine these weights via linear relations.
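The linear weight model (5.3)-(5.4) can be sketched as follows, writing alpha_i for the arc-dependent proportion; the helper names and the density value are our own illustrative choices.

```python
import math

# Travel times from cell densities: w_i = alpha_i * d_{l(i)}, with alpha_i
# the arc length. One unit-square cell of Figure 5.2: four sides of length 1
# and two diagonals of length sqrt(2).
def arc_weights(alpha, cell_of, d):
    return {i: alpha[i] * d[cell_of[i]] for i in alpha}

alpha   = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0,
           5: math.sqrt(2), 6: math.sqrt(2)}
cell_of = {i: 1 for i in range(1, 7)}     # all six arcs lie in cell 1
d = {1: 3.0}                              # illustrative density of the cell

w = arc_weights(alpha, cell_of, d)
assert all(w[i] == 3.0 for i in range(1, 5))            # sides: d_1
assert all(w[i] == math.sqrt(2) * 3.0 for i in (5, 6))  # diagonals: sqrt(2) d_1
```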
(V, A, w), where (V, A) is an oriented graph with n vertices and m arcs, and where w is a set of nonnegative weights {w_i}_{i=1}^m associated with the arcs. Let V be the set of vertices of the graph and A = {a_k = (s(k), t(k))}_{k=1}^m be the set of arcs, where s(k) denotes the vertex at the origin of the arc a_k (its "source-vertex") and t(k) the vertex at its end (its "target-vertex"). Also assume that the set of arcs A is partitioned into L disjoint classes and that a nonnegative density is associated with each of these classes. Assume finally that the weight of every arc can be computed as an arc-dependent proportion of the density of the class to which the arc belongs, that is

    w_i = α_i d_{ℓ(i)}    for i = 1, …, m,    (5.5)

where ℓ(i) denotes the index of the (unique) class containing the i-th arc. We say that the i-th arc is associated with the ℓ(i)-th class.
In our first example, the arcs are the detailed links of the network, including the detailed links within a junction. They are partitioned into classes corresponding to roads and junctions: the densities of these classes then correspond to the delays along roads and the traffic light cycles at
the junctions. In our second example, the classes correspond to cells of the discretized geological
medium, the densities to their actual physical densities and the arcs to the possible ways in which
a wave can travel across a cell.
Our problem is then to determine values of the class densities that are compatible with a set
of known properties of the weighted graph.
In our first example, we may assume that the network users follow the path that they perceive to be shortest between their origin and destination. An observation of the paths actually chosen by these users then gives constraints of the type just described. For instance, we may know that users travelling from vertex 1 to vertex 38 use the path defined by

    P_1 = (1, 5, 10, 15, 24, 29, 36, 38),    (5.10)

while those travelling from vertex 1 to vertex 48 use that given by

    P_2 = (1, 5, 10, 15, 22, 43, 46, 48).    (5.11)
We also wish to consider constraints that impose a lower bound on the cost of the shortest path between two vertices. These constraints were introduced in Chapter 1 and have not been considered in the basic inverse shortest path method proposed in Chapter 4. We saw in Chapter 3, Section 3.2, that such a constraint can be expressed by a set of constraints imposing that the weight of every path between the two vertices, n_o and n_d say, is bounded below by a constant, that is

    Σ_{k | a_k ∈ g} w_k ≥ γ(n_o, n_d),    (5.12)

where g is any path with origin n_o and destination n_d.
In the context of our example, we may know that the time required to reach vertex 42 from vertex 13 is clearly not smaller than 50 measure units. In this case, we wish to impose that the weight of the shortest path between these vertices is bounded below by 50.
The number of linear constraints of the form (5.8) and (5.12) depends on the number of possible paths between two vertices in the graph, which grows exponentially with the density of the graph m/n. Enumerating these constraints is of course out of the question, and we will have to use a "separation procedure" to determine which of these constraints are violated for a given value of the class densities. This separation procedure is based on the computation of the shortest paths within the graph, given the weights on its arcs, which are themselves determined by the cell densities and (5.5).
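A separation procedure of the kind just described might be sketched as follows: map class densities to arc weights, run a shortest path computation, and flag a prescribed path whose cost exceeds the shortest-path cost. The graph, densities and prescribed path below are illustrative, and the Dijkstra routine is a minimal stand-in for the shortest path trees of the text.

```python
import heapq

def dijkstra(adj, src):
    """Shortest path distances from src; adj maps a vertex to (neighbour, weight) pairs."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist.get(u, float('inf')):
            continue
        for v, wuv in adj.get(u, []):
            if du + wuv < dist.get(v, float('inf')):
                dist[v] = du + wuv
                heapq.heappush(heap, (dist[v], v))
    return dist

def violated(adj, path, cost):
    """True if the prescribed path is strictly longer than the shortest one."""
    dist = dijkstra(adj, path[0])
    return cost > dist[path[-1]]

# Two classes with densities 1.0 and 5.0; the direct arc 0->2 is in class 2.
densities = {1: 1.0, 2: 5.0}
adj = {0: [(1, densities[1]), (2, densities[2])], 1: [(2, densities[1])]}
prescribed = [0, 2]              # claimed shortest path
cost = densities[2]              # its weight: 5.0
assert violated(adj, prescribed, cost)   # detour 0->1->2 costs only 2.0
```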
We could also consider imposing upper bounds (and therefore equalities) on the weight of some shortest paths. We showed in Chapter 3, Section 3.2, that this type of constraint can no longer be expressed as a set of linear inequalities, as in (5.8) and (5.12). The problem is therefore of a different nature. This special case will be considered in Chapter 6.
and/or

    Σ_{l=1}^L β_{il} d_l = b_i    (i ∈ E),    (5.14)

where the β_{il} are general coefficients, the b_i are specified constants, and the sets I and E index the inequality and equality constraints respectively.
For instance, the network users of the first example may be aware that no traffic light cycle exceeds 5 minutes, therefore imposing an explicit upper bound on all class densities representing such cycles. Other a priori knowledge of the network might also indicate that a given cycle is longer than another one: this again produces a linear constraint of the type (5.13) on the relevant class densities.
Observe that linear constraints on the arc weights can be expressed in the form of (5.13) or (5.14), provided they involve fixed sets of arcs. The translation from arc weights to class densities is then given by (5.5).
We note that statistical correlation between the weights could be handled by considering the objective

    min_w (1/2) (w − w̄)^T C (w − w̄),    (5.18)

where C^{−1} is a covariance matrix on w and where the superscript T denotes the transpose (see [105], for instance). There are two main reasons why we will not follow this approach.
1. The formulation (5.18) clearly allows for statistical correlation between the densities, but does not guarantee that the equalities (5.5) hold.
2. Introducing a nondiagonal C as the Hessian of the objective function substantially complicates the algorithm, as will become clear in our later developments.
Methods based on (5.18) therefore constitute an alternative to those presented in this chapter and deserve a separate study.
    N* := (N^T N)^{−1} N^T,    (5.20)

and

    H := I − N N*    (5.21)

is then the reduced inverse Hessian of the quadratic objective function in the subspace of weights satisfying the active constraints.
When a constraint (5.8) is violated or active, that is, when a path p_j has its weight greater than or equal to that of another path, g say, with the same origin and destination, p_j and g together determine at least one island whose positive shore consists of the part of g that is not common with p_j, and whose negative shore consists of the part of p_j that is not common to g (see Figure 5.3). Of course, a violated constraint may generate more than one such island (the paths p_j and g may indeed first depart from each other, then join and depart again later), but each island necessarily corresponds to a violated constraint that is implicit in the statement (5.8) (a subpath of a shortest path is also shortest).

[Figure 5.3: two configurations of the paths p_j and g determining an island, with positive shore I^+ and negative shore I^−]

We make the choice to consider each such constraint explicitly, and therefore to associate one and only one island with each violated constraint. The algorithm
only considers a subset of all possible islands and assigns an index, q say, to each one of them. For each such island, the sets I_q^+ and I_q^− are defined to be the sets containing the arcs of its positive and negative shores respectively, while I_q := I_q^+ ∪ I_q^−. The excess of the island, denoted E_q(w) (or, more briefly, E_q) is then given by

    E_q(w) := Σ_{a_i ∈ I_q^+} w_i − Σ_{a_i ∈ I_q^−} w_i.    (5.22)
We again illustrate these concepts within our first example. If we assume that the weight of the path P_1 is greater than that of g = (1, 5, 12, 27, 36, 38), the constraint that P_1 is shortest is violated, and these paths determine an island, I_1 say, whose positive shore I_1^+ contains the arcs joining vertices 5, 12, 27 and 36, while its negative shore I_1^− contains the arcs joining vertices 5, 10, 15, 24, 29 and 36.
We note that both shores of an island start at the same vertex and end at the same vertex.
Remember that the inverse shortest path algorithm produces a set of dual feasible points and keeps a set of active constraints A, where each constraint is satisfied as an equality. The algorithm then proceeds by successively selecting the island whose excess is most negative and by adding it to its "active set". This is achieved by increasing the weights of the arcs on the positive shore and reducing the weights on the negative shore until both shores are of equal weight. The process is continued until no violated constraint is left. In Chapter 4, the algorithm also explicitly handles the fact that the weights must remain nonnegative. This creates additional constraints that can also become active in the course of the calculation. When such a bound constraint is violated, we consider that it has a positive shore (containing the arc whose weight is negative) and an empty negative shore. That is why we partitioned the active set A into two subsets,

    A = (V, Y),    (5.23)

where V is the set of currently active islands with a nonempty negative shore and Y the set of active islands with an empty negative shore (the set of active bounds):

    Y := {i ∈ {1, …, m} | w_i = 0}.    (5.24)

Let us recall the definition of several sets that will be generalized in the next section, in order to work in the space of class densities:

    X := {i ∈ {1, …, m} \ Y | ∃ j ∈ V : a_i ∈ I_j}    (5.25)

and

    Z := {1, …, m} \ (X ∪ Y).    (5.26)

X contains the indices of the arcs that appear in one of the active islands of V but are not fixed at their lower bounds, while Z contains the indices of the arcs that are not involved at all in the active constraints of A. For i ∈ X, we had defined

    I^+(i) := {j ∈ V | a_i ∈ I_j^+}  and  I^−(i) := {j ∈ V | a_i ∈ I_j^−},    (5.27)

which are the sets of active islands of V whose positive (resp. negative) shore contains the arc a_i.
    min_r ‖N r − n_q‖_2.    (5.35)

The second useful simplification due to the special structure of the problem arises in the computation of the product N^T n_q in (5.32). The resulting vector indeed contains in position i the inner product of the i-th active constraint normal with the normal to the q-th constraint. As both these constraints may now be interpreted as islands or dependent sets, we may exploit this similarity in expressing the value of N^T n_q.
In order to state this expression in a reasonably compact form, we define some additional notation:
Ω(l) is the set of the arcs located in the class c_l, i.e.

    Ω(l) := {a_k | ℓ(k) = l}.    (5.36)

Γ_l(I_i^+) and Γ_l(I_i^−) are the "α-weighted" cardinalities of the positive and negative shores of I_i restricted to the arcs of Ω(l), that is,

    Γ_l(I_i^±) := Σ_{a_k ∈ Ω(l)} α_k [a_k ∈ I_i^±],    (5.37)

where the α_k are the proportions defined by the equation (5.5). Similarly, we also define Γ_l(D_i^+) and Γ_l(D_i^−) by

    Γ_l(D_i^±) := β_{il} [c_l ∈ D_i^±].    (5.38)

Finally, we use the symbol J_i to represent either I_i, if the i-th constraint is a proper island, or D_i if the i-th constraint is of the type (5.13) or (5.14). By convention, we set D_i = ∅ when J_i = I_i, and I_i = ∅ when J_i = D_i. Γ_l(J_i^+) and Γ_l(J_i^−) are then given, according to (5.37) and (5.38), by

    Γ_l(J_i^±) := Γ_l(I_i^±) if J_i is an island, Γ_l(D_i^±) if J_i is a dependent set.    (5.39)
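The weighted cardinalities of (5.37) can be sketched directly, writing alpha_k for the arc proportions of (5.5) and representing class membership explicitly; the data below are illustrative.

```python
import math

# Gamma_l(shore): sum of the alpha_k of the arcs of class l lying on the shore.
def gamma(l, shore, cell_of, alpha):
    return sum(alpha[k] for k in shore if cell_of[k] == l)

cell_of = {1: 1, 2: 1, 3: 2, 4: 2}        # arc -> class index
alpha   = {1: 1.0, 2: math.sqrt(2), 3: 1.0, 4: 1.0}
Ipos, Ineg = {1, 2, 3}, {4}               # toy island shores

assert gamma(1, Ipos, cell_of, alpha) == 1.0 + math.sqrt(2)
assert gamma(2, Ipos, cell_of, alpha) == 1.0
assert gamma(2, Ineg, cell_of, alpha) == 1.0
```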
We can now express the inner product of the normal to the q-th active constraint with the normals of all other active ones.
The formulation (5.46) is therefore valid for both constraint types, because the negative shore I_k^− is empty for a lower bound constraint.
Since

    [N^T n_q]_i = n_i^T n_q    (5.48)

and g is equal to the index of the i-th active island, (5.40) holds for the island constraints when observing that the class indices l involved are those for which there exists at least one arc belonging both to the class and to a shore of the island I_q or I_g; that is, we can restrict l to the set B(i).
The proof is totally similar when considering the dependent set constraints (5.13) and (5.14), using (5.29), (5.30), (5.31), and (5.38).
For future reference, we note that, in the general case, [n_k]_l can be written as

    [n_k]_l = Γ_l(J_k^+) − Γ_l(J_k^−)    for l = 1, …, L,    (5.49)

using (5.39). □
As a consequence of Lemma 5.1, the practical computation of r in (5.32) may be organized as follows:
1. compute the vector y ∈ R^{|A|} whose i-th component is given by (5.40),
2. perform a forward triangular substitution to solve the equation R^T z = y for the vector z ∈ R^{|A|},
3. perform a backward triangular substitution to solve the equation R r = z for the desired vector r.
    Y := {l ∈ {1, …, L} | d_l is at a bound ω_l},    (5.54)

with ω_l being either the lower bound or the upper bound value at which the l-th class density is currently fixed. The value of ω_l equals that of b_i/β_{il}, where i ∈ V_D is the index of the related bound constraint. Note that Y_0 ⊆ Y.
To characterize the classes involved in active islands or dependent sets, we define the set of class indices appearing in active dependent sets,

    X_D := {l ∈ {1, …, L} \ Y | ∃ j ∈ V_D : c_l ∈ D_j},    (5.55)

the set of class indices with which arcs in active islands are associated,

    X_I := {l ∈ {1, …, L} \ Y | ∃ j ∈ V_I : ∃ a_k ∈ Ω(l) ∩ I_j},    (5.56)

and

    X := X_D ∪ X_I.    (5.57)

Note that X_D and X_I are not necessarily disjoint.
The set X thus contains the indices of the classes that are involved in one of the active islands or dependent sets of V but are not fixed at a bound.
The remaining class indices are the elements of the set Z,

    Z := {1, …, L} \ (X ∪ Y).    (5.58)

The set Z contains the indices of the class densities that are not involved at all in the active constraints of A.
When we consider a class index l in X, definitions analogous to (5.27) can be made:
for l ∈ X_D, we define the sets D^+(l) and D^−(l) as
D^±(l) ≝ {j ∈ V_D | c_l ∈ D_j^±}, (5.59)
that is, the set of active dependent sets whose positive (negative) shore involves the l-th class. By convention, we then set
I^±(l) ≝ ∅ if l ∉ X_I. (5.60)
Similarly, if l ∈ X_I, we define the sets I^+(l) and I^−(l) as
I^±(l) ≝ {j ∈ V_I | ∃ a_k ∈ Γ(l) ∩ I_j^±}, (5.61)
i.e. the set of active islands whose positive (negative) shore involves an arc associated with the l-th class. We then set
D^±(l) ≝ ∅ if l ∉ X_D. (5.62)
handling correlations between arc weights 85
The set F^+(l) (resp. F^−(l)) then contains the active constraints that are not bound constraints and that involve the class c_l in their positive (resp. negative) shore.
Lemma 5.2 Consider a dual feasible solution for the problem of minimizing (5.17) subject to the constraints in the active set A = (V, Y). Assume furthermore that, among the Lagrange multipliers {u_k}_{k=1}^{|A|}, those associated with the active islands and dependent sets of V are known. Then the class density vector d corresponding to this dual solution is given by
d_l = [l ∈ Y \ Y_0] β_l + [l ∈ X ∪ Z] d̄_l + [l ∈ X] [ Σ_{g∈F^+(l)} γ_l(J_g^+) u_g − Σ_{g∈F^−(l)} γ_l(J_g^−) u_g ] (5.64)
for l = 1, …, L.
Proof. Define the following sets:
the set of active islands related to lower bound constraints on a shortest path, i.e. constraints of type (5.12),
B_I ≝ {q ∈ V_I | I_q^− = ∅}; (5.65)
the set of classes that are involved in the active constraints of type (5.12),
X_{B_I} ≝ {l ∈ {1, …, L} \ Y | ∃ j ∈ B_I : ∃ a_k ∈ Γ(l) ∩ I_j}; (5.66)
and the set of classes involved in the active constraints of type (5.8):
X_{B_I^c} ≝ {l ∈ {1, …, L} \ Y | ∃ j ∈ V_I \ B_I : ∃ a_k ∈ Γ(l) ∩ I_j}. (5.67)
and S_I is the term involving the active lower bound constraints on a shortest path,
S_I = Σ_{g∈B_I} u_g [ ( Σ_{l∈{1,…,L}\Y_0} γ_l(I_g^+) d_l ) − λ_g ], (5.71)
where λ_g is the lower bound defined in (5.12) and related to the g-th constraint.
Since the class densities indexed by Y are fixed at their bound, (5.68) becomes
L(d, u) = ½ Σ_{l∈X∪Z} (d_l − d̄_l)² + ½ Σ_{l∈Y\Y_0} (β_l − d̄_l)² − S_D − S_{I^c} − S_I. (5.72)
When observing that, because of (5.38), η_{gl} = γ_l(D_g^+) − γ_l(D_g^−), we can rewrite S_D using (5.59) and permuting the two sums in (5.69):
S_D = Σ_{l∈X_D∪(Y\Y_0)} d_l [ Σ_{g∈D^+(l)} γ_l(D_g^+) u_g − Σ_{g∈D^−(l)} γ_l(D_g^−) u_g ] − Σ_{g∈V_D} λ_g u_g. (5.73)
Similarly, using (5.37) and (5.61), we can modify S_{I^c} and S_I to obtain:
S_{I^c} = Σ_{l∈X_{B_I^c}} d_l [ Σ_{g∈I^+(l)\B_I} γ_l(I_g^+) u_g − Σ_{g∈I^−(l)\B_I} γ_l(I_g^−) u_g ] (5.74)
and
S_I = Σ_{l∈X_{B_I}} d_l [ Σ_{g∈I^+(l)∩B_I} γ_l(I_g^+) u_g ] − Σ_{g∈B_I} λ_g u_g. (5.75)
Expressing now the condition ∂L(d, u)/∂d_l = 0 (for l ∈ X ∪ Z) and combining the terms from (5.74) and (5.75) yields, for l ∈ X ∪ Z, that
d_l = [l ∈ X ∪ Z] d̄_l
+ [l ∈ X_D ∪ (Y \ Y_0)] [ Σ_{g∈D^+(l)} γ_l(D_g^+) u_g − Σ_{g∈D^−(l)} γ_l(D_g^−) u_g ] (5.76)
+ [l ∈ X_I] [ Σ_{g∈I^+(l)} γ_l(I_g^+) u_g − Σ_{g∈I^−(l)} γ_l(I_g^−) u_g ].
Finally, since d_l = β_l for l ∈ Y \ Y_0, we obtain the desired expression of d_l using (5.57), (5.60), (5.62) and (5.63) in (5.76). □
Of course, the multipliers u_q (q ∈ E) in (5.64) are not constrained to be nonnegative.
Lemma 5.3 Assume that the inverse shortest path algorithm has reached the point where the primal step direction is to be computed, and assume that A = (V, Y) is the active set at this stage of the calculation. Then, when non-zero, the step direction s in the primal space is given by
s_l = [l ∈ X] [ [n_q]_l + Σ_{k∈F^−(l)} γ_l(J_k^−) r_k − Σ_{k∈F^+(l)} γ_l(J_k^+) r_k ] (5.77)
for l = 1, …, L. Moreover, if q is the index of a violated island or a violated dependent set that is not a bound on a class density, we have that
s^T n_q = Σ_{i∈D} [n_q]_i² + Σ_{i∈D} [n_q]_i [ Σ_{k∈F^−(i)} γ_i(J_k^−) r_k − Σ_{k∈F^+(i)} γ_i(J_k^+) r_k ] (5.78)
where
D = { l ∈ X | c_l ∈ D_q or ∃ a_k ∈ Γ(l) : a_k ∈ I_q }. (5.79)
If q is the index of a bound constraint on the l_q-th class density, we then have that, for a lower bound,
s^T n_q = γ_{l_q}(J_q^+) + Σ_{k∈F^−(l_q)} γ_{l_q}(J_k^−) r_k − Σ_{k∈F^+(l_q)} γ_{l_q}(J_k^+) r_k, (5.80)
and the analogous expression (5.81), with the opposite sign, for an upper bound.
Proof. From the definition of H in (5.21) and that of r = N^† n_q, the primal step direction s (= H n_q) may be rewritten as n_q − N r. Then, using (5.49), the l-th component of s can be expressed as:
s_l = γ_l(J_q^+) − γ_l(J_q^−) − Σ_{k=1}^{|A|} [ γ_l(J_k^+) − γ_l(J_k^−) ] r_k. (5.82)
Eliminating the null terms and using (5.63), we obtain that
s_l = γ_l(J_q^+) − γ_l(J_q^−) + Σ_{k∈F^−(l)} γ_l(J_k^−) r_k − Σ_{k∈F^+(l)} γ_l(J_k^+) r_k. (5.83)
Moreover, if l ∉ X, s_l must be zero, since a class density at a bound cannot change as long as the bound constraint is active. We thus obtain (5.77) by using (5.46).
Assume now that the q-th constraint is a lower (resp. upper) bound on a class density (the l_q-th one, say). Then n_q = e_q (resp. −e_q), the q-th vector of the canonical basis. Hence the product s^T n_q is equal to s_q (resp. −s_q) and (5.80) (resp. (5.81)) follows from (5.77) and the fact that q ∈ X, since d_q violates a bound.
On the other hand, if the q-th constraint is a violated island or a violated dependent set that is not a bound constraint,
s^T n_q = Σ_{i=1}^{L} s_i [ γ_i(J_q^+) − γ_i(J_q^−) ] (5.84)
where s_i has been established in (5.77). Note that both γ_i(J_q^+) and γ_i(J_q^−) may be nonzero, at variance with what happens in the uncorrelated inverse shortest paths problem of Chapter 4. Finally, we obtain (5.78) from (5.84) by eliminating terms whose contribution is zero. □
In our algorithmic framework, the computation of the new values of the primal variables may be deferred until after that of the dual step, in contrast with the original method of Goldfarb and Idnani.
Note that in the second step of our current implementation of the algorithm, we do not specify
how to select a violated constraint. We examine two possibilities in the next section.
Also remark that the calculation of t_f ensures that equality constraints can never leave the active set.
the number of known shortest paths defining the constraints of the problem. The graph of test
problem #2 is illustrated in Figure 1; it is in fact extracted from a larger graph covering a whole
city in a realistic application.
# L n m nE
1 9 16 24 12
2 36 49 84 24
3 100 121 220 56
4 289 324 612 144
5 625 676 1300 312
6 1600 1681 3280 650
Table 5.1: Test problems involving class densities
The results of the test runs are reported in Table 5.2. The columns labelled "CORRELATED" and "UNCORRELATED" refer to the correlated method and the uncorrelated method respectively. As already mentioned above, the number of variables is much smaller when solving a correlated problem with the former method than with the latter. The label "var" indicates this number of variables in each case. |A| is the number of active constraints at the solution. In the third column (for each method), one can find the number of dropped constraints, that is, the number of minor iterations. The reader can then deduce the number of (major) iterations that were required to solve the problem as the sum of the number of minor iterations and |A| + 1. Finally, the heading "time" refers to the total CPU time (in seconds) needed to obtain the solution, and "sp time" is the time (in seconds) spent in calculating shortest paths trees.
# CORRELATED UNCORRELATED
var |A| drops time sp time var |A| drops time sp time
1 9 3 0 0.230 0.011 24 3 0 0.851 0.011
2 36 4 0 0.433 0.144 84 4 0 0.925 0.066
3 100 4 0 0.894 0.425 220 5 0 1.269 0.367
4 289 13 0 9.750 4.703 612 16 1 7.285 5.175
5 625 41 1 130.582 49.304 1300 31 2 41.078 34.046
6 1600 89 3 1958.714 473.363 3280 125 1 569.003 461.949
Table 5.2: Comparative test results for the correlated and uncorrelated algorithms
The following figure shows the results obtained in Table 5.2 with the correlated algorithm. The left-hand histogram illustrates the total number of iterations, partitioned into drops and major iterations. The right-hand graphic highlights the time spent in calculating shortest paths with respect to the overall algorithm run-time. The corresponding results obtained via the uncorrelated algorithm are represented in Figure 4.2 of Chapter 4.
We first notice that the correlated method runs faster on the smaller problems (#1–#3), while this is not the case on the larger ones (#4–#6). But, of course, the arc weights produced by the
Figure 5.4: The correlated algorithm: iterations per problem size and shortest paths calculation
uncorrelated method lack the necessary correlation between their values, although these values are generally of the correct order of magnitude. The usefulness of these weights for practical applications can therefore be questioned. The new method, on the other hand, produces the desired correlations, as expected.
The shortest paths tree calculation requires roughly the same amount of time for both methods, even when the problem's dimension increases; this time logically tends to vary in parallel with the number of major iterations.
In order to explain the CPU-time differences observed for tests #4–#6 (the correlated method taking much more time, despite comparable shortest paths computations), we evaluated the time spent in each procedure of both methods using the UNIX profiler. It turns out that 40% of the time is used for the shortest paths calculation in the correlated method, while this proportion increases to 80% for the uncorrelated method. The additional computation in the new method corresponds to calculating the values of γ_l(J_k), and can take up to 50% of the total execution time. This calculation relates the smaller problem (in terms of class densities) to the larger one (in terms of arc weights).
strategy might prove cheaper in computation time because the procedure of selecting the next constraint to add to the active set is considerably simpler.
We tested these variants on problems where the number of equality constraints is half the total number of constraints. The characteristics of 6 test problems are given in Table 5.3.
# L n m nE
7 9 16 42 12
8 36 49 156 24
9 64 81 144 24
10 225 256 930 60
11 256 289 544 70
12 441 484 924 140
Table 5.3: Test problems with equality constraints
Again, we propose in Figure 5.5 illustrations of the results obtained by these variants of the correlated algorithm. The contents of the left-hand and right-hand graphics are the same as those explained above for Figure 5.4. The upper illustrations apply to the algorithm choosing the first violated constraint, and the lower graphics illustrate the algorithm choosing the most violated constraint as candidate to enter the active set.
We note that variant MV gives the fewest drops and also the smallest computing time for the larger problems. Variant FV only seems interesting for the smaller cases. We also tried to handle the equality constraints just like the other ones, without giving them priority to enter the active set. The results obtained are sometimes better than the worst of MV and FV, but never better than the best.
Of course, more experience is required before drawing extensive and definitive conclusions.
But we feel that the reported tests already illustrate some major trends in the use of inverse
shortest path algorithms.
Figure 5.5: Algorithm variants: iterations per problem size and shortest paths calculation
6 Implicit shortest path constraints
In this chapter, we examine the computational complexity of the inverse shortest paths problem with upper bounds on shortest path costs. The presence of such upper bounds makes the inverse shortest path problem harder to solve. Indeed, an upper bound constraint restricts the cost of a path that is not known explicitly, and therefore cannot be expressed as one or more linear constraints. In fact, the problem can then become non-convex. Solving this problem has interesting implications, notably in seismic tomography, where ray paths between known locations are usually not observable and hence unknown. We will prove that obtaining a globally optimum solution to this problem is NP-complete: we show that the well-known 3-SAT problem can be polynomially transformed into an inverse shortest path problem with upper bounds on shortest path costs. An algorithm for finding a locally optimum solution is then proposed and discussed. The local optimality conditions allow us to define a "stability region" around a (local) solution when the shortest paths (defined at that solution) are unique. A combinatorial strategy is set up when the shortest paths are not unique; this is necessary to obtain a solution for which a stability region can again be defined. We will see that the stability region of a local solution depends on the second shortest path costs: the idea is to define a region in which the explicit definition of some shortest paths does not change. Our algorithm (using an enumeration strategy) has been implemented and tested on problems arising in practical applications. These tests show that very few problems (among those not generated at random) require recourse to our combinatorial strategy. They also illustrate that the combinatorial aspect of our problem may appear in practice, although shortest path uniqueness is usually expected (in double precision arithmetic).
The content of this chapter is reported in [17].
been considered so far. It is the purpose of this chapter to examine the more difficult case where upper bounds are present as well.
We now motivate this development with two examples.
The first arises from seismic tomography. In this field, one is concerned with recovering ground layer densities from observations of seismic waves [79]. According to Fermat's principle, these waves propagate along rays that follow the shortest path in time across the Earth's crust. One can then measure, usually with some error, the propagation time of these rays between a known source and a known receiver. The problem is then to reconstruct the ground densities from these observations. One approach [87] uses a discretisation of the propagation medium into a network whose arcs have weights inversely proportional to the local density. In this framework, one is then faced with the problem of recovering these arc weights from the knowledge of intervals on the seismic rays' travel times and from a priori geological knowledge, the ray paths themselves remaining unknown. This is an inverse shortest paths problem with bounds on the paths' weights.
The second example is drawn from traffic modelling. In this research area, graph theory is used to create a simplified view of a road network. An elementary (and often justified) behavioural assumption is that network users choose perceived shortest routes for their journeys [92, 13, 84]. Although these routes might be observable, their precise description might vary across time and individuals, and their travel cost is usually subject to some estimation. This naturally provides bounds on the total time spent on shortest paths whose definition is unavailable. Recovering the perceived arc costs is an important step in the analysis of network users' behaviour. This is again a problem of the type considered in this chapter.
In the next section, we formalize the problem and explain why these bound constraints on shortest path costs cannot fit into the framework of classical convex quadratic programs, as used in Chapters 4 and 5.
where p_q^1(w) is the shortest path (with respect to the weights w) starting at vertex o_q and arriving at vertex d_q. The values u_q are upper bounds on the cost of the shortest path from o_q to d_q. We allow u_q to be infinite. Note that the shortest path p_q^1(w) is not necessarily unique for a given w. This will have important implications later in this chapter.
The method proposed in Chapter 4 is based on the quadratic programming algorithm due to
Goldfarb and Idnani [55]. The idea is to compute a sequence of optimal solutions to the problem
involving only a subset of the constraints present in the original problem. The method therefore
maintains an active set of constraints. Starting from the unconstrained solution, each iteration incorporates a new constraint into the active set, completing what we call a major iteration. To achieve this goal, it may be necessary to drop a constraint from the active set. These drops occur in minor iterations.
Incorporating upper bound constraints of the type (6.3) in the active set is complex. The difficulty is that expression (6.3) only defines the path p_q^1(w) implicitly, while adding a constraint to the active set (as a linear inequality on arc weights) requires an explicit definition of that constraint of the form
Σ_{a∈p} [w]_a ≤ u_q, (6.4)
where one needs the explicit definition of the path p as a succession of arcs to specify which arcs appear in the summation. When such a constraint is activated, one naturally chooses a path which is currently shortest given the value of the arc weights [w]_a. However, as these weights are modified in the course of the optimization, a path that is shortest between a given origin and destination may change, and therefore the explicit definition of the constraint in the form (6.4) should also change accordingly. An immediate consequence of this observation is that, besides adding and dropping constraints of the type (6.4) from the active set, one should also keep track of the modifications in the explicit definitions of the constraints (6.3), which might in turn modify the active set.
function (6.1) with a method of low complexity then appears much more difficult, despite the fact that the objective is strictly convex.
Let us recall the small example of Chapter 3 illustrating the non-convex nature of our constraints. Consider the following graph, composed of 3 vertices and 3 arcs (m = 3), shown in Figure 6.1.
Figure 6.1: A graph with three vertices and three arcs a_1, a_2 and a_3, joining vertex o to vertex d.
A particular instance of the satisfiability problem, the 3-SAT problem, is one of the best known NP-complete problems. We follow [50] for its brief description. Let X be a set of Boolean variables {x_1, x_2, …, x_l}. A truth assignment for X is a function t : X → {true, false}. Let x be a variable in X; we say that x is realized under t if t(x) = true. The variable ¬x is realized under t if and only if t(x) = false. We say that x and ¬x are literals defined upon the variable x. A clause is a set of literals over X, such as {x_1, x_2, ¬x_3}, representing the disjunction of those literals; it is satisfied by a truth assignment if and only if at least one of its members is realized under that assignment. A set C of clauses over X is satisfiable if there exists some truth assignment for X that simultaneously satisfies all the clauses in C. The 3-SAT problem consists in answering the question: is there a truth assignment satisfying C, when the clauses in C contain exactly 3 literals over X? Cook proved that this problem is NP-complete [24]. We can show that another problem is NP-hard by showing that 3-SAT can be polynomially transformed to it.
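As a concrete restatement of these definitions (a small illustrative sketch, not part of the thesis; the encoding of a literal as a pair (variable, negated) is our own), a brute-force 3-SAT solver simply scans all 2^l truth assignments:

```python
from itertools import product

def satisfies(assignment, clauses):
    """A clause (a set of literals (variable, negated)) is satisfied iff at
    least one of its members is realized under the truth assignment."""
    return all(any(assignment[v] != neg for (v, neg) in clause)
               for clause in clauses)

def three_sat(variables, clauses):
    """Brute-force 3-SAT: scan all 2^l truth assignments; return a
    satisfying one, or None when the clause set is unsatisfiable."""
    for values in product([False, True], repeat=len(variables)):
        t = dict(zip(variables, values))
        if satisfies(t, clauses):
            return t
    return None

# (x1 or x2 or not x3) and (not x1 or x2 or x3)
clauses = [{("x1", False), ("x2", False), ("x3", True)},
           {("x1", True), ("x2", False), ("x3", False)}]
assignment = three_sat(["x1", "x2", "x3"], clauses)
```

The exhaustive scan is of course exponential in l; the point of the NP-completeness proof below is precisely that no polynomial method is known for this question.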
Let ISP denote the decision problem: given an inverse shortest path problem and a bound k,
does there exist a solution with objective value at most k? We show that ISP is NP-complete,
which implies that the inverse shortest path problem is NP-hard.
Theorem 6.1 ISP is NP-complete.
Figure 6.2: The graph x_i associated with the i-th Boolean variable: two three-arc paths join s_i to d_i, the upper one through the vertices u_i^1 and u_i^2 and the lower one through l_i^1 and l_i^2.
on whether the shortest path from s_i to d_i follows the upper path (via vertices u_i^1, u_i^2) or the lower
implicit shortest path constraints 100
one (via vertices l_i^1, l_i^2) of its associated graph. Imposing that
s_i = d_{i−1} (for i = 2, …, l), (6.6)
we obtain a "chain-like" resulting graph representing our Boolean variables. A path from vertex s_1 to vertex d_l in this graph is therefore equivalent to a truth assignment of all the Boolean variables. We assign an initial cost of 1 to each of the six arcs of the "Boolean graph" x_i.
We now describe a representation of our p clauses. A clause c of the 3-SAT problem is a disjunction of the type (x_i ∨ x_j ∨ ¬x_k), for instance. The clause c will be associated with the choice among three possible paths going from a vertex named a_c to a vertex named b_c, where a_c and b_c are different from the vertices of the Boolean graphs x_i (i = 1, …, l). Each of the three paths is formed by three consecutive oriented arcs. The first arc originates at vertex a_c and has a zero cost, and the last one terminates at vertex b_c and has a zero cost too. The middle arc is chosen as one of the arcs (l_i^1, l_i^2) or (u_i^1, u_i^2), depending on whether the considered variable x_i in the clause is negated or not. The subgraph associated with a clause c of the type (x_i ∨ x_{i+1} ∨ ¬x_k) is illustrated in Figure 6.3.
Figure 6.3: The subgraph associated with a clause c of the type (x_i ∨ x_{i+1} ∨ ¬x_k): three paths from a_c to b_c, each passing through the middle arc of the corresponding Boolean graph.
Our representation of the variables x_i and the clauses c_j generates a weighted oriented graph, which we call G. The original cost of any path between a_c and b_c (c = 1, …, p) is 1, and the cost of the shortest path from s_1 to d_l is 3l. The 3-SAT problem is then equivalent to the question: is there a choice of nonnegative arc weights in G such that the cost of the shortest path between each pair of nodes (a_c, b_c) is zero, as well as that of the shortest path from s_1 to d_l, and such that the ℓ_2 distance of these weights to the original weights is at most 3l?
The equality constraints on the shortest paths in this formulation may be replaced by upper bound constraints, provided that we require the arc weights w to be as close as possible to the original weights w̄. The resulting problem is therefore
min_w ‖w − w̄‖_2 (6.7)
subject to
w ≥ 0, (6.8)
cost(a_c, b_c) ≤ 0, c = 1, …, p, (6.9)
cost(s_1, d_l) ≤ 0, (6.10)
where cost(n_1, n_2) is the cost of a shortest path from n_1 to n_2 in G.
We recognize, in the formulation (6.7)–(6.10), an instance of our inverse shortest path problem with upper bound constraints on the cost of shortest paths. We have thus found a transformation from the 3-SAT problem to ISP.
Finally, it is easy to see that this transformation is polynomial, since the instance of ISP we constructed has 6l + 6p arcs and 5l + 1 + 2p nodes. This completes our proof that ISP is NP-complete. □
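The construction used in this proof can be sketched programmatically (an illustrative helper with invented names; literals are encoded as pairs (i, negated) over the variables 1, …, l). It makes the arc and node counts easy to verify:

```python
def build_isp_instance(num_vars, clauses):
    """Arc list (tail, head, a priori weight) of the graph G in the
    3-SAT -> ISP reduction sketched above; clauses are lists of
    literals (i, negated) over the variables 1..num_vars."""
    arcs = []

    # One "Boolean graph" per variable x_i: two three-arc paths of cost 1
    # from s_i to d_i, the graphs being chained by imposing s_i = d_{i-1}.
    for i in range(1, num_vars + 1):
        s, d = (f"d{i-1}" if i > 1 else "s1"), f"d{i}"
        arcs += [(s, f"u{i}.1", 1), (f"u{i}.1", f"u{i}.2", 1), (f"u{i}.2", d, 1),
                 (s, f"l{i}.1", 1), (f"l{i}.1", f"l{i}.2", 1), (f"l{i}.2", d, 1)]

    # One gadget per clause c: three paths from a_c to b_c, each entering and
    # leaving (at zero cost) the middle arc of the literal's variable graph,
    # the upper middle arc for an unnegated literal, the lower one otherwise.
    for c, clause in enumerate(clauses, start=1):
        for (i, negated) in clause:
            side = "l" if negated else "u"
            arcs += [(f"a{c}", f"{side}{i}.1", 0), (f"{side}{i}.2", f"b{c}", 0)]

    return arcs
```

For l variables and p clauses the arc list has exactly 6l + 6p entries, and the arcs span 5l + 1 + 2p distinct vertices, as claimed in the proof.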
6.4.3 Reoptimization
Once F(w; i_1, …, i_{n_I}) has been determined at the feasible point w, it is possible to solve the associated convex quadratic program P(w; i_1, …, i_{n_I}). This process is called "reoptimization". Because we assume that reoptimization will always take place at a point w which is the solution of another subproblem P(w′; i′_1, …, i′_{n_I}), it is not difficult to see that the new subproblem differs from the old one in two ways.
1. Some constraints of P(w′; i′_1, …, i′_{n_I}) are now obsolete because the associated path, although shortest for w′, is no longer shortest for w. These constraints must be replaced by constraints whose explicit description corresponds to paths that are shortest for w.
2. Although p_q^1(w′; i′_1, …, i′_{n_I}) can still be shortest for w, another shortest path between o_q and d_q may be chosen to define the new subproblem. The constraint whose explicit description corresponds to p_q^1(w; i′_1, …, i′_{n_I}) must then be replaced by another constraint with explicit description corresponding to p_q^1(w; i_1, …, i_{n_I}).
Adding new linear inequalities can be handled computationally by using the Goldfarb-Idnani
dual quadratic programming method, as is already the case in Chapter 4. Removing linear
inequalities can be handled much in the same way by computing the Goldfarb-Idnani step that
would add them and then taking the opposite. These calculations are straightforward applications
of the method presented in Chapter 4; they are detailed and illustrated in Section 6.5.
Figure 6.4: A small example showing path combinations
Let us assume that [w̄]_i = 10 for i = 1, …, 11, and consider the problem of minimizing (6.1) with m = 11 subject to 12 constraints of type (6.3), defined by
o_1 = a, d_1 = b, u_1 = 10;
o_2 = a, d_2 = c, u_2 = 10;
o_3 = d, d_3 = b, u_3 = 5;
o_4 = d, d_4 = c, u_4 = 5;
o_5 = e, d_5 = f, u_5 = 10;
o_6 = f, d_6 = g, u_6 = 10;
o_7 = h, d_7 = i, u_7 = 10;
o_8 = i, d_8 = g, u_8 = 10;
o_9 = e, d_9 = a, u_9 = 5;
o_10 = h, d_10 = a, u_10 = 5;
o_11 = b, d_11 = g, u_11 = 5;
o_12 = c, d_12 = g, u_12 = 5. (6.16)
We directly see that, at any solution, all arcs but ad will have a weight equal to 5, since n_q = 1 for all q ≠ 1, 2. Suppose now that, for these latter constraints, the shortest paths have been ordered as in (6.16) and have been considered by the algorithm in that order. As a consequence, solving the problem in the feasible region F(w; 1, …, 1) will give the solution [ŵ]_i = 5 (i = 1, …, 11), since the shortest path from a to b and that from a to c both use vertex d. The objective function value at ŵ is 137.5.
Note now that, at ŵ, the shortest paths between the o-d pairs (a, b) and (a, c) are not unique, since the paths a–f–b and a–i–c are also shortest. Furthermore, this set of weights can be improved by considering P(ŵ; 2, 2, …, 1), whose solution has every arc weight equal to 5 except that of arc ad, which is equal to 10, and where the objective function has the value 125. Moreover, examining every possible shortest path separately would not allow any progress, because successively solving P(ŵ; 2, 1, …, 1) and P(ŵ; 1, 2, …, 1) still gives the same solution ŵ.
It is therefore crucial to consider every combination of the shortest paths that are not unique at a potential solution.
Proof. The number of paths between two vertices is finite, since the number of arcs m is finite. As a consequence, the number of different convex polygons F(w_i; C_i) computed at Step 1, and the n_{c_i} calculated at Step 3, are also finite. The algorithm consists of a sequence of convex inverse shortest path problems differing by the actual shortest paths used in the explicit description of the constraints. Furthermore, each of these subproblems is considered at most once and is solvable in a finite number of operations. The complete algorithm therefore also terminates in a finite number of steps. □
Let us consider the point ŵ obtained at termination of the algorithm. We now show that ŵ is a local minimum of our problem (6.1)–(6.3) and analyze the neighbourhood V(ŵ) around ŵ in which every other feasible point has a higher objective function value. In other words, we show that ŵ is "stable" as a local minimum in a neighbourhood V(ŵ) of ŵ in which all the explicit shortest paths defining the constraints (6.3) remain unchanged when they are unique. The solution's "stability" therefore depends on "how far" the second shortest paths are from ŵ.
Considering the q-th shortest path constraint, we denote the cost of the "optimal" shortest path from o_q to d_q by P_q^1, that is,
P_q^1 ≝ Σ_{a∈p_q^1(ŵ)} [ŵ]_a. (6.17)
We already mentioned that p_q^1(ŵ) may not be unique, although P_q^1 is. We then define a second shortest path from o_q to d_q as a path whose cost is closest to, but strictly larger than, that of p_q^1(ŵ), i.e. P_q^1. The first such second shortest path (in our predefined path order) is denoted, if it exists, by p_q^2(ŵ) and its cost by P_q^2. If p_q^2(ŵ) does not exist, then we set P_q^2 = ∞ by convention. With these additional notations, we are now in a position to state the next property of our algorithm.
Theorem 6.3 The point ŵ computed by Algorithm 6.1 is a local optimum of P(w), the original problem. Moreover, f(w) ≥ f(ŵ) for every w in
V(ŵ) ≝ { w ∈ F | ‖w − ŵ‖_1 < min_q [P_q^2 − P_q^1] }, (6.18)
where ‖·‖_1 is the usual ℓ_1-norm.
Proof. Let us consider the conditions under which p_q^1(ŵ) may vary around ŵ, and define a stability neighbourhood V_q(ŵ) associated with each shortest path constraint. Four cases need to be examined.
1. P_q^2 = ∞ and n_q = 1. In this situation, the path from o_q to d_q is unique and p_q^1(w) is obviously constant for all w ∈ R^m. We then define V_q(ŵ) = R^m ∩ F = F.
2. P_q^2 = ∞ and n_q > 1. There is now more than one path from o_q to d_q, but they all have the same cost P_q^1. In this case, an infinitesimal change in the weights ŵ may cause the feasible polygon defined at ŵ to change. However, since ŵ is a point produced by our algorithm, choosing any of the n_q − 1 other possible polygons does not produce an objective function decrease. This indicates that f(ŵ) may not be improved upon in the neighbourhood V_q(ŵ) = F.
3. P_q^2 ≠ ∞ and n_q = 1. In this situation, the explicit description of the shortest path p_q^1(ŵ) will not change until its cost reaches that of the second shortest path. More precisely, p_q^1(w) is constant in the neighbourhood
V_q(ŵ) = {w ∈ F : ‖w − ŵ‖_1 < P_q^2 − P_q^1}. (6.19)
4. P_q^2 ≠ ∞ and n_q > 1. This is a combination of the two previous cases. As above, f(w) cannot be improved upon in the neighbourhood V_q(ŵ) = {w ∈ F : ‖w − ŵ‖_1 < P_q^2 − P_q^1}.
Moreover, the algorithm's mechanism implies that we cannot find a point better than ŵ by considering all combinations of constraint definitions as examined above for a single constraint. As a consequence, ŵ will be a "stable" solution in the neighbourhood
V(ŵ) = ∩_{q=1}^{n_I} V_q(ŵ) = { w ∈ F : ‖w − ŵ‖_1 < min_q [P_q^2 − P_q^1] }. (6.20)
□
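For a single o-d pair, the quantities P_q^1, P_q^2 and the gap appearing in (6.18) can be evaluated as in the following sketch (names and representation are our own; graphs are adjacency lists mapping a vertex to (successor, weight) pairs, and the shortest path is assumed unique, so that every other simple path must avoid at least one of its arcs):

```python
import heapq

def dijkstra(adj, src, dst, skip=None):
    """Cost of a shortest path from src to dst; `skip` is an arc to ignore."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            if (u, v) == skip:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return float("inf")

def shortest_path(adj, src, dst):
    """Return (cost, arc list) of one shortest path, via predecessors."""
    dist, pred = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], pred[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    path, v = [], dst
    while v != src:
        path.append((pred[v], v))
        v = pred[v]
    return dist[dst], list(reversed(path))

def stability_gap(adj, src, dst):
    """P^2 - P^1 for one o-d pair, assuming the shortest path is unique:
    delete each of its arcs in turn and keep the best strictly worse cost."""
    p1, arcs = shortest_path(adj, src, dst)
    candidates = [dijkstra(adj, src, dst, skip=a) for a in arcs]
    p2 = min((c for c in candidates if c > p1), default=float("inf"))
    return p2 - p1
```

The overall radius in (6.18) is then the minimum of these gaps over the n_I constraints.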
We now examine the case where the original ISP problem also features lower bounds on the costs of the shortest paths between given origins and destinations, that is, constraints of the type
0 ≤ l_q ≤ Σ_{a∈p_q^1(w)} [w]_a, (6.21)
for q = 1, …, n_I (where l_q can be chosen as zero) and l_q ≤ u_q. These constraints are much easier to handle, because the inequality (6.21) must be satisfied for every possible path from o_q to d_q. Of course, the number of these linear constraints is typically very high, but the situation is entirely similar to that handled in Chapters 4 and 5. As a consequence, the technique developed in those chapters is directly applicable to each convex subproblem arising in the course of the solution of problem (6.1)–(6.3), (6.21).
6.5.1 Notations
Let us consider the problem P (w; i1; : : :; inI ). If we \freeze" the arc weight values w, we may
rewrite the i-th constraint of type (6.3) as
Ei(w) def
= nTi w , bi 0 (i = 1; : : :; nI ); (6:22)
where ni 2 Rm and bi = ,ui for i = 1; : : :; nI . The vector ni represents the normal to the i-th
constraint. The matrix of the normal vector of the constraints in the active set indexed by A
will be denoted by N . A, will denote a subset of A containing one fewer element than A, and
N , will represent the matrix of normals corresponding to A, . The normal nr will designate the
column deleted from N to give N , . The index set A, then designates A n frg.
Since the Hessian G of the objective function (6.1) equals the identity, the Moore-Penrose
generalized inverse of N in the space of variables under the transformation y = G w simply is
1
2
N def
= (N T N ),1 N T ; (6:23)
and
H def
= (I , NN ); (6:24)
the orthogonal projection on the null-space of N , is then the inverse reduced Hessian of the
quadratic objective function in the subspace of weights satisfying the active constraints. Denoting
the gradient of f , by g (w) = w , w, we designate the Lagrange multipliers at the point w by
u(w). Let us define P(A) as the problem of minimizing (6.1) subject to the subset of constraints
(6.22) indexed by A and considered as equalities. As proved in Chapter 3, at the optimal solution
ŵ of problem P(A), we can write (3.33) and (3.34) as

  u(ŵ) = N† g(ŵ) ≥ 0,   (6.25)

and

  H g(ŵ) = 0,   (6.26)
respectively. This formulation comes from the fact that g(ŵ) is a linear combination of the
columns of N, g(ŵ) = N u(ŵ), as is implied by the first-order condition (6.25). Remember that
conditions (6.25) and (6.26) are also sufficient to characterize ŵ.
Finally, H^− will denote the operator (4.9) with N replaced by N^−, and we will use similar
notations for u.
[Figure: geometric illustration involving the normals n_r and n_i, the points ŵ, w* and w̄, and the step direction s.]
and

  t = u_r(ŵ) = − E_r(w) / ((s')^T n_r).   (6.33)

Since s = −s' and d = −d', we have proved the following theorem.
Theorem 6.4 If ŵ is a solution of P(A), and if s = −H^− n_r, then the weight vector w* = ŵ + t s,
such that t = u_r(ŵ), verifies the optimality conditions of P(A^−), that is, the primal optimality of
w*

  H^− g(w*) = 0,   (6.34)

the primal feasibility of w*

  E_i(w*) = n_i^T w* − b_i ≥ 0   (i ∈ A^−),   (6.35)

and the dual feasibility of w*

  u^−(w*) = (N^−)† g(w*) ≥ 0.   (6.36)
As a consequence, note that

  f(w*) − f(ŵ) = ½ [u_r(ŵ)]² n_r^T s ≤ 0,   (6.37)

because n_r^T s ≤ 0, since H^− is positive semi-definite.
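The statement of Theorem 6.4 is easy to verify numerically. The sketch below (our own illustration, with arbitrary random data rather than one of the test problems) solves the equality-constrained subproblem P(A) in closed form, applies the step w* = ŵ + u_r(ŵ) s with s = −H^− n_r, and checks both that w* solves P(A^−) and that the objective change matches (6.37):

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 6, 3                              # six weights, three active constraints
N = rng.standard_normal((m, k))          # columns: constraint normals n_i
b = rng.standard_normal(k)
wbar = rng.standard_normal(m)            # a priori weights w̄

def argmin_eq(N, b, wbar):
    # minimizer of (1/2)||w - w̄||^2 subject to N^T w = b
    return wbar + N @ np.linalg.solve(N.T @ N, b - N.T @ wbar)

what = argmin_eq(N, b, wbar)             # ŵ, solution of P(A)
g = what - wbar                          # gradient g(ŵ) = ŵ - w̄
u = np.linalg.solve(N.T @ N, N.T @ g)    # multipliers u(ŵ) = N† g(ŵ)

r = 2                                    # drop the last constraint: A^- = A \ {r}
Nm = N[:, :r]
Hm = np.eye(m) - Nm @ np.linalg.solve(Nm.T @ Nm, Nm.T)   # projector H^-
s = -Hm @ N[:, r]
wstar = what + u[r] * s                  # the step of Theorem 6.4

# w* is exactly the minimizer subject to the reduced active set A^-
assert np.allclose(wstar, argmin_eq(Nm, b[:r], wbar))

# and the objective change agrees with (6.37): value (1/2) u_r^2 n_r^T s, nonpositive
df = 0.5 * (wstar - wbar) @ (wstar - wbar) - 0.5 * g @ g
assert np.isclose(df, 0.5 * u[r] ** 2 * (N[:, r] @ s)) and df <= 1e-12
```

Dropping a constraint can only decrease the objective, which is what the sign in (6.37) expresses.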
6.6.2 Tests
We have selected a few problems whose graphs, shortest path constraints and a priori weights
have been generated in different ways. The problems and their characteristics are summarized
in Table 6.1. In this table, the heading "vertices" refers to the number of vertices in the graph,
"graph type" indicates how the network is generated, "weights" indicates how the a priori weights
are chosen (a layered choice means that subsets of arcs were chosen with constant costs, corre-
sponding to grid levels in the case of grid-like graphs), and "constraints" indicates how the shortest
path constraints are chosen: either by choosing origins and destinations at random or by choosing
them along the faces of the grids, when applicable.
All these problems but P1 were solved, in the sense that a local minimum was found for each of them.
The results of applying our pilot code to these problems are reported in Table 6.2. In this
table, i and n_c = Σ_{j=1}^{i} n_{c_j} refer to the number of iterations of the algorithm and the total number
of possible active shortest path combinations, respectively, as described in Section 6.4.4. The
column "comb" indicates how many of the n_c path combinations were effectively examined by
the algorithm before termination. The symbol "−" means that it has not been possible to solve
problem P1 in less than a week on our workstation.
Problem   i   n_c    comb
Example   2   4      4
P1        −   10^9   −
P2        2   48     1
P3        1   0      0
P4        1   0      0
P5        1   0      0
P6        2   0      0
P7        1   0      0
P8        1   0      0
Table 6.2: Results for the test problems
should share many arcs and usually be "close" to each other, so that modifying the arc weights on
one of them is very likely to modify the explicit definition of nearby shortest paths as well.
Table 6.2 also indicates that the combinatorial nature of the problem showed up again only
with grid graphs. This is due to the fact that the nonuniqueness of shortest paths depends on the
density m/n of the graph: an increase in that value tends to reduce the nonuniqueness of the
shortest paths. This trend in 2D grids has been analysed by Moser in his Ph.D. thesis [87]. We
report here one interesting result he mentioned in his report.
Let us consider a 2D grid and denote its nodes by a double index (i, j) specifying their
Cartesian position in the grid. The number of shortest paths from node (0, 0) to node (i, j) is
designated by s(i, j). We choose the simplest 2D grid, which is squared or cross-ruled like the
streets of New York, where each link or arc has the same weight. For convenience, we do not
mention the grid size and will refer to the entire grid by specifying i, j ≥ 0. In such grids, s(i, j)
can be recursively computed by the following relation:

  s(i+1, j+1) = s(i+1, j) + s(i, j+1),   for i, j ≥ 0,   (6.38)

with the initial condition that

  s(i, 0) = s(0, j) = 1,   for i, j ≥ 0.   (6.39)
Indeed, there are two shortest paths from (i, j) to (i+1, j+1), one going through node (i+1, j)
and the other one through node (i, j+1). We can make use of an auxiliary function f(x, y) in
order to solve (6.38) knowing (6.39):

  f(x, y) := Σ_{i,j ≥ 0} s(i, j) x^i y^j,   (6.40)

where

  0 ≤ x, y < 1.   (6.41)
The function f(x, y) can be developed as follows:

  f(x, y) = 1 + Σ_{i≥1} s(i, 0) x^i + Σ_{j≥1} s(0, j) y^j + Σ_{i,j≥1} s(i, j) x^i y^j            [by (6.39)]
          = 1 + x/(1−x) + y/(1−y) + Σ_{i,j≥1} s(i, j−1) x^i y^j + Σ_{i,j≥1} s(i−1, j) x^i y^j   [by (6.38)]
          = 1 + x/(1−x) + y/(1−y) + y Σ_{i≥1, j≥0} s(i, j) x^i y^j + x Σ_{i≥0, j≥1} s(i, j) x^i y^j
          = 1 + x/(1−x) + y/(1−y) + y [f(x, y) − Σ_{j≥0} s(0, j) y^j] + x [f(x, y) − Σ_{i≥0} s(i, 0) x^i]
          = 1 + (x + y) f(x, y).   (6.42)
We then have that

  f(x, y) = 1/(1 − x − y)
          = Σ_{k≥0} (x + y)^k
          = Σ_{k≥0} Σ_{i=0}^{k} C(k, i) x^i y^{k−i}   (6.43)
          = Σ_{i,j≥0} C(i+j, i) x^i y^j,
where C(b, a) = b!/[a!(b−a)!] are the binomial coefficients, which represent the number of possible ways
of choosing a items amongst a set of b items, without distinguishing their order and without taking
the same item twice.
As a consequence, (6.40) and (6.43) give

  s(i, j) = C(i+j, i) = (i+j)! / (i! j!).   (6.44)

It is easy to check that s(i, j) grows very rapidly even when (i, j) lies not far from (0, 0) (for
instance, s(10, 10) = 184756).
Moser mentioned similar results for 2D grids with higher densities. He observed that s(i, j)
decreases very rapidly when each node originates an increasing number of arcs. However, this
does not mean that shortest path uniqueness is attained with high densities. Moser extrapolated
his observations and conjectured that shortest paths are uniquely determined only for nodes on
a few straight² lines through (0, 0).
This explains the potential nonuniqueness that is present in our grids, especially those built
with constant arc weights. The resolution of problem P1 suffered from this potential nonunique-
ness. Let us finally mention that, amongst all generated problems (not only those presented in this
section), problem P1 is the only one which presented such a strong combinatorial aspect (n_c ≈ 10^9).
² The grids generated by Moser notably include the square diagonals, which allow several nodes to be reached along straight
lines from (0, 0).
7
Conclusion and perspectives
This thesis deals with instances of the inverse shortest path problem. Significant questions
arising in applied mathematics can be formulated as instances of this inverse problem. Chapter 1
showed the pertinence of such a formulation in problems arising in traffic modelling and seismic
tomography. In a typical inverse shortest path problem, we want to recover some attributes of a
network, while knowing information about the shortest paths between certain origin-destination
pairs from observing actual flow on the network. To make the solution unique, we assume that
a set of attributes is a priori known, and we would like the solution to be as close as possible
to these a priori known values. Thus, the objective is to minimize the norm of the
difference between the solution and the known vector of attributes. The constraints ensure that
the shortest paths are the same or verify the same properties under the solution, and may
represent other relationships between the paths or variables of the problem. Solving the problem
requires an algorithm solving the direct problem (the shortest path problem) and an algorithmic
framework that depends on the choice of the norm for evaluating the proximity of the solution
to the a priori values. Chapter 2 discussed the choice of a shortest path algorithm with respect to the
properties of the graphs representing the networks under study. Johnson's algorithm was selected
to solve the direct problem in our context. This method could possibly be combined with
updating techniques for efficiency. Chapter 3 examined the quadratic programming context and
finally preferred the Goldfarb and Idnani approach for solving our problem. The need to handle an
exponential number of linear constraints involving much redundancy has guided this choice.
In the uncorrelated inverse shortest path problem, the variables are the weights on the arcs
and there is no correlation between them. In the correlated problem, however, the arcs are
divided into classes, and the weights of the arcs in the same class are derived from the same value,
which is referred to as the class density. Thus, the weights within each group are correlated, and
the variables are actually the densities. Correlation of course implies more restriction and hence
more constraints.
In Chapter 4, the uncorrelated inverse shortest path problem has been posed and a compu-
tational algorithm has been proposed for one of the many problem specifications: the constraints
are given as a set of shortest paths and nonnegativity constraints on the weights. The proposed
algorithm has been programmed and run on a few examples, in order to prove the feasibility of
the approach.
conclusion and perspectives 115
Chapter 5 provides a modified method for solving the inverse shortest path problem with
correlated arc weights. To achieve this goal, we generalized the inverse shortest path method
of Chapter 4 to take the desired correlation into account. We derived new expressions for the
primal step and other quantities in the algorithm. We tested our new algorithm on a wide class of
correlated problems and compared it with the original uncorrelated method. Finally, two possible
strategies for handling constraints were considered and compared in this context.
Finally, in Chapter 6, we have presented and motivated the inverse shortest path problem
with upper bounds on shortest path costs. These constraints may no longer be expressed as
sets of linear constraints. The resulting feasible region may therefore be non-convex. The NP-
completeness of finding a global solution of this problem has then been shown. An algorithm for
local minimization has been presented, analyzed and tested on a few examples.
In this thesis, we have supplied algorithms to solve the main common instances of the inverse
shortest path problem. The possible extensions are many. New perspectives should concern
the use of other norms in the objective function. The resulting (stronger) non-linearity would
then call for different approaches. Other types of constraint specifications are also of obvious
interest. The problem of recovering attributes of the second (third, …) shortest paths is also
a challenging area (for instance, this could help in retrieving the successive shortest time waves
in seismic tomography). Further research could cover heuristics for active path selection and
stability analysis. We are also interested in applying the algorithms discussed in this thesis to
practical cases in traffic engineering and computerized tomography.
Extensions or applications of the inverse shortest path problem have been informally proposed
after several conferences given on the subject. A. Lucena (London, 1991) perceived an application
of our method to the update of Lagrange multipliers in an algorithm solving the time-dependent
travelling salesman problem [80]. Another application, suggested by S. Boyd (August 1992), lies
in finding or estimating the transition probabilities of a Markov chain, given the maximum likelihood
path. The formulation of this last problem includes an objective function that does not depend
on a usual norm, but involves a "max" function. The objective function has the property of
remaining convex in this particular definition. The connection between this problem and ours
can be seen by identifying the vertices with states or events of a Markov chain, the arcs with
possible transitions, the paths with sequences of states and the arc weights with transition probabilities.
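To make this connection concrete, here is a small sketch of our own (the chain and its probabilities are invented for illustration): a most likely path maximizes a product of transition probabilities, which is the same as minimizing the sum of the arc weights −log p, an ordinary shortest path problem.

```python
import heapq
from math import log

# transition probabilities of a small hypothetical Markov chain
P = {('a', 'b'): 0.9, ('a', 'c'): 0.1, ('b', 'd'): 0.5, ('c', 'd'): 0.8}

# maximizing a product of probabilities = minimizing the sum of -log p
w = {arc: -log(p) for arc, p in P.items()}

def dijkstra(w, src, dst):
    # textbook Dijkstra on the arc-weight dictionary w
    adj = {}
    for (u, v), c in w.items():
        adj.setdefault(u, []).append((v, c))
    heap, seen = [(0.0, src, [src])], set()
    while heap:
        d, u, path = heapq.heappop(heap)
        if u == dst:
            return d, path
        if u in seen:
            continue
        seen.add(u)
        for v, c in adj.get(u, []):
            heapq.heappush(heap, (d + c, v, path + [v]))

d, path = dijkstra(w, 'a', 'd')
print(path)  # ['a', 'b', 'd']: probability 0.9 * 0.5 = 0.45 beats 0.1 * 0.8 = 0.08
```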
We hope that many more such extensions will show up in future research.
A
Symbol Index
This appendix is intended to help the reader in readily retrieving the meaning of the symbols used
in Chapters 4–6.
symbol index 117
[1] A.V. Aho, J.E. Hopcroft and J.D. Ullman, The Design and Analysis of Computer
Algorithms, Addison-Wesley, Reading, MA, 1974.
[2] S.S. Anderson, Graph theory and finite combinatorics, Markham Publishing Company,
Chicago, 180 pp., 1970.
[3] M. Avriel, Nonlinear Programming: Analysis and Methods, Prentice-Hall, Inc., Englewood
Cliffs, NJ, 512 pp., 1976.
[4] E.M.L. Beale, "On Quadratic Programming", Naval Research Logistics Quarterly, vol. 6,
pp. 227–244, 1959.
[5] R. Bellman, "On a routing problem", Quart. Appl. Math., vol. 16, pp. 87–90, 1958.
[6] D.P. Bertsekas, "A new algorithm for the assignment problem", Mathematical Program-
ming, vol. 21, pp. 152–171, 1981.
[7] D.P. Bertsekas, "The Auction Algorithm for Assignment and Other Network Flow
Problems: A Tutorial", INTERFACES, vol. 20:4, pp. 133–149, July–August 1990.
[8] D.P. Bertsekas, "An auction algorithm for shortest paths", Lab. for Information and
Decision Systems Report P-2000, MIT, Cambridge, MA, 1990, revised February 1991.
SIAM Journal on Optimization, to appear.
[9] D.P. Bertsekas, Linear Network Optimization: Algorithms and Codes, The MIT Press,
Cambridge, Massachusetts, 359 pp., 1991.
[10] J.C.G. Boot, "Notes on quadratic programming: the Kuhn-Tucker and Theil-van de
Panne conditions, degeneracy, and equality constraints", Management Science, vol. 8, No.
1, pp. 85–98, October 1961.
[11] J.C.G. Boot, "On trivial and binding constraints in programming problems", Manage-
ment Science, vol. 8, pp. 419–441, 1962.
[12] J.C.G. Boot, Quadratic Programming, North-Holland Publishing Co., Amsterdam, 213
pp., 1964.
bibliography 120
[13] P.H.L. Bovy and E. Stern, Route Choice: Wayfinding in Transport Networks, Kluwer
Academic Publishers, Dordrecht, 1990.
[14] D. Burton, Analyse et implémentation de méthodes des plus courts chemins dans un
réseau urbain, Master's Thesis, Facultés Universitaires Notre-Dame de la Paix, Namur,
1986.
[15] D. Burton and Ph.L. Toint, "On an instance of the inverse shortest paths problem",
Mathematical Programming, vol. 53, pp. 45–61, 1992.
[16] D. Burton and Ph.L. Toint, "On the use of an inverse shortest paths algorithm for
recovering linearly correlated costs", Mathematical Programming (to appear), 1993.
[17] D. Burton, B. Pulleyblank and Ph.L. Toint, "The inverse shortest path problem
with upper bounds on shortest path costs", Internal Report, FUNDP (submitted to ORSA
Journal on Computing), 1993.
[18] P.H. Calamai and A.R. Conn, "A stable algorithm for solving the multifacility loca-
tion problem involving Euclidean distances", SIAM Journal on Scientific and Statistical
Computing, vol. 4, pp. 512–525, 1980.
[19] P.H. Calamai and A.R. Conn, "A second-order method for solving the continuous
multifacility location problem", in: G.A. Watson, ed., Numerical Analysis: Proceedings of
the Ninth Biennial Conference, Dundee, Scotland, Lecture Notes in Mathematics 912,
Springer-Verlag (Berlin, Heidelberg and New York), pp. 1–25, 1982.
[20] P.H. Calamai and A.R. Conn, "A projected Newton method for lp norm location prob-
lem", Mathematical Programming, vol. 38, pp. 75–109, 1987.
[21] A. Cayley, "On the theory of the analytical forms called trees", Philos. Mag., vol. 13, pp.
172–176, 1857. Mathematical Papers, Cambridge, vol. 3, pp. 242–246, 1891.
[22] A.R. Conn, "Constrained optimization using a non-differentiable penalty function", SIAM
Journal on Numerical Analysis, vol. 10, pp. 760–784, 1973.
[23] A.R. Conn and J.W. Sinclair, "Quadratic programming via a non-differentiable penalty
function", Department of Combinatorics and Optimization, University of Water-
loo, Rep. CORR 75-15, 1975.
[24] S. Cook, "The complexity of Theorem Proving Procedures", Proc. 3rd Ann. ACM Symp.
on Theory of Computing, Association for Computing Machinery, New York, pp. 151–158, 1971.
[25] W.-K. Chen, Applied graph theory, North-Holland Publishing Company, 484 pp., 1971.
[26] N. Christofides, Graph Theory. An algorithmic approach, Academic Press, London, 400
pp., 1975.
[42] M. Florian, S. Nguyen and S. Pallottino, "A Dual Simplex Algorithm for Finding
All Shortest Paths", Networks, vol. 11, pp. 367–378, 1981.
[43] R.W. Floyd, "Algorithm 97: shortest path", Comm. ACM, vol. 5, p. 345, 1962.
[44] L.R. Ford, "Network flow theory", Report P-923, The Rand Corporation, Santa Monica,
CA, 1956.
[45] L.R. Ford and D.R. Fulkerson, Flows in networks, Princeton University Press, Princeton,
NJ, 1962.
[46] M. Frank and P. Wolfe, "An algorithm for quadratic programming", Naval Research
Logistics Quarterly, vol. 3, pp. 95–110, 1956.
[47] S. Fujishige, "A note on the problem of updating shortest paths", Networks, vol. 11, pp.
317–319, 1981.
[48] G. Gallo and S. Pallottino, "Shortest path methods: A unifying approach", Mathe-
matical Programming Study, vol. 26, pp. 38–64, 1986.
[49] G. Gallo and S. Pallottino, "Shortest path algorithms", Annals of Operations Re-
search, vol. 13, pp. 3–79, 1988.
[50] M.R. Garey and D.S. Johnson, Computers and intractability. A guide to the theory of
NP-Completeness, W.H. Freeman and Company, San Francisco, 1979.
[51] J.A. George and J.W. Liu, Computer solution of large sparse positive definite systems,
Prentice-Hall, Englewood Cliffs, 1981.
[52] P.E. Gill and W. Murray, "Numerically stable methods for quadratic programming",
Mathematical Programming, vol. 14, pp. 349–372, 1978.
[53] D. Goldfarb, "Extension of Newton's method and simplex methods for solving quadratic
programs", in: F.A. Lootsma, ed., Numerical methods for non-linear optimization, Aca-
demic Press, London, pp. 239–254, 1972.
[54] D. Goldfarb, J. Hao and S.-R. Kai, "Shortest path algorithms using dynamic breadth-
first search", Networks, vol. 21, pp. 29–50, 1991.
[55] D. Goldfarb and A. Idnani, "A Numerically Stable Dual Method for Solving Strictly
Convex Quadratic Programs", Mathematical Programming, vol. 27, pp. 1–33, 1983.
[56] G.H. Golub and C.F. van Loan, Matrix computations, North Oxford Academic, Oxford,
476 pp., 1983.
[57] M. Gondran and M. Minoux, Graphes et algorithmes, 2nd edition, Editions Eyrolles,
Paris, 546 pp., 1990.
English translation by S. Vajda, Graphs and Algorithms, Wiley-Interscience, NY, 1984.
[58] A.S. Goncalves, "A primal-dual method for quadratic programming with bounded vari-
ables", in: F.A. Lootsma, ed., Numerical methods for non-linear optimization, Academic
Press, London, pp. 255–263, 1972.
[59] S. Goto, T. Ohtsuki and T. Yoshimura, "Sparse matrix techniques for the shortest
path problem", IEEE Trans. Circuits and Systems, CAS-23, pp. 752–758, 1976.
[60] F. Glover, R. Glover and D. Klingman, "Computational study of an improved short-
est path algorithm", Networks, vol. 14, pp. 25–, 1984.
[61] W.R. Hamilton, "Account of the icosian calculus", Proc. Roy. Irish Acad., vol. 6, pp.
415–416, 1853–7.
[62] D.Y. Handler and P.B. Mirchandani, Location on Networks: Theory and Algorithms,
The MIT Press, Cambridge, Massachusetts, 233 pp., 1979.
[63] F. Harary, Graph theory, Addison-Wesley Publishing Company, 274 pp., 1972.
[64] F. Harary, R.Z. Norman and D. Cartwright, Structural Models: An Introduction to the
Theory of Directed Graphs, John Wiley & Sons, Inc., New York, 1965.
[65] G.T. Herman, Image reconstruction from projections: the fundamentals of computerized
tomography, Academic Press, New York, 1980.
[66] D.B. Johnson, "A note on Dijkstra's shortest path algorithm", J. Assoc. Comput. Mach.,
vol. 20, pp. 385–388, 1973.
[67] D.B. Johnson, "Efficient algorithms for shortest paths in sparse networks", J. Assoc.
Comput. Mach., vol. 24, pp. 1–13, 1977.
[68] E.L. Johnson, "On shortest paths and sorting", Proceedings of the 25th ACM Annual
Conference, pp. 510–517, 1972.
[69] R.M. Karp, "On the computational complexity of combinatorial problems", Networks,
vol. 5, pp. 45–68, 1975.
[70] A. Kershenbaum, "A note on finding shortest path trees", Networks, vol. 11, pp. 399–400,
1981.
[71] G. Kirchhoff, "Über die Auflösung der Gleichungen, auf welche man bei der Unter-
suchung der linearen Verteilung galvanischer Ströme geführt wird", Ann. Phys. Chem., vol.
72, pp. 497–508, 1847.
[72] D.E. Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching, Addison-
Wesley, Reading, MA, 1973.
[73] H.W. Kuhn and A.W. Tucker (Eds.), Linear inequalities and related systems, Princeton
University Press, Princeton, NJ, 1956.
[74] C.E. Lemke, "The dual method for solving the linear programming problem", Naval
Research Logistics Quarterly, vol. 1, No. 1, 1954.
[75] C.E. Lemke, "A method of solution for quadratic programs", Management Science, vol.
8, pp. 442–453, 1962.
[76] N.P. Loomba and E. Turban, Applied programming for management, Holt, Rinehart &
Winston, Inc., 475 pp., 1974.
[77] F.A. Lootsma, ed., Numerical methods for non-linear optimization, Academic Press, Lon-
don, 440 pp., 1972.
[78] A.K. Louis, "Computerized tomography. I: Physical background and mathematical mod-
elling", extended version of a conference given in February 1984 at the Facultés Universi-
taires Notre-Dame de la Paix, Namur (Belgium), 1984.
[79] A.K. Louis and F. Natterer, "Mathematical problems of computerized tomography",
Proc. IEEE, vol. 71, no. 3, pp. 379–389, 1983.
[80] A. Lucena, "Time-dependent traveling salesman problem – the deliveryman case", Net-
works, vol. 20, pp. 753–763, 1990.
[81] D.G. Luenberger, Linear and nonlinear programming, 2nd edition, Addison-Wesley,
Reading, MA, 491 pp., 1984.
[82] M. Minoux, Programmation mathématique : théorie et algorithmes, Tome 1, Dunod, Paris,
294 pp., 1983.
English translation by S. Vajda, Mathematical Programming: Theory and Algorithms,
Wiley-Interscience, NY.
[83] M. Minoux and G. Bartnik, Graphes, algorithmes, logiciels, Dunod, Paris, 428 pp., 1986.
[84] P. Mirchandani and H. Soroush, "Generalized Traffic Equilibrium with Probabilistic
Travel Times and Perceptions", Transportation Science, vol. 21, no. 3, pp. 133–152, 1987.
[85] E.F. Moore, "The shortest path through a maze", in Proceedings of the International
Symposium on the Theory of Switching, Part II, 1957, Harvard University, Cambridge,
MA, pp. 285–292, 1959.
[86] T.J. Moser, "Shortest path calculation of seismic rays", Geophysics, vol. 56, pp. 59–67,
1991.
[87] T.J. Moser, "The shortest path method for seismic ray tracing in complicated media",
Ph.D. Thesis, Rijksuniversiteit Utrecht, 1992.
[88] J.D. Murchland, "A fixed matrix method for all shortest distances in a directed graph
and for inverse problems", Ph.D. Thesis, Karlsruhe University, 1970.
[89] G.L. Nemhauser and L.A. Wolsey, Integer and Combinatorial Optimization, A Wiley-
Interscience Publication, John Wiley & Sons, 763 pp., 1988.
[90] G. Neumann-Denzau and J. Behrens, "Inversion of seismic data using tomographical
reconstruction techniques for investigations of laterally inhomogeneous media", Geophys.
J. R. astr. Soc., vol. 79, pp. 305–315, 1984.
[91] G. Nolet, ed., Seismic Tomography, D. Reidel Publishing Company, Dordrecht, 387 pp.,
1987.
[92] V.E. Outram and E. Thompson, "Driver's perceived cost in route choice", Proceedings
– PTRC Annual Meeting, London, pp. 226–257, 1978.
[93] S. Pallottino, "Shortest path methods: complexity, interrelations and new propositions",
Networks, vol. 14, pp. 257–267, 1984.
[94] U. Pape, "Implementation and efficiency of Moore algorithms for the shortest route prob-
lem", Mathematical Programming, vol. 7, pp. 212–222, 1974.
[95] A.R. Pierce, "Bibliography on algorithms for shortest path, shortest spanning tree and
related circuit routing problems", Networks, vol. 5, pp. 129–149, 1975.
[96] M.J.D. Powell, "On the quadratic programming algorithm of Goldfarb and Idnani",
Mathematical Programming Study, vol. 25, pp. 45–61, 1985.
[97] M.J.D. Powell, "ZQPCVX, A Fortran subroutine for convex quadratic programming",
Report DAMTP/NA17, Department of Applied Mathematics and Theoretical Physics, Uni-
versity of Cambridge, Cambridge, UK, 1983.
[98] F.S. Roberts, Graph Theory and Its Applications to Problems of Society, SIAM, CBMS-
NSF Regional Conference Series in Applied Mathematics, Philadelphia, Pennsylvania, 122
pp., 1978.
[99] B. Roy, Algèbre moderne et théorie des graphes, Tome II, Dunod, Paris, 1970.
[100] A. Sartenaer, "On the application of the auction algorithm of Bertsekas for the search
of shortest routes in an urban network", Technical Report 92/27, Facultés Universitaires
Notre-Dame de la Paix, Département de Mathématique, Namur, 1991.
[101] Y. Sheffi, Urban Transportation Networks, Prentice-Hall, Englewood Cliffs, 1985.
[102] P.A. Steenbrink, Optimization of Transport Networks, Wiley, Bristol, 1974.
[103] J. Stoer, "On the numerical solution of constrained least-squares problems", SIAM Jour-
nal on Numerical Analysis, vol. 8, No. 2, pp. 382–411, 1971.
[104] A. Tarantola, Inverse problem theory. Methods for data fitting and model parameter
estimation, Elsevier, 1987.
[105] A. Tarantola and B. Valette, "Generalized nonlinear inverse problems solved using
the least squares criterion", Reviews of Geophys. and Space Phys., vol. 20, pp. 219–232,
1982.
[106] R.E. Tarjan, "Complexity of combinatorial algorithms", SIAM Review, vol. 20, no. 3, pp.
457–491, 1978.
[107] R.E. Tarjan, Data Structures and Network Algorithms, SIAM, CBMS-NSF Regional
Conference Series in Applied Mathematics, Philadelphia, 131 pp., 1983.
[108] H. Theil and C. Van de Panne, "Quadratic programming as an extension of classical
quadratic maximization", Management Science, vol. 7, No. 1, pp. 1–20, October 1960.
[109] C. Van de Panne and A. Whinston, "The simplex and the dual method for quadratic
programming", Operations Research Quarterly, vol. 15, pp. 355–389, 1964.
[110] C. Van de Panne and A. Whinston, "A comparison of two methods for quadratic
programming", Operations Research, vol. 14, pp. 422–441, 1966.
[111] J. Van Leeuwen, Ed., Algorithms and Complexity, Volume A of Handbook of Theoretical
Computer Science, Elsevier, Amsterdam, and The MIT Press, Cambridge, Massachusetts,
996 pp., 1990.
[112] D. Van Vliet, "Improved shortest path algorithms for transport networks", Transporta-
tion Research, vol. 12, pp. 7–20, 1978.
[113] S.A. Vavasis, Nonlinear Optimization: Complexity Issues, Oxford University Press, Inc.,
NY, 165 pp., 1991.
[114] J.W.J. Williams, "Algorithm 232: Heapsort", Comm. ACM, vol. 7, pp. 347–348, 1964.
[115] P. Wolfe, "The Simplex Method for Quadratic Programming", Econometrica, vol. 27,
pp. 382–398, 1959.
[116] J.H. Woodhouse and A.M. Dziewonski, "Mapping the upper mantle: three-dimen-
sional modeling of Earth structure by inversion of seismic waveforms", Journal of Geophys-
ical Research, vol. 89, B7, pp. 5953–5986, 1984.
[117] J.Y. Yen, A shortest path algorithm, Ph.D. Thesis, University of California, Berkeley,
1970.