EVALUATION AND
DECISION MODELS:
a critical perspective
Denis Bouyssou
ESSEC
Thierry Marchant
Ghent University
Marc Pirlot
SMRO, Faculte Polytechnique de Mons
Patrice Perny
LIP6, Universite Paris VI
Alexis Tsoukias
LAMSADE - CNRS, Universite Paris Dauphine
Philippe Vincke
SMG - ISRO, Universite Libre de Bruxelles
1 Introduction
1.1 Motivations
1.2 Audience
1.3 Structure
1.4 Outline
1.5 Who are the authors?
1.6 Conventions
1.7 Acknowledgements
4 Constructing measures
4.1 The human development index
4.1.1 Scale normalisation
4.1.2 Compensation
4.1.3 Dimension independence
4.1.4 Scale construction
4.1.5 Statistical aspects
4.2 Air quality index
4.2.1 Monotonicity
4.2.2 Non-compensation
4.2.3 Meaningfulness
4.3 The decathlon score
4.3.1 Role of the decathlon score
4.4 Indicators and multiple criteria decision support
4.5 Conclusions
10 Conclusion
10.1 Formal methods are all around us
10.2 What have we learned?
10.3 What can be expected?
Bibliography
Index
1
INTRODUCTION
1.1 Motivations
Deciding is a very complex and difficult task. Some people even argue that our
ability to make decisions in complex situations is the main feature that
distinguishes us from animals (it is also common to say that laughing is the
main difference). Nevertheless, when the task is too complex or the interests at
stake are too important, it quite often happens that we do not know or are not
sure what to decide and, in many instances, we resort to a decision support
technique: an informal one (we toss a coin, we ask an oracle, we visit an
astrologer, we consult an expert, we think) or a formal one. Although informal
decision support techniques can be of interest, in this book we will focus on
formal ones. Among the latter, we find some well-known decision support
techniques: cost-benefit analysis, multiple criteria decision analysis, decision
trees, . . . But there are many others, sometimes not presented as decision
support techniques, that help us make decisions. Let us cite but a few examples.
• When the director of a school must decide whether a given student will pass
or fail, he usually asks each teacher to assess the merits of the student by
means of a grade. The director then sums the grades and compares the result
to a threshold.
• When a bank must decide whether a given client will be granted credit, a
technique called credit scoring is often used.
• When the mayor of a city decides to temporarily forbid car traffic in the city
because of air pollution, he probably takes into account the value of some
indicators, e.g. the air quality index.
• Groups or committees must also make decisions. In order to do so, they
often use voting procedures.
All these formal techniques are what we call (formal) decision and evaluation
models, i.e. sets of explicit and well-defined rules to collect, assess and process
information in order to be able to make recommendations in decision and/or
evaluation processes. They are so widespread that almost no one can claim he is
Formal models require that the decision maker makes a substantial effort to
structure his perception or representation of the problem. This effort can
only be beneficial as it forces the decision maker to think harder and deeper
about his problem.
1.2 Audience
Most of us are confronted with formal evaluation and decision models. Very often,
we use them without even thinking about it. This book is intended for the aware
or enlightened practitioner, for anyone who uses decision or evaluation models
(for research or for applications) and is willing to question his practice and to
gain a deeper understanding of what he does. We have tried to keep mathematics
and formalism at a very low level so that, hopefully, most of the material will be
accessible to readers who are not mathematically inclined. A rich bibliography
will allow the interested reader to locate the more technical literature easily.
1.3 Structure
There are so many decision and evaluation models that it would be impossible to
deal with all of them in a single book. As will become apparent later, most of
them rely on similar principles. We decided to present seven examples of such
models. These examples, chosen from a wide variety of domains, will hopefully
allow the reader to grasp these principles. Each example is presented in its own
chapter (Chapters 2 to 8), largely independent of the other chapters. Each of these
seven chapters ends with a conclusion, placing what has been discussed in a
broader context and indicating links with other chapters. Chapter 9 is somewhat
different from the seven preceding ones: it does not focus on a decision model but
presents a real-world application. The aim of this chapter is to emphasise the
importance of the decision aiding process (the context of the problem, the position
of the actors and their interactions, the role of the analyst, . . . ), to show that
many difficulties arise there as well, and to show that the formal model must be
coherent with the decision aiding process.
Some examples have been chosen because they correspond to decision models
that everyone has experienced and can easily understand (student grades and
voting). We chose some models because they are not often perceived as decision
or evaluation models (student grades, indicators and rule-based control). The other
examples (cost-benefit analysis, multiple criteria decision support and choice under
uncertainty) correspond to well-identified and popular evaluation and decision
models.
1.4 Outline
Chapter 2 is devoted to the problem of voting. After showing the analogy between
voting and multiple criteria decision support, we present a sequence of twelve
short examples, each illustrating a problem that arises with a particular voting
method. We begin with simple methods based on pairwise comparisons and end
with the Borda method. Although the goal of this book is not to overwhelm the
reader with theory, we informally present two theorems (Arrow and
Gibbard-Satterthwaite) that, in one way or another, explain why we encountered
so many difficulties in our twelve examples.
Then we turn to the way voters' preferences are modelled. We present many
different models, each one trying to outdo the previous one but having its own
weaknesses. Finally, we explore some issues that are often neglected: who is going
to vote? Who are the candidates? These questions are difficult and we show that
they are important. The construction of the set of voters and of the set of
candidates, as well as the choice of a voting method, must be considered as part
of the voting process.
After examining voting, we turn in Chapter 3 to another topic very familiar to
the reader: students' marks or grades. Marks are used for different purposes (e.g.
ranking the students, deciding whether a student is allowed to begin the next level
of study, deciding whether a student gets a degree, . . . ). Students are assessed in
a huge variety of ways in different countries and schools. This seems to indicate
that assessing students might not be trivial. We use this familiar topic to discuss
operations such as evaluating a performance and aggregating evaluations.
In Chapter 4, three particular indicators are considered: the Human Development
Index (used by the United Nations), the ATMO index (an air pollution
indicator used by the French government) and the decathlon score. We present
a few examples illustrating some problems occurring with indicators. We argue
that some of these difficulties stem from the fact that the role of an indicator
is often manifold and not well defined. An indicator is a measure but, often, it is
also a tool for controlling or managing (in a broad sense).
Cost-benefit analysis (CBA), the subject of Chapter 5, is a decision aiding method
that is extremely popular among economists. According to the CBA approach, a
project should only be undertaken when its benefits outweigh its costs. First we
present the principles of CBA and its theoretical foundations. Then, using an
example from transportation studies, we illustrate some difficulties encountered
with CBA. Finally, we clarify some of the hypotheses at the heart of CBA and
criticise the relevance of these hypotheses in some decision aiding processes.
In Chapter 6, using a well-documented example, we present some difficulties
that arise when one wants to choose from or rank a set of alternatives considered
from different viewpoints. We examine several aggregation methods that lead to
a value function on the set of alternatives, namely the weighted sum, the sum of
utilities (direct and indirect assessment) and AHP (the Analytic Hierarchy
Process). Then we turn to the so-called outranking methods. Some of these methods
can be used even when the data are not very rich or precise. The price we pay
for this is that the results provided by these methods are not rich either, in the
sense that the conclusions that can be drawn regarding a decision are not clear-cut.
Chapter 7 is dedicated to the study of automatic decision systems. These
systems concern the execution of repetitive decision tasks and the great majority
of them are based on more or less explicit decision rules aimed at reflecting
the usual decision policy of humans. The goal of this chapter is to show the interest
of some formal tools (e.g. fuzzy sets) for modelling decision rules, but also to clarify
some problems arising when simulating the rules. Three examples are presented:
the first one concerns the control of an automatic watering system while the others
are about the control of a food process. The first two examples describe decision
systems based on explicit decision rules; the third one addresses the case of implicit
decision rules.
The goal of Chapter 8 is to raise some questions about the modelling of
uncertainty. We present a real-life problem concerning the planning of electricity
production. This problem is characterised by many different uncertainties: for
example, the price of oil or the electricity demand in 20 years' time. This problem
is classically described by means of a decision tree and solved with an expected
utility approach. After recalling some well-known criticisms directed against this
approach, we present the approach that was used by the team that solved
this problem. Some of the drawbacks of this approach are discussed as well. The
relevance of probabilities is criticised and other modelling tools, such as belief
functions, fuzzy set theory and possibility theory, are briefly mentioned.
Convinced that there is more to decision aiding than just number crunching,
we devote the last chapter to the description of a real-world decision aiding process
that took place in a large Italian company a few years ago. It concerns the
evaluation of offers following a call for tenders for the acquisition of a GIS
(Geographical Information System). Some important elements, such as the
participating actors, the problem formulation and the construction of the criteria,
deserve greater consideration. One should ideally never consider these elements
separately from the aggregation process, because they can affect the whole
decision process and even the way the aggregation procedure behaves.
1.6 Conventions
To refer to a decision maker, a voter or an individual whose sex is not determined,
we decided not to use the politically correct "he/she" but just "he" in order to
make the text easier to read. The fact that all of the authors are male has nothing
to do with this choice. The same applies to "his/her".
None of the authors is a native English speaker. Therefore, even if we did
our best to write in correct English, the reader should not be surprised to find
some mistakes or inelegant expressions. We beg the reader's leniency for any
incorrectness that might remain.
The adopted spelling is British rather than American.
1.7 Acknowledgements
We are ggreatly indebted to our collEague
///////// friend Philippe Fortemps \cite{Fortemps99}
.
Without him and his knowledge of Late-
x, this book would look like this paragraph.%\newline
The authors also wish to thank J.-L. Ottinger, who contributed to Chapter
8, H. Melot, who laid out the complex diagrams of that chapter, and Stefano
Abruzzini, who gave us a number of references concerning indicators. Chapter 6
is based on a report by Sebastien Clement written to fulfil the requirements of a
course on multiple criteria decision support. A large part of Chapter 9 uses material
already published in (Paschetta and Tsoukias 1999).
Special thanks go to Marjorie and Diane Gassner, who had the patience to
read and correct our continental approximation of the English language, and to
Francois Glineur, who helped in solving a great number of LaTeX problems.
We thank Gary Folven from Kluwer Academic Publishers for his constant
support during the preparation of this manuscript.
2
CHOOSING ON THE BASIS OF SEVERAL OPINIONS: THE EXAMPLE OF VOTING
• France's president. Each voter chooses one of the candidates. If one candidate
has been chosen by more than 50% of the voters, he is elected. Otherwise
a second stage is organised. During the second stage, only two candidates
remain: those with the highest scores. Once again, each voter chooses one of
the candidates. The winner is the candidate chosen by more voters than the
other one.
• Canada's members of parliament and prime minister. Every five years, the
Canadian parliament is elected as follows. The territory is divided into about
270 constituencies called counties. In each county, each party can present
one candidate. Each voter chooses one candidate. The winner in a county is
the candidate chosen by more voters than any other one. He is thus the
county's representative in the parliament. The leader of the party that
has the most representatives becomes prime minister.
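The two procedures just described can be written down as short Python functions. This is a minimal sketch under stated assumptions, not an official specification of either system: each French ballot is taken to be a full ranking (best first), so that the second stage can be read off the same ballot; a Canadian vote is recorded by the party label of the chosen candidate, since each party presents at most one candidate per county; and ties are broken alphabetically, a hypothetical rule added only to make the functions deterministic. All function names and data are illustrative.

```python
from collections import Counter


def first_choices(ballots):
    """Count how many voters put each candidate first on their ballot."""
    return Counter(ballot[0] for ballot in ballots)


def plurality(ballots):
    """One round: the candidate with the most first choices wins.

    Ties are broken alphabetically (hypothetical rule, for determinism only).
    """
    counts = first_choices(ballots)
    return min(counts, key=lambda c: (-counts[c], c))


def two_round(ballots):
    """French-style two-stage election, as described above.

    Each ballot is a full ranking, best candidate first, so that the
    same ballot can be read again in the second stage.
    """
    counts = first_choices(ballots)
    leader = min(counts, key=lambda c: (-counts[c], c))
    if 2 * counts[leader] > len(ballots):  # more than 50% of the voters
        return leader
    # Second stage: only the two candidates with the highest scores remain.
    finalists = sorted(counts, key=lambda c: (-counts[c], c))[:2]
    # Each voter now chooses whichever finalist he ranks higher.
    second = Counter(min(finalists, key=ballot.index) for ballot in ballots)
    return min(finalists, key=lambda c: (-second[c], c))


def leading_party(counties):
    """Canadian-style election: plurality in each county, then most seats.

    Each county is a list of votes, each vote being the party label of
    the chosen local candidate.
    """
    seats = Counter()
    for votes in counties:
        tally = Counter(votes)
        seats[min(tally, key=lambda p: (-tally[p], p))] += 1  # county winner
    return min(seats, key=lambda p: (-seats[p], p))


if __name__ == "__main__":
    # Hypothetical nine-voter profile.
    ballots = [["a", "b", "c"]] * 4 + [["b", "c", "a"]] * 3 + [["c", "b", "a"]] * 2
    print("plurality:", plurality(ballots), "two-round:", two_round(ballots))
    counties = [["Red", "Red", "Blue"], ["Blue", "Blue", "Red"],
                ["Blue", "Blue", "Green", "Red"]]
    print("prime minister's party:", leading_party(counties))
```

On the hypothetical nine-voter profile, plurality elects a while the two-stage procedure elects b, a first hint that the choice of procedure matters even when the ballots are the same.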
Those interested in voting methods and the way they are applied in various
countries will find valuable information in Farrell (1997) and Nurmi (1987). The
diversity of the methods applied in practice probably reflects some underlying
complexity and, in fact, if you take a closer look at voting, you will be amazed
by the incredible complexity of the subject. In spite of its apparent simplicity,
thousands of papers have been devoted to the problem of voting (Kelly 1991) and
our guess is that many more are to come.
Our aim in this chapter is, on the one hand, to show that many difficult and
interesting problems arise in voting and, on the other hand, to convince the reader
that a formal study of voting might be enlightening. This chapter is organised
as follows. In Section 1, we make the following basic assumption: each voter's
preferences can accurately be represented by a ranking of all candidates from best
to worst, without ties. Then we show some problems occurring when aggregating
the rankings, using classical voting systems such as those applied in France or the
United Kingdom. We do this through the use of small and classical examples. In
Section 2, we consider preference models other than the linear ranking of Section
1. Some models are poorer in information but more realistic. Some are richer and
less realistic. In most cases, the aggregation remains a difficult task. In Section
3, we change the focus and try to examine voting in a much broader context.
Voting is not instantaneous. It is not just counting the votes and performing
some mathematical operation to find the winner. It is a process that begins when
somebody decides that a vote should occur (or even earlier) and ends when the
winner begins his mandate (or even later). In Section 4, we discuss the analogy
with multiple criteria decision support. The chapter ends with a conclusion.
method: the process used to extract the best candidate or a ranking of the
candidates from the result of the election. In many cases, the election is uninominal,
i.e. each voter votes for one candidate only.
Then a (resp. b and c) obtains 10 votes (resp. 6 and 5). Thus a is chosen.
Nevertheless, this might be different from what a majority of voters wanted.
Indeed, an absolute majority of voters prefers each of the other candidates to a
(11 out of 21 voters prefer b and c to a).
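The preference profile of this example is not reproduced in this excerpt. The sketch below assumes a hypothetical profile that is consistent with every count quoted in the text: 10 voters ranking a, b, c; 6 ranking b, c, a; and 5 ranking c, b, a. It computes the first-place votes of the uninominal election and, for comparison, how many voters prefer each other candidate to a. The names PROFILE, plurality_scores and prefer_count are illustrative.

```python
from collections import Counter

# Hypothetical profile: (number of voters, ranking from best to worst).
# Chosen to match the counts quoted in the text, not taken from the book.
PROFILE = [(10, "abc"), (6, "bca"), (5, "cba")]


def plurality_scores(profile):
    """First-place votes per candidate in a uninominal election."""
    scores = Counter()
    for n, ranking in profile:
        scores[ranking[0]] += n
    return scores


def prefer_count(profile, x, y):
    """Number of voters who rank x above y."""
    return sum(n for n, ranking in profile if ranking.index(x) < ranking.index(y))


if __name__ == "__main__":
    scores = plurality_scores(PROFILE)
    total = sum(n for n, _ in PROFILE)
    print("first-place votes:", dict(scores))
    print("plurality winner:", max(scores, key=scores.get))
    for other in "bc":
        print(f"{other} preferred to a by {prefer_count(PROFILE, other, 'a')} of {total} voters")
```

With this assumed profile, b and c are each preferred to a by 11 of the 21 voters, and the same function reproduces the second-stage contest between a and b discussed next (11 voters for b against 10 for a).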
Let us see, using the same example, whether such a problem would be avoided by
the two-stage French system. After the first stage, as no candidate has an absolute
majority, a second stage is run between candidates a and b. We suppose that the
voters keep the same preferences on {a, b, c}. Thus a obtains 10 votes and b 11
votes, so that candidate b is elected. This time, neither of the beaten candidates
(a and c) is preferred to b by a majority of voters. Nonetheless, we cannot conclude
that the two-stage French system is superior to the British system from this point
of view, as shown by the following example.
reveal the preferences of the voters and that the campaign has the right effect on
the last two voters. Hence we observe the following preferences.
Condorcet winner, although almost half of the voters consider him to be the worst
candidate. Consider also example 10, taken from Fishburn (1977).
k     1    2    3    4    5    6    7    8    9
x     0   30    0   21    0   31    0    0   19
y    50    0   30    0   21    0    0    0    0
Table 2.1: Number of voters who rank the candidate in k-th place in their
preferences
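Table 2.1 contains enough information to compute the Borda scores of x and y directly. The sketch assumes the rank-sum convention used in this chapter: each voter gives a candidate as many points as the rank he assigns to it, and lower totals are better. The alternative convention that awards (number of candidates minus rank) points and favours the highest total orders x and y in the same way here, since both rows count the same number of voters. The scores of the seven other candidates cannot be obtained from the table; the name borda_rank_sum is illustrative.

```python
# Table 2.1: number of voters placing each candidate in k-th place, k = 1..9.
TABLE = {
    "x": [0, 30, 0, 21, 0, 31, 0, 0, 19],
    "y": [50, 0, 30, 0, 21, 0, 0, 0, 0],
}


def borda_rank_sum(counts):
    """Sum over voters of the rank given to the candidate (lower is better)."""
    return sum(k * n for k, n in enumerate(counts, start=1))


if __name__ == "__main__":
    for candidate, counts in TABLE.items():
        print(candidate, "voters:", sum(counts),
              "Borda score:", borda_rank_sum(counts))
```

Both rows sum to 101 voters; the rank sums are 501 for x and 245 for y, so the Borda method ranks y ahead of x.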
Note that there can be several such candidates. In these cases, the Borda
method does not tell us which one to choose: they are considered as equivalent.
But the likelihood of such indifference is rather small and decreases as the number
of candidates or voters increases. For example, for 3 candidates and 2 voters, the
probability of all candidates being tied is 1/6; for 3 candidates and 50 voters, it is
less than 1%. Note that, once again, we supposed that all rankings have the same
probability.
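For a handful of voters, the probability of a complete tie can be computed exactly by enumerating every profile. The sketch keeps the text's assumption that each voter independently picks one of the rankings with equal probability, and uses the rank-sum Borda score; the function name all_tied_probability is ours.

```python
from fractions import Fraction
from itertools import permutations, product


def all_tied_probability(n_voters, candidates="abc"):
    """Exact probability that all candidates get the same Borda score.

    Every voter independently chooses one of the possible rankings,
    each with equal probability; scores are rank sums.
    """
    rankings = list(permutations(candidates))
    tied = 0
    for profile in product(rankings, repeat=n_voters):
        scores = {c: 0 for c in candidates}
        for ranking in profile:
            for rank, c in enumerate(ranking, start=1):
                scores[c] += rank
        if len(set(scores.values())) == 1:  # every candidate has the same score
            tied += 1
    return Fraction(tied, len(rankings) ** n_voters)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "voters:", all_tied_probability(n))
```

Brute force grows as 6 to the power n for three candidates, so it is only practical for very small electorates; checking a 50-voter figure would call for a smarter counting method, such as a dynamic programme over score differences.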
Note that the Borda method allows us not only to choose one candidate but also
to rank all of them (by increasing Borda scores). If two candidates have the same
Borda score, they are considered indifferent.
It turns out that b has the lowest Borda score. However, neither of the two
voters changed his opinion about the pair {a, b}. The first (resp. second) voter
prefers a (resp. b) in both cases. Only the relative position of c changed and
this was enough to turn b into a winner and a into a loser. This can be seen
as a shortcoming of the Borda method. One says that the Borda method does
not satisfy the independence of irrelevant alternatives. It can be shown that the
Condorcet method satisfies this property.
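The profiles of example 12 are not reproduced in this excerpt, so the sketch below uses a hypothetical pair of two-voter profiles with the same structure: each voter keeps his opinion about {a, b}, only the position of c moves, and the Borda winner switches from a to b. Scores are rank sums, so the lowest score wins; all names are illustrative.

```python
def borda_scores(profile):
    """Rank-sum Borda scores (rank 1 = best), so the lowest score wins."""
    scores = {}
    for ranking in profile:
        for rank, candidate in enumerate(ranking, start=1):
            scores[candidate] = scores.get(candidate, 0) + rank
    return scores


def borda_winners(profile):
    """All candidates with the lowest Borda score (several in case of a tie)."""
    scores = borda_scores(profile)
    best = min(scores.values())
    return sorted(c for c, s in scores.items() if s == best)


def prefers(ranking, x, y):
    """True if x is placed above y in the ranking (best first)."""
    return ranking.index(x) < ranking.index(y)


# Hypothetical two-voter profiles (not the book's example 12):
# only the position of c differs between them.
BEFORE = ["acb", "bac"]
AFTER = ["abc", "bca"]

if __name__ == "__main__":
    unchanged = all(prefers(r0, "a", "b") == prefers(r1, "a", "b")
                    for r0, r1 in zip(BEFORE, AFTER))
    print("opinions on {a, b} unchanged:", unchanged)
    print("before:", borda_scores(BEFORE), "winner(s):", borda_winners(BEFORE))
    print("after: ", borda_scores(AFTER), "winner(s):", borda_winners(AFTER))
```

In the first profile a scores 3 and b scores 4; in the second, a scores 4 and b scores 3, although no voter changed his mind about a versus b.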
...
Arrow's theorem
Arrow (1963) was interested in the aggregation of rankings with ties into a ranking,
possibly with ties. We will call this ranking the overall ranking. He examined the
methods satisfying the following properties.
Universal domain. This property implies that the aggregation method must be
applicable to all cases. Whatever the rankings provided by the voters, the
method must yield an overall ranking of the candidates. This property rules
out methods that would impose some restrictions on the preferences of the
voters.
Unanimity. If all voters are unanimous about a pair of candidates, e.g. if all voters
rank a before b, then a must be ranked before b in the overall ranking.
This seems quite reasonable but example 9 showed that some commonly used
Independence. The relative position of two candidates in the overall ranking
depends only on their relative positions in the individuals' preferences. Other
candidates are therefore considered irrelevant with respect to that pair.
Note that we observed in example 12 that the Borda method violates the
independence property. This property is often called independence of irrelevant
alternatives.
Theorem 2.1 (Arrow) When the number of candidates is at least 3, there ex-
ists no aggregation method satisfying simultaneously the properties of universal
domain, transitivity, unanimity, independence and non-dictatorship.
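The pairwise majority relation underlying the Condorcet method satisfies independence (as noted above), unanimity and non-dictatorship, so the theorem implies that, on an unrestricted domain, it cannot always deliver a transitive overall relation. The sketch below checks this on the classic three-voter cyclic profile known as Condorcet's paradox; the profile is a standard textbook illustration, not one taken from this chapter, and the function names are illustrative.

```python
from itertools import permutations

# Classic cyclic profile (Condorcet's paradox): three voters, three
# candidates, each ranking written best first.
PROFILE = ["abc", "bca", "cab"]


def majority_prefers(profile, x, y):
    """True if a strict majority of voters rank x above y."""
    votes_for_x = sum(r.index(x) < r.index(y) for r in profile)
    return 2 * votes_for_x > len(profile)


def is_transitive(relation, items):
    """True if relation(x, y) and relation(y, z) always imply relation(x, z)."""
    return all(
        relation(x, z)
        for x, y, z in permutations(items, 3)
        if relation(x, y) and relation(y, z)
    )


if __name__ == "__main__":
    def majority(x, y):
        return majority_prefers(PROFILE, x, y)

    print("majority prefers:", [(x, y) for x, y in permutations("abc", 2) if majority(x, y)])
    print("transitive:", is_transitive(majority, "abc"))
```

Each voter's ranking is transitive, yet the majority prefers a to b, b to c and c to a, so the overall relation contains a cycle.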
Gibbard-Satterthwaite's theorem
Gibbard (1973) and Satterthwaite (1975) were very interested in the
(non-)manipulability of aggregation methods, especially those leading to the
election of a unique candidate. Informally, a method is non-manipulable if there
is no case in which a voter can improve the result of the election by not reporting
his true preferences. They proved the following result.
Many other impossibility results can be found in the literature. But this is not
the place to review them. Besides impossibility results, many characterisations are
available. A characterisation of a given aggregation method is a set of properties
simultaneously satisfied by only that method. These results help to understand
the fundamental principles of a method and to compare different methods.
At the beginning of this chapter, we decided to focus on elections of a unique
candidate. Some voting systems lead to the election of several candidates and
aim towards achieving a kind of proportional representation. One might think
that those systems are the solution to our problems. In fact, they are not. Those
systems raise as many questions (perhaps more) as the ones we considered (Balinski
and Young 1982). Furthermore, suppose that a parliament has been elected, using
proportional representation. This parliament will have to vote on many different
issues and, very often, only one candidate or law or project will have to be chosen.
2.2.1 Rankings
To model the preferences of a voter, we can use a ranking without ties. This model
corresponds to the assumption of Section 1. This implies that when you present a
pair of candidates (a, b) to a voter, he is always able to tell if he prefers a to b or
the converse. Furthermore, if he prefers a to b and b to c, he necessarily prefers a
to c (transitivity of preference).
Figure 2.1: A complete pre-order. Arrows implied by transitivity are not
represented.
2. b is preferred to a,
3. a is indifferent to b or
Do we need semiorders only when a voter cannot distinguish between two very
similar objects? The following example, adapted from Armstrong (1939), will give
the answer. Suppose that you ask your child to choose between two presents for
his birthday: a pony and a blue bicycle. As he likes both of them equally, he will
say he is indifferent. Suppose now that you present him with a third candidate: a red
bicycle with a small bell. He will probably tell you that he prefers the red one to
the blue one. "So, you prefer the red bicycle to the pony, is that right?" you
would say if you considered indifference to be transitive. However, it is obvious
that the child can still be indifferent between the pony and the red bicycle.
Figure: the child's preferences among the pony, the red bike and the blue bike
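The child's answers fit a threshold model, the usual numerical representation of a semiorder: x is preferred to y when u(x) > u(y) + q for some value function u and a fixed threshold q, and the two are indifferent otherwise. The utilities and the threshold below are hypothetical, chosen only to reproduce the three answers of the example; they do not claim to explain why the child answers that way.

```python
# Hypothetical value function u and threshold q (not from the book).
UTILITY = {"pony": 1.0, "blue bike": 0.0, "red bike": 1.5}
THRESHOLD = 1.0


def prefers(x, y):
    """x is preferred to y when u(x) exceeds u(y) by more than the threshold."""
    return UTILITY[x] > UTILITY[y] + THRESHOLD


def indifferent(x, y):
    """Neither object is preferred to the other."""
    return not prefers(x, y) and not prefers(y, x)


if __name__ == "__main__":
    print("pony ~ blue bike:", indifferent("pony", "blue bike"))
    print("red bike > blue bike:", prefers("red bike", "blue bike"))
    print("pony ~ red bike:", indifferent("pony", "red bike"))
```

The indifference relation produced by such a model is not transitive: the pony is indifferent to both bicycles, yet the red bike is strictly preferred to the blue one.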
• He does not have full knowledge about the candidates. For example, in a
legislative election, a voter does not necessarily know the position of every
candidate on a particular issue.
• He does have full knowledge about the candidates but not about some events
that might occur in the future and affect the way he compares the candi-
dates. For example, again in a legislative election, a voter might ideally know
everything about all candidates. But he does not know whether, during the
forthcoming mandate, the representatives will have to vote on a particular issue.
If such a vote is to occur, the voter might prefer candidate a to candidate b.
Otherwise, he might prefer b to a, because there is just one thing he disapproves
of in b's policy: his position on that particular issue.
• He does not fully know his preferences. Suppose that the community in
which you live has decided to build a new recreational facility. There are
two options: a tennis court or a playground. You have to vote. You know the
two options perfectly (budget, time to completion, plan, . . . ). You like tennis
and your children would love that playground. You will have access to both
facilities under the same conditions. Can you tell which one you will choose?
What will you enjoy more: playing tennis or letting your children play in
the playground?
These three cases can be seen as three facets of a single problem. The voter is
uncertain about the final consequences of his choice.
Fuzzy relations can be used to model such preferences. The voter must still
answer the above-mentioned question (do you prefer a to b?), but with a number
rather than with yes or no. If he feels that "a is preferred to b" is definitely true,
he answers 1. If he feels that "a is preferred to b" is definitely false, he answers 0.
For intermediate situations, he chooses intermediate numbers. For example,
"perhaps" could be 0.5 and "almost yes" 0.9. A typical fuzzy relation on three
candidates is illustrated by Fig. 2.3, where the number on the arrow between two
candidates (e.g. a and b) is the voter's answer to the question "is a preferred to b?".
Figure 2.3: A fuzzy relation on three candidates a, b and c
Note that in many cases, uncertainty and vagueness are probably simultane-
ously present. For a thorough review of fuzzy preference modelling, see (Perny
and Roubens 1998).
rules? All these questions received different answers in different countries and
committees. This may indicate that they are far from trivial.
Let us now be more pragmatic. The board of directors of a company asks the
executive committee to prepare a report on future investment strategies. A
vote on the proposed strategies will be held during the next meeting of the board
of directors. How should the executive committee prepare its report? Should they
include all strategies, even infeasible ones? If infeasible ones are to be avoided,
who should decide that they are infeasible? Finding all feasible strategies might
be prohibitively costly in time and resources. And one can never be sure that
all feasible strategies have been explored. There is no systematic way, no formal
method to do that. Creativity and imagination are needed during this process.
Finally, suppose that the executive committee decides to explore only some
strategies. A more or less arbitrary selection needs to be made. Even if they
make this selection in a perfectly honest way, it can have far-reaching consequences
for the outcome of the process. Remember example 11, in which we showed that,
for some aggregation methods, the relative ranking of two candidates depends on
the presence (or absence) of some other candidates. Furthermore, some studies
show that an individual can prefer a to b or b to a depending on the presence or
absence of some other candidate (Sen 1997).
Even the choice of the aggregation method can be considered as part of the voting
process for, in some cases, the aggregation method is at least as important as the
result of the vote. Consider two countries, A and B: A is ruled by a dictator,
B is a democracy. Suppose that each time a policy is chosen by voting in B,
the dictator of A applies the same policy in his country, without voting. Hence,
all governmental decisions are the same in A and B. The only difference is that
the people in A do not vote; their benevolent dictator decides alone. In which
country would you prefer to live? I guess you would choose B, unless you are
the dictator. And you would probably choose B even if the decisions taken in
B were slightly worse than those taken in A. What we value in B is
freedom of choice. References and more details on this topic can be found in
(Sen 1997, Suzumura 1999).
view of the situation, to put labels on different entities, to look for relationships
between entities, etc. Finally he obtains a problem, as one can find in books.
It is a description, in formal language or not, of the current situation. It usually
contains a description of the reasons why that situation is not satisfying and
an implicit description of the potential solutions to the problem. That
is, the problem statement contains information that allows one to recognise whether
a given action or course of actions is a potential solution or not. The problem
statement must not be too broad, otherwise anything can be a solution and the
decision-maker is not helped. Conversely, if the statement is too narrow, some
actions are not recognised as potential solutions even if they would be good ones.
Some authors, mainly in the United Kingdom, have developed methods to help
decision-makers to better structure their problem (Rosenhead 1989, Daellenbach
1994).
When the problem has been stated, the decision-maker has a problem, but no
solution. He must construct the set of alternatives, like the set of candidates in
social choice. Brainstorming and other techniques promoting and stimulating
creativity have been developed to support this step.
The criteria, like the voters, are not given in a decision process. The
decision-maker needs to identify all the viewpoints that are relevant to his
problem. He must then define a set of criteria that reflect all relevant viewpoints
and that fulfil some conditions. There must not be several criteria reflecting
the same viewpoint. All criteria should be independent unless the aggregation
method to be used thereafter allows dependence between criteria. Depending on
the aggregation method, the scales corresponding to the criteria must have some
properties. And so on. See e.g. Roy (1996) and Keeney and Raiffa (1976).
Last but not least, the aggregation method itself must be chosen by the analyst
and/or the decision-maker. It is hard to imagine how an aggregation procedure
could be scientifically proven to be the best one. The decision-maker must thus
make a choice. He should choose the one that satisfies some properties he judges
important, the one he can understand, the one he trusts.
2.5 Conclusions
In this chapter, we have shown that the operation of voting is far from simple. In
the first section, using small examples, describing very simple situations, we found
that intuition and common sense are not sufficient to avoid the many traps that
await us when using aggregation procedures. In fact, in this domain, common
sense is of very little help. We also presented two theoretical results indicating
that there is no hope of finding a perfect voting procedure. Therefore, if we still
want to use a voting procedure (and this seems hardly avoidable), we must accept
an imperfect one. But this does not mean that we can use any procedure in any
circumstance and in any way. The flaws of a particular procedure are probably less
damaging in some instances than in others. Some features of a voting procedure
may be highly desirable in a given context while not so important in another one.
So, for each voting context, we have to choose the procedure that best matches our
needs. And, when we have made this choice, we must be aware that this match is
not perfect and use the procedure in such a way that the risk of facing
a problematic situation is kept as low as possible.
In Section 2, we found that even the inputs of voting procedures (the preferences
of the voters) are not simple things. Many different models for preferences exist
and can be used in aggregation procedures. This shows that what is usually
considered as data is not really data. When we feed our aggregation procedures
with preferences, these are not given. They are constructed in some more or less
arbitrary way. The choice of a particular model (ranking with ties, fuzzy relations,
. . . ) is itself arbitrary. Nothing in the problem tells us what model to use.
Finally, in Section 3, we showed that the voting process itself is highly complex.
Voting procedures are decision models, just like student grades, indicators,
cost-benefit analysis, multiple criteria decision support (this has already been dis-
cussed in Section 4), . . . They are decision models devoted to the special case where
a decision must be taken by a group of voters and are mainly concerned with the
case of a finite and small set of alternatives. This peculiarity does not make voting
procedures very different from other decision and evaluation models. As you will
see in the following chapters, most decision models suffer from the same kinds of problems
that we have met in this chapter: there is no perfect aggregation procedure; the
data are not data, they are imperfect and arbitrary models; the decision models
are too narrow, they do not take into account the fact that decision support occurs
in a human process (the decision making process) and in a complex environment.
3 BUILDING AND AGGREGATING EVALUATIONS: THE EXAMPLE OF GRADING STUDENTS
3.1 Introduction
3.1.1 Motivation
In chapter 2, we tried to show that voting, although a familiar activity
to almost everyone, raises many important and difficult questions that are closely
connected to the subject of this book. Our main objective in this chapter is
similar. We all share the more or less pleasant experience of having received
grades in order to evaluate our academic performance. The authors of this
book spend part of their time evaluating the performance of students through
grading several kinds of work, an activity that you may also be familiar with. The
purpose of this chapter is to build upon this shared experience. This will allow us
to discuss, based on simple and familiar situations, what is meant by evaluating
a performance and aggregating evaluations, both activities being central to
most evaluation and decision models. Although the entire chapter is based on the
example of grading students, it should be stressed that grades are often used
in contexts unrelated to the evaluation of the performance of students: employees
are often graded by their employers, products are routinely tested and graded by
consumer organisations, experts are asked to rate the feasibility or the riskiness of
projects, etc. The findings of this chapter are therefore not limited to the realm
of a classroom.
As with voting systems, there is much variance across countries in the way
education is organised. Curricula, grading scales, rules for aggregating grades
and granting degrees are seldom similar from place to place (for information on
the systems used in the European Union see www.eurydice.org).
This diversity is further increased by the fact that each instructor (a word that
we shall use to mean the person in charge of evaluating students) has generally
developed his own policy and habits. The authors of this book have studied in four
different European countries (Belgium, France, Greece and Italy) and obtained
degrees in different disciplines (Maths, Operational Research, Computer Science,
Geology, Management, Physics) and in different Universities. We were not overly
astonished to discover that the rules that governed the way our performances were
assessed were quite different. We were perhaps more surprised to realise that
2. Not all grades have the same function. Whereas the final grade of a
University course usually has mainly a certification role, intermediate
grades, on which the final grade may be partly based, have a more complex
role that is often both certificative and formative, e.g. the result of a
mid-term exam is included in the final grade but is also meant to be a signal
to a student indicating his strengths and weaknesses.
1. The scale that is used for grading students is usually imposed by the pro-
gramme. Numerical scales are often used in Continental Europe with varying
bounds and orientations: 0-20 (in France or Belgium), 0-30 (in Italy), 6-1 (in
Germany and parts of Switzerland), 0-100 (in some Universities). American
and Asian institutions often use a letter scale, e.g. E to A or F to A. Obvi-
ously we would not want to conclude from this that Italian instructors have
come to develop much more sensitive instruments for evaluating performance
than German ones or that the evaluation process is in general more precise
in Europe than it is in the USA. Most of us would agree that the choice of
a particular scale is mainly conventional. It should however be noted that
since grades are often aggregated at some point, such choices might not be
totally without consequences. We shall come back to that point in section
3.3.
2. Some courses are evaluated on the basis of a single exam. But there are
many possible types of exams. They may be written or oral; they may be
open-book or closed-book. Their duration may vary (45-minute exams are
not uncommon in some countries whereas they may last up to 8 hours in
some French programmes). Their content for similar courses may vary from
multiple choice questions to exercises, case-studies or essays.
4. Some instructors use raw grades. For reasons to be explained later, others
modify the raw grades in some way before aggregating and/or releasing
them, e.g. standardising them.
Do all the questions clearly relate to one or several of the announced objectives
of the course? Will the exam discriminate between students? Is there
a good balance between modelling and computational skills? What should
the respective shares of closed and open questions be?
2. Preparing a marking scale. The preparation of the marking scale for a given
subject is also of utmost importance. A nice-looking subject might be
impractical in view of the associated marking scale. Will the marking scale
include a bonus for work showing good communication skills and/or will
misspellings be penalised? How to deal with computational errors? How
to deal with computational errors that lead to inconsistent results? How to
deal with computational errors influencing the answers to several questions?
How to judge an LP model in which the decision variables are incompletely
defined? How to judge a model that is only partially correct? How to judge a
model which is inconsistent from the point of view of units? Although much
expertise and/or rules of thumb are involved in the preparation of a good
subject and its associated marking scale, we know of no instructor who has never
had to revise his judgement after correcting some work and realising his own
severity, and/or to correct work again after discovering some frequently
given half-correct answers that were not anticipated in the marking scale.
reliable, i.e. they should give similar results when applied several times under
similar conditions,
valid, i.e. they should measure what was intended to be measured and only
that.
Extensive research in Education Science has found that the process of giving
grades to students is seldom perfect in these respects (a basic reference remains
the classic book by Piéron (1963); Airasian (1991) and Merle (1996) are good
surveys of recent findings). We briefly recall here some of the difficulties that
were uncovered.
The crudest reliability test that can be envisaged is to give similar pieces of work
to several instructors for correction and to record whether or not they are
graded similarly. Such experiments have been conducted extensively in various
disciplines and at various levels. Not overly surprisingly, most experiments
have shown that even in the more technical disciplines (Maths, Physics,
Grammar) in which it is possible to devise rather detailed marking scales
are aware that it is probably the part that is read first and most attentively by all
students. Besides useful considerations on ethics, this section usually describes
in detail the process that will lead to the attribution of the grades for the course.
In addition to describing the type of work that will be graded, the nature of exams
and the way the various grades will contribute to the determination of the final
grade, it usually also contains many details that may prove important in order
to understand and interpret grades. Among these details, let us mention:
the type of preparation and correction of the exams: who will prepare the
subject of the exam (the instructor or an outside evaluator)? Will the work
be corrected once or more than once (in some Universities all exams are
corrected twice)? Will the names of the students be kept secret?
the possibility of revising a grade: are there formal procedures allowing
the students to have their grades reconsidered? Do the students have the
possibility of asking for an additional correction? Do the students have
the possibility of taking the same course at several moments in the academic
year? What are the rules for students who cannot take the exam (e.g. because
they are sick)?
the policy towards cheating and other dishonest behaviour (exclusion from
the programme, attribution of the lowest possible grade for the course, at-
tribution of the lowest possible grade for the exam).
the policy towards late assignments (no late assignment will be graded, minus
x points per hour or day).
Weights We mentioned that the final grade for a course was often the combina-
tion of several grades obtained throughout the course: mid-term exam, final exam,
case-studies, dissertation, etc. The usual way to proceed is to give a (numerical)
weight to each piece of work entering into the final grade and to compute a weighted
average, more important works receiving higher weights. Although this process is
simple and almost universally used, it raises some difficulties that we shall examine
in section 3.3. Let us simply mention here that the interpretation of weights in
such a formula is not obvious. Most instructors would tend to compensate for a
very difficult mid-term exam (weight 30%) by preparing a comparatively easier final
exam (weight 70%). However, if the final exam is so easy that most students
obtain very good grades, the differences in the final grades will be attributable
almost exclusively to the mid-term exam, although it has a much lower weight
than the final exam. The same is true if the final grade combines an exam with
a dissertation. Since the variance of the grades is likely to be much lower for the
dissertation than for the exam, the dissertation may contribute only marginally to
differences in final grades, whatever the weighting scheme. In
order to avoid such difficulties, some instructors standardise grades before
averaging them. Although this might be desirable in some situations, it is clear that the
more or less arbitrary choice of a particular measure of dispersion (why use the
standard deviation and not the interquartile range? should we exclude outliers?)
may have a crucial influence on the final grades. Furthermore, the manipulation
of such distorted grades seriously complicates the positioning of students with
respect to a minimal passing grade since their use amounts to abandoning any
idea of absolute evaluation in the grades.
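A small numerical sketch shows why nominal weights can mislead. The grades below are invented for illustration (they are not data from the text): a hard mid-term with widely spread results and an easy final exam on which everybody scores 18 or 19, combined with the 30%/70% weights mentioned above. The variance of each weighted component is used as a rough gauge of how much that component can separate students; this ignores covariance between components, which happens to be zero for these invented numbers.

```python
from statistics import pvariance

# Invented grades (0-20 scale) for five students, for illustration only.
midterm = [4, 8, 12, 16, 20]       # hard exam: results widely spread
final_exam = [18, 19, 18, 19, 18]  # easy exam: everybody near the top
W_MID, W_FINAL = 0.3, 0.7          # nominal weights used in the text

final_grades = [W_MID * m + W_FINAL * f for m, f in zip(midterm, final_exam)]

# Variance of each weighted component (covariance between components ignored).
spread_mid = W_MID ** 2 * pvariance(midterm)
spread_final = W_FINAL ** 2 * pvariance(final_exam)


def ranking(scores):
    """Student indices ordered from lowest to highest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


print(f"mid-term component: {spread_mid:.2f}, final-exam component: {spread_final:.2f}")
print("ranking by final grade:", ranking(final_grades))
print("ranking by mid-term:   ", ranking(midterm))
```

With these numbers the mid-term component is more than twenty times larger than the final-exam component, and the ranking by final grade is exactly the ranking by mid-term, despite the mid-term's lower nominal weight.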
Passing a course In some institutions, you may either pass or fail a course
and the grades obtained in several courses are not averaged. An essential problem
for the instructor is then to determine which students are above the minimal
passing grade. When the final grade is based on a single exam we have seen
that it is not easy to build a marking scale. It is even more difficult to conceive a
marking scale in connection to what is usually the minimal passing grade according
to the culture of the institution. The question boils down to deciding how much
of the programme a student should master in order to obtain a passing grade, given
that an exam only gives partial information about the amount of knowledge of the
student.
The problem is clearly even more difficult when the final grade results from the
aggregation of several grades. The use of weighted averages may give undesirable
results since, for example, an excellent group case-study may compensate for a
very poor exam. Similarly, weighted averages do not take the progression of the
student during the course into account.
It should be noted that the problem of positioning students with respect to a
minimal passing grade is more or less identical to positioning them with respect
to any other special grades, e.g. the minimal grade for being able to obtain a
distinction, to be cited on the Dean's honour list or the Academic Honour
Roll.
3.2. GRADING STUDENTS IN A GIVEN COURSE 37
out on which type of scales they have been measured. Let us notice that this
is true even in Physics. Saying that Mr. X weighs twice as much as Mr. Y makes
sense because this assertion is true whether mass is measured in pounds or in
kilograms. Saying that the average temperature in city A is twice as high as the
average temperature in city B may be true but makes little sense since the truth
value of this assertion clearly depends on whether temperature is measured using
the Celsius or the Fahrenheit scale.
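The distinction can be checked mechanically: a ratio statement is meaningful if it keeps its truth value under every admissible change of scale. The sketch below uses hypothetical values (90 kg and 45 kg for Mr. X and Mr. Y, 20 and 10 degrees Celsius for cities A and B) and an approximate kilogram-to-pound factor; the helper names are our own.

```python
import math


def kg_to_lb(kg):
    """Change of unit for mass (a ratio-scale transformation x -> a*x)."""
    return kg * 2.20462  # approximate pounds per kilogram


def celsius_to_fahrenheit(c):
    """Change of unit and origin for temperature (x -> a*x + b)."""
    return c * 9 / 5 + 32


def ratio_survives(x, y, transform):
    """True if 'x is r times y' keeps the same factor r after `transform`."""
    return math.isclose(x / y, transform(x) / transform(y))


print(ratio_survives(90, 45, kg_to_lb))               # True: mass ratios are meaningful
print(ratio_survives(20, 10, celsius_to_fahrenheit))  # False: 20 C is not "twice" 10 C
```

In Fahrenheit the two temperatures become 68 and 50, a ratio of 1.36 rather than 2, which is exactly why the temperature statement makes little sense.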
The highest point on the scale An important feature of all grading scales is
that they are bounded above. It should be clear that the numerical value attributed
to the highest point on the scale is somewhat arbitrary and conventional. No loss
of information would be incurred using a 0-100 or a 0-10 scale instead of a 0-20
one. At best it seems that grades should be considered as expressed on a ratio
scale, i.e. a scale in which the unit of measurement is arbitrary (such scales are
frequent in Physics, e.g. length can be measured in meters or inches without loss
of information).
If grades can be considered as measured on a ratio scale, it should be recognised
that this ratio scale is somewhat awkward because it is bounded above. Unless you
admit that knowledge is bounded or, more realistically, that perfectly fulfilling
the objectives of a course makes clear sense, problems might appear at the upper
bound of the scale. Consider two excellent, but not necessarily equally excellent,
students. They cannot obtain more than the perfect grade 20/20. Equality of
grades at the top of the scale (or near the top, depending on grading habits) does
not necessarily imply equality in performance (after a marking scale is devised it is
not unusual to want to give some students more than the maximal
grade, e.g. because some bonus is added for particularly clever answers, whereas
the computer systems of most Universities would definitely reject such grades!).
The lowest point on the scale It should be clear that the numerical value that
is attributed to the lowest point of the scale is no less arbitrary and conventional
than was the case for the highest point. There is nothing easier than to transform
grades expressed on a 0-20 scale to grades expressed on a 100-120 scale and this
involves no loss of information. Hence it would seem that a 0-20 scale might
be better viewed as an interval scale, i.e. a scale in which both the origin and
the unit of measurement are arbitrary (think of temperature measured in Celsius or
Fahrenheit). An interval scale allows comparisons of differences in performance;
it makes sense to assert that the difference between 0 and 10 is similar to the
difference between 10 and 20 or that the difference between 8 and 10 is twice as
large as the difference between 10 and 11, since changing the unit and origin of
measurement clearly preserves such comparisons.
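The claim that an interval scale preserves comparisons of differences, but not ratios of grades, can be verified directly. The sketch applies two admissible transformations to the 0-20 scale: the shift to 100-120 mentioned above and an arbitrary change of both unit and origin, x -> 5x + 100 (our own choice). Exact fractions avoid rounding noise.

```python
from fractions import Fraction


def affine(a, b):
    """Admissible transformation of an interval scale: x -> a*x + b with a > 0."""
    return lambda x: a * x + b


def difference_ratio(t, p, q, r, s):
    """(t(p) - t(q)) / (t(r) - t(s)): compares two differences after transforming."""
    return Fraction(t(p) - t(q), t(r) - t(s))


def grade_ratio(t, p, q):
    """t(p) / t(q): compares two grades directly after transforming."""
    return Fraction(t(p), t(q))


transforms = {
    "0-20 (original)": affine(1, 0),
    "100-120": affine(1, 100),
    "5x + 100": affine(5, 100),
}

for name, t in transforms.items():
    # Difference 8 -> 10 versus difference 10 -> 11; grade 10 versus grade 5.
    print(name, difference_ratio(t, 10, 8, 11, 10), grade_ratio(t, 10, 5))
```

The difference ratio stays equal to 2 under every transformation, whereas "10 is twice 5" becomes 22/21 on the 100-120 scale and 6/5 under 5x + 100.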
Let us notice that using a scale that is bounded below is also problematic. In
some institutions the lowest grade is reserved for students who did not take the
exam. Clearly this does not imply that these students are equally ignorant.
Even when the lowest grade can be obtained by students having taken the exam,
some ambiguity remains. Knowing nothing, i.e. having completely failed to meet
any of the objectives of the course, is difficult to define and is certainly contingent
upon the level of the course (this is all the more true since in many institutions
the lowest grade is also granted to students having cheated during the exam,
with obviously no guarantee that they are equally ignorant). To a large extent,
"knowing nothing" in the context of a course is as arbitrary as "knowing
everything". Therefore, if grades are expressed on interval scales, care
should be taken when manipulating grades close to the bounds of the scale.
Once more grades appear as complex objects. While they seem to mainly
convey ordinal information (with the possibility that small differences are not
significant) that is typical of a relative evaluation model, the existence of
special grades complicates the situation by introducing some absolute elements
of evaluation in the model (on the measurement-theoretic interpretation of grades
see French 1981, Vassiloglou 1984).
Some readers, and most notably instructors, may have the impression that we
have been overly pessimistic on the quality of the grading process. We would
like to mention that the literature in Education Science is even more pessimistic,
leading some authors to question the very necessity of using grades (see Sager 1994,
Tchudi 1997). We suggest the following simple experiment to sceptical instructors.
Having prepared an exam, ask some of your colleagues to take it with the following
instructions: prepare what you would think to be a paper that would just be
acceptable for passing, prepare a paper that would clearly deserve distinction,
prepare a paper that is well below the passing grade. Then apply your marking
scale to these papers prepared by your colleagues. It is extremely likely
that the resulting grades will hold some surprises!
However, none of us would be prepared to abandon grades, at least for the
type of programmes in which we teach. The difficulties that we mentioned would
be quite problematic if grades were considered as measures of performance that
we would tend to make more and more precise and objective. We tend to
consider grades as an evaluation model trying to capture aspects of something
that is subject to considerable indetermination, the performance of students.
As is the case with most evaluation models, their use greatly contributes to
transforming the reality that we would like to measure. Students cannot
be expected to react passively to a grading policy; they will undoubtedly adapt
their work and learning practice to what they perceive to be its severity and
consequences. Instructors are likely to use a grading policy that will depend
on their perception of the policy of the Faculty (on these points, see Sabot and
Wakeman 1991, Stratton, Myers and King 1994). The resulting scale of measure-
ment is unsurprisingly awkward. Furthermore, as with most evaluation models
of this type, aggregating these evaluations will raise even more problems.
This is not to say that grades cannot be a useful evaluation model. If these lines
have led some students to conclude that grades are useless, we suggest they try
to build up an evaluation model that would not use grades without, of course,
relying too much on arbitrary judgements. This might not be an impossible task;
we, however, do not find it very easy.
3.3. AGGREGATING GRADES 41
Conjunctive rules
In programmes of this type, students must pass all courses, i.e. obtain a grade
above a minimal passing grade in all courses in order to obtain the degree. If
they fail to do so after a given period of time, they do not obtain the degree.
This very simple rule has the immense advantage of avoiding any amalgamation
of grades. It is however seldom used as such because:
it does not discriminate between grades just below the passing grade
and grades well below it,
it offers no incentive to obtain grades well above the minimal passing grade,
Most instructors and students strongly oppose such simple systems since
they generate high failure rates and do not promote academic excellence.
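Stating the conjunctive rule as code makes the two drawbacks listed above concrete. This is a sketch under assumptions: a passing grade of 10 on a 0-20 scale, a grade equal to the passing grade counting as a pass, and hypothetical grade profiles.

```python
def conjunctive_pass(grades, passing=10):
    """Degree granted only if every course grade reaches the passing grade."""
    return all(g >= passing for g in grades)


# Hypothetical profiles on a 0-20 scale.
near_miss = [9, 20, 20]   # just below the passing grade once, excellent elsewhere
total_fail = [0, 0, 0]
bare_pass = [10, 10, 10]
excellent = [20, 20, 20]

# Drawback 1: a grade just below the threshold counts like a grade far below it.
print(conjunctive_pass(near_miss), conjunctive_pass(total_fail))
# Drawback 2: nothing distinguishes a bare pass from an excellent record.
print(conjunctive_pass(bare_pass), conjunctive_pass(excellent))
```

No grades are ever added or averaged, which is the rule's advantage; the same property is what makes it blind to the size of each grade.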
Weighted averages
In many programmes, the grades of students are aggregated using a simple weighted
average. This average grade (the so-called GPA in American Universities) is then
compared with some standards, e.g. the minimal average grade for obtaining the
degree, the minimal average grade for obtaining the degree with a distinction, the
minimal average grade for being allowed to stay in the programme, etc. Whereas
conjunctive rules do not allow for any kind of compensation between the grades
obtained for several courses, all sorts of compensation effects are at work with a
weighted average.
used for the $g_i(a)$. The simplest decision rule consists in comparing $g(a)$ with
some standards in order to decide on the attribution of the degree and on possible
distinctions. A number of examples will allow us to understand the meaning of
this rule better and to emphasise its strengths and weaknesses (we shall suppose
throughout this section that students have all been evaluated on the same courses;
for the problems that arise when this is not so, see Vassiloglou (1984)).
Example 1
Consider four students enrolled in a degree consisting of two courses. For each
course, a final grade between 0 and 20 is allocated. The results are as follows:
g1 g2
a 5 19
b 20 4
c 11 11
d 4 6
Student c has performed reasonably well in all courses whereas d has consistently
very poor performance; both a and b are excellent in one course while having
a serious problem in the other. Casual introspection suggests that if the students
were to be ranked, c should certainly be ranked first and d should be ranked last.
Students a and b should be ranked in between, their relative position depending
on the relative importance of the two courses. Their very low performance in half
of the courses does not make them good candidates for the degree. The use of a
simple weighted average of grades leads to very different results. Considering that
both courses are of equal importance gives the following average grades:
average grades
a 12
b 12
c 11
d 5
which leads to having both a and b ranked before c. As shown in figure 3.1, we can
say even more: there is no vector of weights $(w, 1-w)$ that would rank c before both
a and b. Ranking c before a implies that $11w + 11(1-w) > 5w + 19(1-w)$, which
leads to $w > 8/14 = 4/7$. Ranking c before b implies $11w + 11(1-w) > 20w + 4(1-w)$,
i.e. $w < 7/16$ (figure 3.1 should make clear that there is no loss of generality in
supposing that weights sum to 1). Since $7/16 < 4/7$, no value of $w$ satisfies both
conditions. The use of a simple weighted sum is therefore not in line
with the idea of promoting students performing reasonably well in all courses.
The exclusive reliance on a weighted average might therefore be an incentive for
students to concentrate their efforts on a limited number of courses and benefit
from the compensation effects at work with such a rule. This is a consequence of
the additivity hypothesis embodied in the use of weighted averages.
[Figure 3.1: students a, b, c and d plotted in the (g1, g2) plane, both axes running from 0 to 20.]
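The claim that no weight vector $(w, 1-w)$ ranks c before both a and b can also be checked numerically. The sketch scans an exact grid of weights (step 1/1000, our choice) using the grades of Example 1 and records where c beats a and where c beats b.

```python
from fractions import Fraction

# Grades of Example 1: (g1, g2) on a 0-20 scale.
grades = {"a": (5, 19), "b": (20, 4), "c": (11, 11), "d": (4, 6)}


def weighted(student, w):
    """Weighted average with weights (w, 1 - w)."""
    g1, g2 = grades[student]
    return w * g1 + (1 - w) * g2


# Exact grid of weights w = 0, 1/1000, ..., 1.
grid = [Fraction(k, 1000) for k in range(1001)]
c_beats_a = [w for w in grid if weighted("c", w) > weighted("a", w)]
c_beats_b = [w for w in grid if weighted("c", w) > weighted("b", w)]
c_first = [w for w in grid
           if weighted("c", w) > max(weighted("a", w), weighted("b", w))]

print(min(c_beats_a), max(c_beats_b))  # c beats a only above 4/7, b only below 7/16
print(len(c_first))                    # no weight ranks c before both a and b
```

The two sets of admissible weights do not overlap, so `c_first` is empty; with equal weights the sketch reproduces the averages 12, 12 and 11 for a, b and c.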
Finally, it should be noted that adding a minimal acceptable grade for all courses
can reduce, but not eliminate, the occurrence of such effects (unless the minimal
acceptable grade is so high that it turns the system into a nearly conjunctive one).
A related consequence of the additivity hypothesis is that it cannot account
for interactions between grades, as shown in the following example.
Example 2
Consider four students enrolled in an undergraduate programme consisting of three
courses: Physics, Maths and Economics. For each course, a final grade between 0
and 20 is allocated. The results are as follows:
On the basis of these evaluations, it is felt that a should be ranked before b. Al-
though a has a low grade in Economics, he has reasonably good grades in both
Maths and Physics which makes him a good candidate for an Engineering pro-
gramme; b is weak in Maths and it seems difficult to recommend him for any
programme with a strong formal component (Engineering or Economics). Using
a similar type of reasoning, d appears to be a fair candidate for a programme in
Economics. Student c has two low grades and it seems difficult to recommend him
for a programme in Engineering or in Economics. Therefore d is ranked before c.
Although these preferences appear reasonable, they are not compatible with
the use of a weighted average in order to aggregate the three grades. It is easy to
observe that:
ranking a before b implies putting more weight on Maths than on Economics
($18w_1 + 12w_2 + 6w_3 > 18w_1 + 7w_2 + 11w_3 \iff w_2 > w_3$),
ranking d before c implies putting more weight on Economics than on Maths
($5w_1 + 12w_2 + 13w_3 > 5w_1 + 17w_2 + 8w_3 \iff w_3 > w_2$),
which is contradictory.
In this example it seems that criteria interact. Whereas Maths does not outweigh
any other course (see the ranking of d vis-a-vis c), having good grades in
both Maths and Physics or in both Maths and Economics is better than having
good grades in both Physics and Economics. Such interactions, although not
infrequent, cannot be dealt with using weighted averages; this is another
consequence of the additivity hypothesis. Taking such interactions into account calls
for the use of more complex aggregation models (see Grabisch 1996).
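The incompatibility can also be confirmed by brute force: scan a grid of weight vectors summing to 1 and look for one that ranks a before b and d before c. The table of Example 2 is not reproduced in this excerpt, so the grades below are an assumption: those of a and b are read off the first inequality, and c = (5, 17, 8), d = (5, 12, 13) is the assignment consistent with the discussion (c has two low grades, d is a fair candidate in Economics). Grades and weights are ordered (Physics, Maths, Economics).

```python
from fractions import Fraction
from itertools import product

# Assumed grades (Physics, Maths, Economics) on a 0-20 scale.
grades = {
    "a": (18, 12, 6),
    "b": (18, 7, 11),
    "c": (5, 17, 8),
    "d": (5, 12, 13),
}


def weighted(student, w):
    """Weighted sum of a student's grades with weight vector w."""
    return sum(wi * gi for wi, gi in zip(w, grades[student]))


N = 60  # grid resolution: weights are multiples of 1/N summing to 1
simplex = [
    (Fraction(i, N), Fraction(j, N), Fraction(N - i - j, N))
    for i, j in product(range(N + 1), repeat=2)
    if i + j <= N
]

compatible = [
    w for w in simplex
    if weighted("a", w) > weighted("b", w) and weighted("d", w) > weighted("c", w)
]
print(len(simplex), len(compatible))  # many candidate weight vectors, none compatible
```

Each preference taken alone is satisfied by many weight vectors; only their conjunction is impossible, which is the signature of interacting criteria.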
Example 3
Consider two students enrolled in a degree consisting of two courses. For each
course a final grade between 0 and 20 is allocated; both courses have the same
weight and the required minimal average grade for the degree is 10. The results
are as follows:
g1 g2
a 11 10
b 12 9
It is clear that both students will receive an identical average grade of 10.5: the
difference between 11 and 12 on the first course exactly compensates for the oppo-
site difference on the second course. Both students will obtain the degree having
performed equally well.
It is not unreasonable to suppose that since the minimal required average for
the degree is 10, this grade will play the role of a special grade for the instructors,
a grade above 10 indicating that a student has satisfactorily met the objectives
of the course. If 10 is a special grade then, it might be reasonable to consider
that the difference between 10 and 9 which crosses a special grade is much more
significant than the difference between 12 and 11 (it might even be argued that the
small difference between 12 and 11 is not significant at all). If this is the case, we
would have good grounds to question the fact that a and b are equally good. The
linearity hypothesis embodied in the use of weighted averages has the inevitable
consequence that a difference of one point has the same meaning wherever it lies on the
scale and therefore does not allow for such considerations.
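One way to see what the linearity hypothesis rules out is to give the special grade an explicit role. The value function below is purely hypothetical: it adds a bonus of 2 points (an arbitrary choice) to any grade at or above 10, so that the step from 9 to 10 counts for more than the step from 11 to 12. Summing the transformed grades is still additive, but no longer linear in the raw grades.

```python
def average(grades):
    """Plain (equally weighted) average, as in Example 3."""
    return sum(grades) / len(grades)


def with_special_grade(g, special=10, bonus=2):
    """Hypothetical value function: reaching the special grade is worth `bonus` points."""
    return g + bonus if g >= special else g


a, b = (11, 10), (12, 9)

print(average(a), average(b))  # 10.5 10.5: a and b are tied
print(sum(map(with_special_grade, a)), sum(map(with_special_grade, b)))  # 25 23
```

Under this value function a is preferred to b, because b's one-point gain on the first course does not compensate for falling below the special grade on the second.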
Example 4
Consider a programme similar to the one envisaged in the previous example. We
have the following results for three students:
g1 g2
a 14 16
b 15 15
c 16 14
All students have an average grade of 15 and they will all receive the degree.
Furthermore, if the degree comes with the indication of a rank or of an average
grade, these three students will not be distinguished: their equal average grade
makes them indifferent. This appears desirable since these three students have
very similar profiles of grades.
The use of linearity and additivity implies that if a difference of one point on
the first grade compensates for an opposite difference on the other grade, then a
difference of x points on the first grade will compensate for an opposite difference
of x points on the other grade, whatever the value of x. However, if x is chosen
to be large enough, this may appear dubious since it could lead us, for instance, to
view the following three students as perfectly equivalent with an average grade of
15:
g1 g2
a' 10 20
b 15 15
c' 20 10
whereas we already argued that, in such a case, b could well be judged preferable to
both a' and c' even though b is indifferent to a and c. This is another consequence
of the linearity hypothesis embodied in the use of weighted averages.
Example 5
Consider three students enrolled in a degree consisting of three courses. For each
course a final grade between 0 and 20 is allocated. All courses have identical
importance and the minimal passing grade is 10 on average. The results are as
follows:
g1 g2 g3
a 12 5 13
b 13 12 5
c 5 13 12
It is clear that all students have an average equal to the minimal passing grade
10. They all end up tied and should all be awarded the degree.
As argued in section 3.2 it might not be unreasonable to consider that final
grades are only recorded on an ordinal scale, i.e. only reflect the relative rank of
the students in the class, with the possible exception of a few special grades
such as the minimal passing grade. This means that the following table might as
well reflect the results of these three students:
g1 g2 g3
a 11 4 12
b 13 13 6
c 4 14 11
since the ranking of students within each course has remained unchanged as well
as the position of grades vis-a-vis the minimal passing grade. In this case, only
b (say the Dean's nephew) gets an average above 10 and both a and c fail (with
respective averages of 9 and about 9.67). Note that using different transformations, we
could have favoured any of the three students.
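The two tables of Example 5 can be checked mechanically. The sketch verifies that, course by course, the second table preserves the ranking of the students and the position of each grade relative to the passing grade 10, and then recomputes the averages. It treats "at or above 10" as passing, an assumption consistent with all three students passing with an average of exactly 10.

```python
from fractions import Fraction

PASSING = 10
original = {"a": (12, 5, 13), "b": (13, 12, 5), "c": (5, 13, 12)}
rescaled = {"a": (11, 4, 12), "b": (13, 13, 6), "c": (4, 14, 11)}


def same_ordinal_information(old, new, passing=PASSING):
    """True if, in every course, `new` keeps the order of the students and
    the side of the passing grade on which each grade lies."""
    students = list(old)
    n_courses = len(old[students[0]])
    for k in range(n_courses):
        for s in students:
            if (old[s][k] >= passing) != (new[s][k] >= passing):
                return False
            for t in students:
                if (old[s][k] < old[t][k]) != (new[s][k] < new[t][k]):
                    return False
    return True


def average(grades):
    """Exact equally weighted average."""
    return Fraction(sum(grades), len(grades))


print(same_ordinal_information(original, rescaled))
for s in original:
    print(s, float(average(original[s])), round(float(average(rescaled[s])), 2))
```

The check succeeds, yet the averages move from 10, 10, 10 to 9, 10.67 and 9.67: the weighted average responds to information the ordinal reading of the grades regards as arbitrary.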
Not surprisingly, this example shows that a weighted average makes use of the
cardinal properties of the grades. This is hardly compatible with grades that
would only be indicators of ranks even with some added information (a view that
is very compatible with the discussion in section 3.2). As shown by the following
example, it does not seem that the use of letter grades, instead of numerical
ones, helps much in this respect.
Example 6
In many American Universities the Grade Point Average (GPA), which is nothing
more than a weighted average of grades, is crucial for the attribution of degrees and
the selection of students. Since courses are evaluated on letter scales, the GPA
is usually computed by associating a number to each letter grade. A common
conversion scheme is the following:
A 4 (outstanding or excellent)
B 3 (very good)
C 2 (good)
D 1 (satisfactory)
E 0 (failure)
g1 g2 g3
a 90 69 70
b 79 79 89
c 100 70 69
A 90-100%
B 80-89%
C 70-79%
D 60-69%
E 0-59%
g1 g2 g3
a A D C
b C C B
c A C D
Supposing the three courses of equal importance and using the conversion scheme
of letter grades into numbers given above, the calculation of the GPA is as follows:
g1 g2 g3 GPA
a 4 1 2 2.33
b 2 2 3 2.33
c 4 2 1 2.33
A+ 98-100%
A 94-97%
A- 90-93%
B+ 87-89%
B 83-86%
B- 80-82%
C+ 77-79%
C 73-76%
C- 70-72%
D 60-69%
F 0-59%
g1 g2 g3
a A D C
b C+ C+ B+
c A+ C D
A+ 10
A 9
A 8
B+ 7
B 6
B 5
C+ 4
C 3
C 2
D 1
F 0
50 CHAPTER 3. BUILDING AND AGGREGATING EVALUATIONS
g1 g2 g3 GPA
a 8 1 2 3.67
b 4 4 7 5.00
c 10 2 1 4.33
In this case, b (again the Dean's nephew) gets a clear advantage over a and c.
It should be clear that standardisation of the original numerical grades before
conversion offers no clear solution to the problem uncovered.
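Both GPA tables can be reproduced from the percentage grades with a short Python sketch. It assumes integer percentages and equally weighted courses, as in the example; each scheme is encoded as a list of (lower bound, letter, numeric value) bands, a representation chosen here for convenience rather than taken from any official GPA rule.

```python
# Percentage grades of the three students on courses g1, g2, g3.
percentages = {"a": [90, 69, 70], "b": [79, 79, 89], "c": [100, 70, 69]}

# Coarse scheme: (lowest percentage of the band, letter, numeric value).
COARSE = [(90, "A", 4), (80, "B", 3), (70, "C", 2), (60, "D", 1), (0, "E", 0)]

# Finer scheme with plus and minus grades.
FINE = [
    (98, "A+", 10), (94, "A", 9), (90, "A-", 8),
    (87, "B+", 7), (83, "B", 6), (80, "B-", 5),
    (77, "C+", 4), (73, "C", 3), (70, "C-", 2),
    (60, "D", 1), (0, "F", 0),
]


def to_letter_value(pct, scheme):
    """Return (letter, value) of the first band whose lower bound pct reaches."""
    for lower, letter, value in scheme:
        if pct >= lower:
            return letter, value
    raise ValueError(f"percentage out of range: {pct}")


def gpa(pcts, scheme):
    """Unweighted mean of the converted values (courses of equal importance)."""
    values = [to_letter_value(p, scheme)[1] for p in pcts]
    return sum(values) / len(values)


for student, pcts in percentages.items():
    print(student, round(gpa(pcts, COARSE), 2), round(gpa(pcts, FINE), 2))
```

With the coarse scheme all three students tie at 2.33; the finer scheme separates them (3.67, 5.00, 4.33) although the underlying percentages are unchanged.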
Example 7
We argued in section 3.2 that small differences in grades might not be significant
at all provided they do not involve crossing any special grade. The explicit
treatment of such imprecision is problematic using a weighted average; most often,
it is simply ignored. Consider the following example in which three students are
enrolled in a degree consisting of three courses. For each course a final grade
between 0 and 20 is allocated. All courses have the same weight and the minimal
passing grade is 10 on average. The results are as follows:
g1 g2 g3
a 13 12 11
b 11 13 12
c 14 10 12
All students receive an average grade of 12 and are therefore judged indifferent.
If all instructors agree that a difference of one point in their grades (away from 10)
should not be considered as significant, student a has good grounds to complain.
He can argue that he should be ranked before b: he has a significantly higher grade
than b on g1 while there is no significant difference between the other two grades.
The situation is the same vis-a-vis c: a has a significantly higher grade on g2 and
this is the only significant difference.
In a similar vein, using the same hypotheses, the following table appears even
more problematic:
g1 g2 g3
a 13 12 11
b 11 13 12
c 12 11 13
since, while all students clearly obtain a similar average grade, a is significantly
better than b (he has a significantly higher grade on g1 while there are no signifi-
cant differences on the other two grades), b is significantly better than c and c is
significantly better than a (the reader will have noticed that this is a variant of
the Condorcet paradox mentioned in chapter 2).
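The reasoning of Example 7 can be made explicit with a small Python sketch. It formalises "significantly better" in one possible way (an assumption, not a rule stated in the text): x beats y if x is more than one point ahead on at least one course and y is more than one point ahead on none. The special treatment of grades near 10 is left out, since it does not change any comparison in these two tables.

```python
from itertools import permutations

THRESHOLD = 1  # assumption: differences of one point or less are not significant


def significantly_better(x, y, threshold=THRESHOLD):
    """True if x is significantly higher than y on at least one course
    and y is significantly higher than x on none."""
    x_ahead = any(gx - gy > threshold for gx, gy in zip(x, y))
    y_ahead = any(gy - gx > threshold for gx, gy in zip(x, y))
    return x_ahead and not y_ahead


def relation(table, threshold=THRESHOLD):
    """All ordered pairs (s, t) such that s is significantly better than t."""
    return sorted(
        (s, t)
        for s, t in permutations(table, 2)
        if significantly_better(table[s], table[t], threshold)
    )


first = {"a": [13, 12, 11], "b": [11, 13, 12], "c": [14, 10, 12]}
second = {"a": [13, 12, 11], "b": [11, 13, 12], "c": [12, 11, 13]}

for name, table in (("first", first), ("second", second)):
    means = {s: sum(g) / len(g) for s, g in table.items()}
    print(name, means, relation(table))
```

In the first table a beats both b and c while all averages are 12; in the second the relation is the cycle a, b, c, a.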
Aggregation rules using weighted sums will be dealt with again in chapters 4
and 6. In view of these few examples, we hope to have convinced the reader that
although the weighted sum is a very simple and almost universally accepted rule,
its use may be problematic for aggregating grades. Since grades are a complex
evaluation model, this is not overly surprising. If it is admitted that there is no
easy way to evaluate the performance of a student in a given course, there is no
reason why there should be an obvious one for an entire programme. In particular,
the necessity and feasibility of using rules that completely rank order all students
might well be questioned.
3.4 Conclusions
We have all been accustomed to seeing our academic performances in courses
evaluated through grades and to seeing these grades amalgamated in one way or
another in order to judge our overall performance. Most of us routinely grade
various kinds of work, prepare exams, write syllabi specifying a grading policy,
etc. Although they are very familiar, we have tried to show that these activities
may not be as simple and as unproblematic as they appear to be. In particular,
we discussed the many elements that may obscure the interpretation of grades
and argued that the common weighted sum rule to amalgamate them may not be
without difficulties. We expect such difficulties to be present in the other types of
evaluation models that will be studied in this book.
We would like to emphasise a few simple ideas to be drawn from this example
that we should keep in mind when working on different evaluation models:
• evaluation operations are complex and should not be confused with
measurement operations in Physics. When they result in numbers, the properties
of these numbers should be examined with care; using numbers may be only a
matter of convenience and does not imply that any operation can be
meaningfully performed on these numbers.
• the aggregation of the results of several evaluation models should take the
nature of these models into account. The information to be aggregated may
itself be the result of more or less complex aggregation operations (e.g.
aggregating the grades obtained at the mid-term and the final exams) and may
be affected by imprecision, uncertainty and/or inaccurate determination.
• aggregation models should be analysed with care. Even the simplest and
most familiar ones may in some cases lead to surprising and undesirable
conclusions.
Finally we hope that this brief study of the evaluation procedures of students will
also be the occasion for instructors to reflect on their current grading practices.
This has surely been the case for the authors.
4 CONSTRUCTING MEASURES: THE EXAMPLE OF INDICATORS
Our daily life is filled with indicators: I.Q., Dow Jones, GNP, air quality, physicians
per capita, poverty index, social position index, consumer price index, rate of
return, . . . If you read a newspaper, you could feel that these magic numbers rule
the world.
Note that in many cases, the decisions of the World Bank to withdraw help
are not motivated by economic or financial reasons. Violations of human rights
are often presented as the main factor. But it is worth noting that indicators of
human rights also exist (see e.g. Horn (1993)).
Why are these indicators (often called indices) so powerful ? Probably because
it is commonly accepted that they faithfully reflect reality. This forces us to raise
several questions.
1. Is there one reality, several realities or no reality ? Many philosophers nowa-
days consider that reality is not unique. Each person has a particular per-
ception of the world and, hence, a particular reality. One could argue that
these particular realities are just particular views of the same reality but,
as it is impossible to consider reality independently of our perception of it,
it might be meaningless to consider that reality exists per se (Roy 1990).
As a consequence, an indicator might only be relevant for the person who
constructed it.
2. Whatever the answer to the previous question, can we hope that an indicator
faithfully reflects reality (the reality or a reality) ? Reality is so complex that
this is doubtful. Therefore, we must accept that an indicator accounts only
for some aspects of reality. Hence, an indicator must be designed so as
to reflect those aspects that are relevant with respect to our concerns. As
an illustration, the Human Development Index (HDI), defined by the United
Nations Development Programme (UNDP) to measure development (United
Nations Development Programme 1997), is used by many different people in
different continents and in different areas of activity (politicians, economists,
businessmen, . . . ). Can we assume that their concerns are similar ?
In the Human Development Report 1997, UNDP proudly reports that
This clearly shows that many people used the HDI in completely different
ways.
Furthermore, are the concerns of UNDP itself with respect to the HDI clearly
defined ? Why do they need the human development index ? To cut subsidies
to nations evolving in the wrong direction ? To share subsidies among the
poorest countries (according to what key) ? To put some pressure on the
governments performing the worst ? To prove that Western democracies
have the best political systems ?
3. Suppose that the purpose of an indicator is clearly defined. Are we sure that
this indicator indicates what we want it to ? Do the arithmetic operations
performed during the computation of the indicator lead to something that
makes sense ?
Let us now discuss in detail three well-known indicators arising in completely
different areas of our lives: the human development index, the air quality index
and the decathlon score.
The HDI's precise definition is presented on page 122 of the 1997 Human
Development Report. The HDI is a simple average of the life expectancy index,
the educational attainment index and the adjusted real GDP per capita (PPP$)
index. Here is how each index is computed.
Life Expectancy Index (LEI) This index measures life expectancy at birth. In
order to normalise the scale of this index, a minimum value (25 years) and
a maximum one (85 years) have been defined. The index is defined as
$$\mathrm{LEI} = \frac{\text{life expectancy at birth} - 25}{85 - 25}.$$
Hence, it is a value between 0 and 1.
Educational Attainment Index (EAI) It is a combination of two other indicators:
the Adult Literacy Index (ALI) and the combined primary, secondary and tertiary
Enrollment Ratio Index (ERI). The first one is the proportion of literate adults,
while the second one is the proportion of children of primary, secondary or
tertiary school age who actually attend school. The EAI is a weighted average of
ALI and ERI; it is equal to
$$\mathrm{EAI} = \frac{2\,\mathrm{ALI} + \mathrm{ERI}}{3}.$$
Adjusted real GDP per capita (PPP$) Index (GDPI) This index aims at
measuring the income per capita. As the value of one dollar for someone
earning $100 is much larger than the value of one dollar for someone earning
$100 000, the income is first transformed using Atkinson's formula (Atkinson
1970). The transformed value of y, i.e. W(y), is given by
$$
W(y)=\begin{cases}
y & \text{if } 0 < y < y^*,\\
y^* + 2\left[(y-y^*)^{1/2}\right] & \text{if } y^* \le y < 2y^*,\\
y^* + 2(y^*)^{1/2} + 3\left[(y-2y^*)^{1/3}\right] & \text{if } 2y^* \le y < 3y^*,\\
\quad\vdots & \\
y^* + 2(y^*)^{1/2} + 3(y^*)^{1/3} + \dots + n\left[(y-(n-1)y^*)^{1/n}\right] & \text{if } (n-1)y^* \le y < ny^*.
\end{cases}
$$
In this formula, y represents the income, W(y) the transformed income, and
y* is set at $5 835 (PPP$), which was the world average annual income per
capita in 1994.
Thereafter, the income scale is normalised, using the maximum value of
$40 000, the minimum value of $100 and the formula
$$\mathrm{GDPI} = \frac{W(y) - W(100)}{W(40\,000) - W(100)}.$$
Hence, it is a value between 0 and 1. Note that W(40 000) = 6 154 and
W(100) = 100.
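The piecewise formula is easy to mis-read, so here is a minimal Python sketch of it, assuming that income bands are indexed so that y lies in [(n - 1)y*, ny*). The function names `atkinson_w` and `gdpi` are hypothetical, chosen for this illustration; the sketch does reproduce the values W(40 000) ≈ 6 154 and W(100) = 100 quoted above.

```python
Y_STAR = 5835.0  # y*: world average annual income per capita in 1994 (PPP$)


def atkinson_w(y, y_star=Y_STAR):
    """Transformed income W(y), following the piecewise formula above."""
    if y <= 0:
        raise ValueError("income must be positive")
    if y < y_star:
        return float(y)
    n = int(y // y_star) + 1  # y lies in the band [(n - 1) y*, n y*)
    total = y_star
    for k in range(2, n):  # each completed band k contributes k (y*)^(1/k)
        total += k * y_star ** (1.0 / k)
    total += n * (y - (n - 1) * y_star) ** (1.0 / n)  # partial band n
    return total


def gdpi(y, y_min=100.0, y_max=40000.0):
    """Normalised income index (W(y) - W(y_min)) / (W(y_max) - W(y_min))."""
    w_min, w_max = atkinson_w(y_min), atkinson_w(y_max)
    return (atkinson_w(y) - w_min) / (w_max - w_min)


print(round(atkinson_w(40000)), atkinson_w(100), round(atkinson_w(11265)))
```

Each additional band is discounted by a higher root, which is what makes a dollar count for less and less as income grows; the function is continuous at every multiple of y*.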
Some words about the data and their collection time: the Human Development
Report is a yearly publication (since 1990). Obviously, the 1997 report does not
contain the 1997 data. In fact, the HDI computed in the 1997 report is considered
by the UNDP as the HDI of 1994. To make things more complicated, the 199i HDI
(in the 199j report) is an aggregate of data from 199i (for some dimensions) and
from earlier years (for other dimensions). In this volume, we use only data from
the 1997 Human Development Report. We refer to them as HDR97, irrespective
of the collection year.
To illustrate how the HDI works, let's compute the HDI for Greece (HDR97).
Life expectancy in Greece is 77.8 years. Hence, LEI = (77.8 - 25)/(85 - 25) = 0.880.
The ALI is 0.967 and the ERI is 0.820. Hence, EAI = (2 × 0.967 + 0.820)/3 =
0.918. Greece's real GDP per capita, $11 265, lies between y* and 2y*. Thus the
adjusted real GDP per capita for Greece is $5 982 (PPP$), because
5 982 ≈ 5 835 + 2(11 265 - 5 835)^(1/2). Hence GDPI = (5 982 - W(100))/(W(40 000)
- W(100)) = (5 982 - 100)/(6 154 - 100) = 0.972. Finally, Greece's HDI is
(0.880 + 0.918 + 0.972)/3 = 0.923.
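The Greek computation can be replayed step by step. The Python sketch below is deliberately limited to incomes between y* and 2y* (Greece's case), where W(y) = y* + 2(y - y*)^(1/2), and it takes W(100) = 100 and W(40 000) = 6 154 from the text instead of recomputing them; the helper names are hypothetical.

```python
import math

Y_STAR = 5835.0               # y*, as in the text
W_MIN, W_MAX = 100.0, 6154.0  # W(100) and W(40 000), values given in the text


def lei(life_expectancy, low=25.0, high=85.0):
    return (life_expectancy - low) / (high - low)


def eai(ali, eri):
    return (2 * ali + eri) / 3


def gdpi_second_band(y):
    """GDPI for y* <= y < 2y* only, where W(y) = y* + 2 (y - y*)^(1/2)."""
    if not Y_STAR <= y < 2 * Y_STAR:
        raise ValueError("this sketch only handles incomes between y* and 2y*")
    w = Y_STAR + 2 * math.sqrt(y - Y_STAR)
    return (w - W_MIN) / (W_MAX - W_MIN)


def hdi(life_expectancy, ali, eri, gdp):
    """Simple average of the three component indices."""
    return (lei(life_expectancy) + eai(ali, eri) + gdpi_second_band(gdp)) / 3


# Greece, HDR97 figures quoted in the text.
print(round(lei(77.8), 3), round(eai(0.967, 0.820), 3),
      round(gdpi_second_band(11265), 3), round(hdi(77.8, 0.967, 0.820, 11265), 3))
```

The `low` and `high` parameters of `lei` make it easy to try other bounds for life expectancy, which is exactly the issue raised in the next paragraphs.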
Table 4.1: Bounds: life expectancy, EAI and GDPI for South Korea and Costa
Rica (HDR97)
are set to 85 and 25, then the HDI is 0.890 for South Korea and 0.889 for Costa
Rica. But if the maximum and minimum for life expectancy are set to 80 and 25,
then the HDI is 0.915 for South Korea and 0.916 for Costa Rica. In the first case,
Costa Rica is less developed than South Korea while in the second one, we obtain
the converse: Costa Rica is more developed than South Korea. Hence, the choice
of the bounds matters.
In fact, narrowing the range of life expectancy from [25,85] to [25,80] increases
the difference between any two values of LEI by a factor (85 - 25)/(80 - 25) =
60/55 ≈ 1.09. Hence it amounts to increasing the weight of LEI by the same factor. In our example,
Costa Rica performed better than South Korea on life expectancy. Therefore, it
is not surprising that its position is improved when life expectancy is given more
weight (by narrowing its range).
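The weight argument can be written as a one-line derivation. Writing e₁ and e₂ for two life expectancies (notation introduced only for this sketch) and LEI with subscripted bounds:
$$
\mathrm{LEI}_{[25,80]}(e_1) - \mathrm{LEI}_{[25,80]}(e_2)
= \frac{e_1 - e_2}{80 - 25}
= \frac{85 - 25}{80 - 25}\cdot\frac{e_1 - e_2}{85 - 25}
= \frac{60}{55}\left(\mathrm{LEI}_{[25,85]}(e_1) - \mathrm{LEI}_{[25,85]}(e_2)\right).
$$
Since the HDI is (LEI + EAI + GDPI)/3, the life expectancy part of any HDI difference is multiplied by 60/55 ≈ 1.09, while the other two parts are unchanged.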
Note that, apparently, no bounds were fixed for the ALI and the ERI. In reality,
this is equivalent to choosing 1 for maximum and 0 for minimum. This is also an
arbitrary choice. Values 0 and 1 have not been observed and are not likely to be
observed in the foreseeable future. Hence the range of these scales is narrower
than [0,1] and the scale could be normalised using values other than 0 and 1.
4.1.2 Compensation
Consider Table 4.2 where the data for two countries (Gabon and the Solomon
Islands, HDR97) are presented. The Solomon Islands perform quite well on all
dimensions; Gabon is slightly better than the Solomon Islands on all dimensions
except life expectancy where it is very bad. For us, this very short life expectancy
is clearly a sign of severe underdevelopment, even if other dimensions are good.
Nevertheless, the HDI is equal to 0.56 for both Gabon and Solomon Islands. Hence,
Other substitution rates are easy to compute: e.g. the substitution rate between
life expectancy and adult literacy is 0.016667 × (1 - 0) × (3/2) = 0.025. To compensate
for a decrease of n years in life expectancy, the adult literacy index must increase
by n × 0.025.
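The rate 0.025 follows from the definitions of LEI, EAI and HDI. Writing e for life expectancy (notation introduced for this sketch), and noting that the ALI is normalised on [0, 1], so that the factor (1 - 0) equals 1:
$$
\frac{\partial\,\mathrm{HDI}}{\partial e} = \frac{1}{3}\cdot\frac{1}{85-25},
\qquad
\frac{\partial\,\mathrm{HDI}}{\partial\,\mathrm{ALI}} = \frac{1}{3}\cdot\frac{2}{3},
\qquad
\frac{\partial\,\mathrm{HDI}/\partial e}{\partial\,\mathrm{HDI}/\partial\,\mathrm{ALI}}
= \frac{1}{60}\cdot\frac{3}{2} = 0.016667 \times 1.5 = 0.025.
$$
Because every component enters the HDI linearly, this rate is the same at every level of life expectancy and literacy.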
Let us now think in terms of real GDP (not adjusted). In a country where
real GDP is 13 071$ (Cyprus, HDR97), a decrease in life expectancy of one year
can be compensated by an increase in real GDP of 21 084$. In a country where
real GDP is 700$ (Chad, HDR97), a decrease in life expectancy of one year can
be compensated by an increase in real GDP of 100.9$. Hence, poor people's life
expectancy has much less value than that of rich people.
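The Chad figure can be checked with a short Python sketch. It only covers incomes that stay below y*, where Atkinson's transformation leaves income unchanged (W(y) = y); the Cyprus figure involves a higher band of the formula and is not reproduced here. The function name is hypothetical.

```python
LE_RANGE = 85 - 25            # width of the life expectancy scale, in years
W_MIN, W_MAX = 100.0, 6154.0  # W(100) and W(40 000), values given in the text
Y_STAR = 5835.0               # below y*, W(y) = y


def gdp_compensation_below_y_star(years_lost, income):
    """Increase in real GDP per capita ($) keeping the HDI unchanged when life
    expectancy drops by `years_lost`, for an income that stays below y*."""
    # LEI and GDPI carry the same weight (1/3) in the HDI, so the GDPI must
    # rise by exactly as much as the LEI falls.
    delta_gdpi = years_lost / LE_RANGE
    delta_w = delta_gdpi * (W_MAX - W_MIN)
    if income + delta_w >= Y_STAR:
        raise ValueError("the compensated income would leave the band W(y) = y")
    return delta_w  # W(y) = y here, so a change in W is a change in dollars


print(round(gdp_compensation_below_y_star(1, 700), 1))  # Chad
```

Above y*, each extra dollar raises W(y) by much less than one, so the dollar amount needed to compensate one year of life expectancy grows sharply with income.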
life expectancy, y is much lower than x on adult literacy but much higher than
x on income. As life expectancy is very short, one might consider that adult
literacy is not very important (because there are almost no adults) but income is
more important because it improves quality of life in other respects. Furthermore,
health conditions and life expectancy can be expected to improve rapidly due to a
higher income. Hence, one could conclude that y is more developed than x. Our
conclusion is confirmed by the HDI: 0.30 for x and 0.34 for y.
Let us now compare two countries, w and z, similar to x and y except that life
expectancy is equal to 70 for both w and z (see Table 4.4). In such conditions, the
performance of z on adult literacy is really bad compared to that of w. The adult
population is very important and its illiteracy is a severe problem. Even if the high
income of z is used to foster education, it will take decades before a significant
part of the population is literate. On the contrary, w's low income does not seem to
be a problem for the quality of life, as both life expectancy and education are high.
Hence, it might not be unreasonable to conclude that w is more developed than
z. But if we compute the HDI, we obtain 0.52 for w and 0.56 for z! This should
not be a surprise; there is no difference between x and y on the one hand and w
and z on the other hand, except for life expectancy. But the differences in life
expectancy between x and w and between y and z are equal. Hence, this results
in the same increase of the HDI (compared to x and y) for both w and z.
When a sum (or an average) is used to aggregate different dimensions, identical
performances of two items (countries or whatever) on one or more dimensions
are not relevant for the comparison of these items. The identical performances can
be changed in any direction; as long as they remain identical, they do not affect
the way both items compare to each other. This is called dimension independence;
it is inherent to sums and averages. But we saw that this property is not always
desirable. When we compare countries on the basis of life expectancy, education
and income, dimension independence might not be desirable.
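Dimension independence can be seen directly in a Python sketch. The scores below are hypothetical normalised values, not HDR97 data: whatever common value x and y share on the first dimension, the gap between their unweighted means stays the same, exactly as with x, y and w, z above.

```python
def mean_index(scores):
    """Unweighted mean of dimension scores already normalised to [0, 1]."""
    return sum(scores) / len(scores)


def gap(x, y, shared):
    """Index of y minus index of x when both take the value `shared` on the
    first dimension; the other dimensions are taken from x and y."""
    return mean_index([shared] + y[1:]) - mean_index([shared] + x[1:])


# Hypothetical scores on (life expectancy, literacy, income), illustration only.
x = [0.10, 0.60, 0.20]
y = [0.10, 0.30, 0.60]

# Very different shared levels, yet the gap is equal up to rounding.
print(gap(x, y, 0.10), gap(x, y, 0.75))
```

The shared term cancels in the difference, which is why an additive index cannot let the importance of literacy or income depend on the level of life expectancy.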
contrary, even if reality exists, the average of the length of life does not correspond
to something real. It is the length of life of a kind of average or ideal human, as
if we (the real humans) were imperfect, irregular or noisy copies of that average
human. Until the 19th century, both kinds of averages were called by different
names (moyenne proportionnelle, for different measures of one object, and valeur
commune, for different objects, each measured once) and considered as completely
different. During the 19th century, the Belgian astronomer and statistician Quetelet
(1796-1894) invented the concept of the average human and unified both averages
(Desrosières 1995).
To convince you that the concept of the average human is quite strange (though
possibly useful), consider a country where all inhabitants are right triangles of
different sizes and shapes (example borrowed from Warusfel (1961)). To make it
easy, let us suppose that there are just two kinds of right triangles (see Fig. 4.1),
in the same proportion. A statistician wants to measure the average right triangle.
In order to do so, he computes the average length of each edge. What he gets is
a triangle with edges of length 4, 8 and 9, i.e. a triangle which is not right-angled,
since 4² + 8² = 80 ≠ 81 = 9². The average right triangle is no longer a right triangle! What
looks like a right angle is in fact an angle of approximately 91 degrees. In the same
spirit, Quetelet measured the average size of humans, in all dimensions, including
the liver, heart, spleen and other organs. What he got was an average human in
which it was impossible to fit all its average organs. They were too large!
Figure 4.1: Two right triangles, with edges 3, 4, 5 and 5, 12, 13, and their average triangle, with edges 4, 8, 9
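Warusfel's triangles can be averaged in a few lines of Python. The sketch averages each edge separately, as the statistician does, and then uses the law of cosines to measure the angle that "should" be right; the variable names are illustrative.

```python
import math

# The two kinds of inhabitants, in equal proportion (edges read from Fig. 4.1).
triangles = [(3, 4, 5), (5, 12, 13)]

# "Average triangle": average each edge length separately.
average_triangle = tuple(
    sum(t[i] for t in triangles) / len(triangles) for i in range(3)
)
a, b, c = average_triangle  # c is the longest edge

# Pythagoras fails: 4^2 + 8^2 = 80, whereas 9^2 = 81.
is_right = math.isclose(a ** 2 + b ** 2, c ** 2)

# Law of cosines gives the angle opposite the longest edge.
largest_angle = math.degrees(math.acos((a ** 2 + b ** 2 - c ** 2) / (2 * a * b)))

print(average_triangle, is_right, round(largest_angle, 2))
```

Averaging each coordinate separately does not preserve a relation (here a² + b² = c²) that holds for every individual, which is the core of the objection to Quetelet's average human.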
The adult literacy index is quite different: it is just the number of literate
adults, divided by the total adult population to allow comparisons between coun-
tries. Hence one could think it is not an average. In fact it depends on how we
interpret it. If we consider that an ALI of 0.60 means that 60% of the population
is literate, then it is not an average. If we consider that an ALI of 0.60 means that
the average literacy level is 60%, then it is an average. And this last
interpretation is no sillier than computing a life expectancy index. Consider a
variable whose value is 0 for an illiterate adult and 1 for a literate one. Compute
the average of this variable over the population and over some time period. What
do you get ? The adult literacy index!
We can analyse the enrolment ratio index and the adjusted real GDP index in
the same way as the ALI. They are quantities that are measured at country level.
The first one being a proportion and the second one being normalised, they can
also be interpreted at individual level, like averages.
What about the HDI itself ? According to the United Nations Development
Programme (1997), it is designed to
Furthermore, the HDI contains an index (LEI) which can only be interpreted
bearing in mind Quetelet's average human. Therefore the ALI, GDPI and HDI should
be interpreted in this way as well. The HDI somehow describes how developed the
average human in a country is.
ATMO index is the largest value, that is 8. Hence the air quality is very bad. In
the following paragraphs, we discuss some problems arising with the ATMO index.
4.2.1 Monotonicity
Suppose that, due to heavy traffic, the absence of wind and a very sunny day, the
ozone sub-index increases from 3 to 8 for the air described in Table 4.5. Clearly,
this corresponds to a worse air: no pollutant decreased, one of them increased.
In these conditions, we expect the ATMO index to worsen as well. In fact the
ATMO index does not change. The maximum is still 8. Thus some changes, even
significant ones, are not reflected by the index. In our example, the change is very
significant as the ozone sub-index was almost perfect and became very bad.
Note that if the ozone sub-index decreases from 8 to 3, the ATMO index does
not change either, though the air quality improves. This shows that the ATMO
index is not strictly monotonic: some changes, in both directions, are not
reflected by the index.
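The insensitivity of a maximum-based index is easy to demonstrate in Python. Only the ozone value of 3 and the overall value of 8 come from the text; the other sub-indices and the pollutant labels are hypothetical placeholders.

```python
def atmo(sub_indices):
    """ATMO-style overall index: the largest pollutant sub-index."""
    return max(sub_indices.values())


# Hypothetical sub-indices: only ozone = 3 and the maximum of 8 come from the text.
before = {"ozone": 3, "pollutant_2": 8, "pollutant_3": 2, "pollutant_4": 4}
after = dict(before, ozone=8)  # heavy traffic, no wind, sunshine

print(atmo(before), atmo(after))  # 8 8: the large ozone increase is invisible
```

A maximum never decreases when a sub-index increases, but it ignores every change that does not affect the largest sub-index; that is why the index is monotonic only in the weak sense.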
the EU long term norm for ozone. Air y is not good on any dimension. It is of
average quality on all dimensions and close to the EU long term norms for three
dimensions. The ATMO index is 6 for air x and 5 for air y. Hence, the quality of
air x is considered to be lower than that of air y. Contrary to what we observed
with the HDI, no compensation at all occurs between the different dimensions.
The small weakness of x (6 compared to 5, for ozone) is not compensated by its
large strengths (1 compared to 4 or 5, for sulphur dioxide, nitrogen dioxide and
dust). In the case of human development, the compensation between dimensions
was too strong. Here, we face another extreme: no compensation at all, which is
probably not better.
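The two extremes, no compensation and full compensation, can be contrasted in a small Python sketch. The sub-index values are hypothetical, chosen only to match the description of x and y (ozone listed first, lower is better): the maximum prefers y, while an unweighted mean, which allows full compensation, prefers x.

```python
def max_index(sub_indices):
    """No compensation: only the worst sub-index counts."""
    return max(sub_indices)


def mean_index(sub_indices):
    """Full compensation: strengths offset weaknesses one for one."""
    return sum(sub_indices) / len(sub_indices)


# Hypothetical sub-indices, ozone first; lower values mean cleaner air.
x = [6, 1, 1, 1]
y = [5, 4, 5, 4]

print(max_index(x), max_index(y), mean_index(x), mean_index(y))
```

Neither rule is "right"; the point is that the choice of aggregation operator alone reverses the comparison between x and y.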
4.2.3 Meaningfulness
Let us forget our criticism of the ATMO index and suppose that it works well.
Consider the statement "Today's ATMO index (6) is twice as high as yesterday's
index (3)". What does it mean ? We are going to show that it is meaningless, in
a certain sense. Let us come back to the definition of the sub-indices. For a given
pollutant, the concentration is measured in µg/m³. The concentration figures are
then transformed into numbers between 1 and 10. This is done in an arbitrary
way. For example, instead of choosing 5-6 for the EU long term norms and 8 for
the short term ones, 6-7 and 9 could have been chosen. The index would work
as well. The relevant information provided by the index is not the figure itself; it
is some information about the fact that we are above or below some norms that
are related to the effects of the pollutants on health (a somewhat similar situation
has been encountered in Chapter 3). But in that case, the values of today's
and yesterday's index would be different, say 7 and 4, and 7 is not twice as large
as 4. To conclude, the statement "Today's ATMO index (6) is twice as high as
Figure 4.2: Decathlon tables for distances (points against distance): general shape of a convex (left) and a concave (right) table
the performances of the athletes. Suppose that an athlete arrived 0.1 second before
the next athlete in the 100-meter run. They have ranks i and i+1. So the difference
in the scores that they receive is 1. Suppose now that the delay between these
two athletes is 1 second. Their ranks are unchanged. Thus the difference in
the scores that they receive is still 1, though a larger difference would be more
appropriate. That is why other tables of single-event scores have been used since
1908 (de Jongh 1992, Zarnowsky 1989). In the tables used after 1908, high scores
are associated with good performances (contrary to scores before 1908). Hence, the
winner is the athlete that has the highest overall score.
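The loss of information in the pre-1908 sum of ranks can be shown with a Python sketch. The race times and athlete names are hypothetical, and the sketch only handles timed events (lower is better) without ties: a 0.1 s margin and a 1 s margin produce exactly the same ranks, hence the same overall scores.

```python
def ranks(times):
    """Rank 1 for the fastest time (ties are not handled in this sketch)."""
    order = sorted(times, key=times.get)
    return {athlete: position + 1 for position, athlete in enumerate(order)}


def rank_sum(events):
    """Pre-1908 style overall score: sum of ranks over events (lower is better)."""
    totals = {}
    for times in events:
        for athlete, r in ranks(times).items():
            totals[athlete] = totals.get(athlete, 0) + r
    return totals


close_race = {"p": 11.0, "q": 11.1, "r": 11.6}  # q is 0.1 s behind p
wide_race = {"p": 11.0, "q": 12.0, "r": 12.5}   # q is 1 s behind p

print(ranks(close_race), ranks(wide_race))
print(rank_sum([close_race]), rank_sum([wide_race]))
```

Only the order of arrival enters the score, so the size of a margin can never be rewarded.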
Some of these tables (different versions, in use between 1934 and 1962) are
based on the idea that improving a performance by some amount (e.g. 5 centime-
tres in a long jump) is more difficult if the performance is close to the world record.
Hence, it deserves more points. The general shape of these tables, for distances,
is given in Figure 4.2 (convex table). For times (in runs), the shape is different as
an improvement is a decrease in time.
A problem raised by convex tables is the following: if an athlete decides to
focus on some events (for example the four kinds of runs) and to do much more
training for them than for the other ones, he will have an advantage. He will come
closer to the world record for runs and earn many points. At the same time, he
will be further away from the world record for the other disciplines but that will
make him lose fewer points, as the slope of the curve is gentler in that direction.
The balance will be positive. Thus these tables encourage athletes to focus on
some disciplines, which is contrary to the spirit of the decathlon.
That is why, since 1962, different concave tables (see Figure 4.2) have been used.
These tables strongly encourage the athletes to be excellent in all disciplines. An
example of a real table, in use in 1998, is presented in Figure 4.3. Note that a new
change occurred: this table is no longer concave. It is almost linear but slightly
convex.
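The incentive created by the shape of a table can be illustrated with toy tables in Python. The power functions below are hypothetical stand-ins for convex and concave tables (the real tables are not power laws), and performances are expressed as fractions of some reference distance. Moving 0.1 from one event to another pays off under the convex table and is penalised under the concave one.

```python
def convex_points(performance):
    """Toy convex table: points grow faster near the reference performance."""
    return 1000 * performance ** 2


def concave_points(performance):
    """Toy concave table: points grow more slowly near the reference performance."""
    return 1000 * performance ** 0.5


def total(performances, table):
    return sum(table(p) for p in performances)


# Performances as fractions of a reference (e.g. world-record) distance.
balanced = [0.8, 0.8]    # same level in both events
specialist = [0.9, 0.7]  # +0.1 in one event, -0.1 in the other

for table in (convex_points, concave_points):
    print(table.__name__, round(total(balanced, table)), round(total(specialist, table)))
```

This is the mechanism described above: convexity rewards specialisation, concavity rewards balance.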
There are many interesting points to discuss about the decathlon score.
Figure 4.3: A plot for the 100 meters run score table in 1998 (vertical axis: score, from 400 to 1200)
• How are the minimum and maximum values set ? They can strongly influence
the score, as was shown with the HDI (in Section 4.1.1). Obviously, the maximum
value must somehow be related to the world record. But, as everyone knows,
world records are objects that athletes like to break.
• Why add single-event scores ? Other operations might work as well. For
example, multiplication may favour the athletes that perform equally well in
all disciplines. To illustrate this point very simply, consider a 3-event contest
where single-event scores are between 0 and 10. An athlete, say x, obtains 8
in all three events. Another one, y, obtains 9, 8 and 7. If we add the scores,
x and y obtain the same score: 24. If we multiply the scores, x gets 512
while y loses with 504.
...
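The 3-event example above takes two lines of Python to check; `math.prod` (Python 3.8+) multiplies the single-event scores.

```python
import math

x = [8, 8, 8]  # equal performances in all three events
y = [9, 8, 7]  # same total, less balanced

print(sum(x), sum(y), math.prod(x), math.prod(y))  # 24 24 512 504
```

For a fixed sum, a product is largest when the factors are equal, which is why multiplication rewards balanced performances.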
The point on which we will focus, in this decathlon example, is the role of the
indicator.
close to that of the winner and is thus a good athlete. Another one is far from the
winner and is consequently not a good athlete.
Not much later (after the second competition), a third role appeared. How did
the athletes evolve ? This athlete has improved his score or x has a better score in
this contest than the score of y in the previous contest. This kind of comparison
is not meaningful: suppose that an athlete wins a contest with a score of 16. In
the next contest, he performs very poorly: short jumps, slow runs, short throws.
But his main opponents are absent or perform equally poorly. He might still win
the contest and even with a higher score although his performance is worse than
the previous time.
After some time, the organisers of decathlons became aware of the second and
third roles. This was probably part of the motivation for abandoning the sum of
ranks and using convex tables. These tables, to some extent, made the comparisons of
scores across athletes and/or competitions meaningful. At the same time, the score
found a new role as a monitoring tool during the training. Before 1908, the scores
could be computed only during competitions as they were sums of ranks. And it
was not long before a wise coach used it as a strategic tool, advising his athlete to
focus on some events. For this reason, since 1962, the organisers have conferred a new
role on the score: to foster excellence in all disciplines. This was achieved by the
introduction of concave tables. But it is most likely that the score is still used as
a strategic tool, hopefully in a less perverse way.
It is worth noting that this new role does not replace any of the previous ones.
The score aims at rewarding equal performances in all disciplines, but it is also
used to assess the performance of an athlete. Even if we consider only these
two roles (the other ones could be seen as side effects), it is amazing to see how
incompatible they are.
go in the same direction for each dimension taken separately. For example, for
each dimension of the ATMO index, everyone prefers a lower concentration. But
it is definitely not reasonable to assume that the global preferences are similar.
Furthermore, even if single-dimensional preferences go in the same direction, it
does not mean that single-dimensional preferences are identical. Those who are
not very sensitive to a pollutant will value a decrease in concentration much more
if it occurs at high concentration than at low concentration. On the contrary,
sensitive people might value concentration decreases at low and high levels equally.
or

Wenn ich mich lehn' an deine Brust,
kommt's über mich wie Himmelslust;
doch wenn du sprichst: ich liebe dich!
so muss ich weinen bitterlich.²

¹ Octavio Paz, Here, translated by Nims (1990)
² Heinrich Heine, Ich liebe dich, translated by Louis Untermeyer (van Doren 1928):
And when I lean upon your breast / My soul is soothed with godlike rest; / But when you
swear: I love but thee! / Then I must weep, and bitterly.

4.5 Conclusions
Among evaluation and decision models, indicators are probably more widespread
than any other model (this is definitely true if you think of cost-benefit analysis or
multiple criteria decision support). Student grades are also very popular (almost
everyone has faced them at some point in life) but, besides the fact that most
people use and/or encounter them, indicators are pervasive in many domains of
human activity, contrary to student grades, which are confined to education (note
that student grades could be considered as special cases of indicators).
Indicators are not often thought of as decision support models but, in many
circumstances, they actually are. Indicators are usually presented as an efficient way
to synthesise information. But what do we need information for ? For making
decisions !
In this chapter, we analysed three different indicators: the human development
index, the ATMO (an air quality index) and the decathlon score.
On the one hand, all three indicators have been shown to present flaws: they
do not always reflect reality or what we consider as reality. This is due to an excess
or a lack of compensation, to non-monotonicity, to an inability to deal with
dimension dependence, . . . These problems are not specific to indicators. Some of
them have already been discussed in Chapter 3 and/or will be met in Chapter 6.
On the other hand, we saw that an indicator does not necessarily need to reflect
reality or, at least, it does not need to reflect only reality.
5 ASSESSING COMPETING PROJECTS: THE EXAMPLE OF COST-BENEFIT ANALYSIS
5.1 Introduction
Decision-making inevitably implies, at some stage, the allocation of scarce resources
to some alternatives rather than to others (e.g. deciding how to use one's income).
It is therefore not at all surprising that the question of helping a decision-maker
to choose between competing alternatives, projects, courses of action and/or to
evaluate them, has attracted the attention of economists. Cost-Benefit Analysis
(CBA) is a set of techniques that economists have developed for this purpose. It
is based on the following simple and apparently inescapable idea: a project should
only be undertaken when its benefits outweigh its costs.
CBA is particularly oriented towards the evaluation of public sector projects.
Decisions made by governments, public agencies and firms or international organ-
isations are complex and have a huge variety of consequences. Some examples of
areas in which CBA has been applied will give a hint of the type of projects that
are evaluated:
These types of decision are immensely complex. They affect our everyday
life and are likely to affect that of our children. Most economists view CBA as
the standard way of evaluating such projects and of supporting public decision-
making (numerous examples of practical studies using CBA can easily be found in
applied economics journals, e.g. American Journal of Agricultural Economics, En-
ergy Economics, Environment and Planning, Journal of Environmental Economics
and Management, Journal of Health Economics, Journal of Policy Analysis and
Management, Journal of Public Finance and Public Choice, Journal of Transport
Economics and Policy, Land Economics, Pharmaco-Economics, Public Budget-
ing and Finance, Regional Science and Urban Economics, Water Resources Re-
search). Since fairly different approaches to these problems have been advocated,
it is important to have a clear idea of what CBA is; if the claim of economists
were perfectly well-founded, there would hardly be any need for other
decision/evaluation models.
Although it has distant origins (see Dupuit 1844), the development of CBA
has unsurprisingly coincided with the more active involvement of governments in
economic affairs that started after the great depression and climaxed after World
War II in the 50s and 60s. A good overview of the early history of CBA can
be found in Dasgupta and Pearce (1972). After having started in the USA in
the field of Water Resource Management (see Krutilla and Eckstein (1958) for
an overview of these pioneering developments), the principles of CBA were soon
adopted in other areas and countries, the UK being the first and most active one.
While research on (and applications of) CBA grew at a very fast rate during the
50s and 60s, the principles of CBA were entrenched in a series of very influential
manuals for project evaluation produced by several international organisations
(OECD: Little and Mirrlees (1968), Little and Mirrlees (1974); UNIDO: Dasgupta,
Marglin and Sen (1972); and, more recently, World Bank: Adler (1987); Asian
Development Bank: Kohli (1993)). In many countries nowadays, the law makes it
compulsory to evaluate projects using the principles of CBA. Research on CBA
is still active and economists have spent considerable time and energy in investi-
gating its foundations and refining the various tools that it requires in practical
applications (recent references include Boardman 1996, Brent 1996, Nas 1996).
It would be impossible to give a fair account of the immense literature on CBA
in a few pages. Although somewhat old, two excellent introductory references are
Dasgupta and Pearce (1972) and Lesourne (1975). Less ambitiously, we shall try
here to:
These three objectives structure the rest of this chapter into sections. While our
aim is clearly not to promote the use of CBA, neither is it to support the currently
fashionable claim (especially among environmentalists) that CBA is an outdated and
useless technique. In pointing out what we believe to be some
5.2. THE PRINCIPLES OF CBA 73
limitations of CBA, we only want to give arguments refuting the claim of some
economists that, under all circumstances, it is the only consistent way to support
decision/evaluation processes (Boiteux 1994).
is to be received today while a(1) will only be received one time period ahead.
Therefore these two numbers, although expressed in the same unit, are not directly
comparable. There is a simple way however to summarise the components of the
evaluation vector using a single number.
Suppose that there is a capital market on which the firm is able to lend or
borrow money at a fixed interest rate of r per time period (this market is assumed
to be perfect: borrowing and lending will not affect r and are not restricted). If
you borrow 1 m.u. for one time period on this market today, you will have to spend
(1 + r) m.u. in period 1 in order to respect your contract. Similarly, if you know
that you will receive 1 m.u. in period 1, you can borrow an amount of $\frac{1}{1+r}$ m.u.:
your revenue of 1 m.u. in period 1 will allow you to reimburse exactly what you
have to, i.e. $\frac{1}{1+r}(1 + r) = 1$ m.u. Hence, being sure of receiving 1 m.u. in period
1 corresponds to receiving, here and now, an amount of $\frac{1}{1+r}$ m.u. Using a similar
reasoning and taking into account compound interest, receiving 1 m.u. in period
$i$ corresponds to an amount of $\frac{1}{(1+r)^i}$ m.u. now. This is what is called discounting
$$NPV = \sum_{i=0}^{T} \frac{a(i)}{(1+r)^i} = \sum_{i=0}^{T} \frac{b(i) - c(i)}{(1+r)^i} \qquad (5.1)$$
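A minimal Python sketch of formula (5.1), under the assumptions stated in the text (a perfect capital market with a fixed rate r, periods of equal length). The function name `npv` and the cash flows in the example are hypothetical, chosen only to show the mechanics of discounting.

```python
def npv(benefits, costs, r):
    """Net present value, eq. (5.1): sum over i of (b(i) - c(i)) / (1 + r)**i.

    benefits[i] and costs[i] are the amounts (in m.u.) received and spent
    in period i = 0, ..., T; r is the per-period interest rate.
    """
    if len(benefits) != len(costs):
        raise ValueError("benefits and costs must cover the same periods")
    return sum((b - c) / (1 + r) ** i
               for i, (b, c) in enumerate(zip(benefits, costs)))


# Hypothetical project: spend 100 m.u. now, receive 60 m.u. in periods 1 and 2.
value = npv(benefits=[0, 60, 60], costs=[100, 0, 0], r=0.10)
print(f"NPV = {value:.2f} m.u.")  # NPV = 4.13 m.u.: positive, so undertake it
```

With r = 0 the same flows would give an NPV of 20 m.u.; discounting is what shrinks the weight of the later receipts.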
the duration was divided into conveniently chosen time periods of equal
length,
other possible constraints were ignored (e.g. projects may be mutually exclusive
or synergetic).
The literature in Finance is replete with extensions of this simple model that make
it possible to cope with less simplistic hypotheses.
in CBA costs and benefits are evaluated from the point of view of society,
in CBA costs and benefits are not necessarily directly expressed in m.u.;
when this happens, conveniently chosen prices are used to convert them
into m.u.,
in CBA the discounting rate has to be chosen from the point of view of
society.
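Retaining the spirit of the notations used above, the benefits b(i) and costs c(i)
of a project in period i are seen in CBA as vectors with respectively $\ell$ and $\ell'$
components:
$$b(i) = (b(1, i), b(2, i), \ldots, b(\ell, i)), \qquad c(i) = (c(1, i), c(2, i), \ldots, c(\ell', i))$$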
where b(j, i) (resp. c(k, i)) denotes the social benefits (resp. the social costs)
on the jth dimension (resp. on the kth dimension), evaluated in units that are
specific to that dimension, generated by the project in period i.
In each period, costs and benefits are converted into m.u. using suitably
chosen prices. We denote by p(j) (resp. p'(k)) the price of one unit of social
benefit on the jth dimension (resp. one unit of social cost on the kth dimension)
expressed in m.u. (for simplicity, and consistently with real-world applications,
prices are assumed to be independent of the time period). These prices are
used to summarise the vectors b(i) and c(i) into single numbers expressed in m.u.,
letting:
$$b(i) = \sum_{j=1}^{\ell} p(j)\, b(j, i)$$
and
$$c(i) = \sum_{k=1}^{\ell'} p'(k)\, c(k, i)$$
where b(i) (resp. c(i)) denotes the social benefits (resp. costs) generated by the
project in period i converted into m.u.
After this conversion and having suitably chosen a social discounting rate r,
it is possible to apply the standard discounting formula for computing the Net
Present Social Value (NPSV) of a project. We have:
$$NPSV = \sum_{i=0}^{T} \frac{b(i) - c(i)}{(1+r)^i} = \sum_{i=0}^{T} \frac{\sum_{j=1}^{\ell} p(j)\, b(j, i) - \sum_{k=1}^{\ell'} p'(k)\, c(k, i)}{(1+r)^i} \qquad (5.2)$$
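The two-stage computation behind (5.2), first pricing out each period's vectors of social benefits and costs and then discounting, can be sketched as follows. This is an illustration under assumed data: the dimensions, quantities, prices `p` and `p_prime` and the rate are invented, and `npsv` is a hypothetical helper name.

```python
def npsv(benefit_qty, cost_qty, p, p_prime, r):
    """Net present social value, eq. (5.2).

    benefit_qty[i][j] is b(j, i), the social benefit on dimension j in
    period i, in the unit specific to that dimension; cost_qty[i][k] is
    c(k, i). p[j] and p_prime[k] are the prices (in m.u.) of one unit of
    benefit or cost on each dimension, assumed constant over time.
    """
    if len(benefit_qty) != len(cost_qty):
        raise ValueError("benefits and costs must cover the same periods")
    total = 0.0
    for i, (b_row, c_row) in enumerate(zip(benefit_qty, cost_qty)):
        if len(b_row) != len(p) or len(c_row) != len(p_prime):
            raise ValueError("each period needs one quantity per priced dimension")
        b_i = sum(pj * bji for pj, bji in zip(p, b_row))        # b(i) in m.u.
        c_i = sum(pk * cki for pk, cki in zip(p_prime, c_row))  # c(i) in m.u.
        total += (b_i - c_i) / (1 + r) ** i
    return total


# Hypothetical project: two benefit dimensions (hours saved, accidents
# avoided) and one cost dimension already expressed in m.u.
benefits = [[0, 0], [100, 1], [100, 1]]  # periods 0, 1, 2
costs = [[1500], [0], [0]]
print(round(npsv(benefits, costs, p=[10, 100], p_prime=[1], r=0.10), 2))
```

Once the prices are fixed, every dimension is reduced to m.u. before any discounting takes place; the final figure is only as credible as the least credible price.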
how can one evaluate benefits and costs from a social point of view?
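$$dW = \sum_{j=1}^{m} \sum_{i=1}^{n} W_j\, U_j^i\, dq_j^i \qquad (5.3)$$
where
$$W_j = \frac{\partial W}{\partial U_j} \quad \text{and} \quad U_j^i = \frac{\partial U_j}{\partial q_j^i}$$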
Social welfare will increase following the shock if dW > 0.
The existence of markets for the various goods and the hypothesis that indi-
viduals operate on these markets so as to maximise utility ensure that, before the
shock, we have, for all individuals j and for all goods i and k:
$$\frac{U_j^i}{U_j^k} = \frac{p_i}{p_k} \qquad (5.4)$$
where pi denotes the price of the ith good. Having chosen a particular good
for numeraire (we shall call that good money), this implies that:
$$U_j^i = \lambda_j\, p_i \qquad (5.5)$$
where $\lambda_j$ can be interpreted as the marginal effect on the utility of individual
j of a marginal variation of the consumption of the numeraire good, i.e. as the
marginal utility of income for individual j.
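Using 5.5, 5.3 can be rewritten as:
$$dW = \sum_{j=1}^{m} \sum_{i=1}^{n} \lambda_j W_j\, p_i\, dq_j^i \qquad (5.6)$$
If the marginal social value of income $\lambda_j W_j$ is the same for all individuals j
(as is the case, for instance, when the distribution of income is optimal), it can be
normalised to 1 without affecting the sign of dW, and 5.6 reduces to:
$$dW = \sum_{j=1}^{m} \sum_{i=1}^{n} p_i\, dq_j^i \qquad (5.7)$$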
which amounts to saying that the social effects of the shock are measured as
the sum over individuals of the variation of their consumption evaluated at market
prices (i.e. the so-called consumer surplus). In this simple model, variations of
social welfare are therefore conveniently measured in money terms using market
prices.
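To see what the step from (5.6) to (5.7) assumes, the following hypothetical sketch evaluates both formulas on a two-person, two-good shock. Here `social_weights[j]` stands for the product $\lambda_j W_j$; the prices, quantity changes and weights are invented. With equal weights the shock passes the market-price test of (5.7); if individual 2's income is given twice the social weight of individual 1's, the sign of dW reverses.

```python
def dw_weighted(prices, dq, social_weights):
    """dW from eq. (5.6): sum over j, i of lambda_j * W_j * p_i * dq[j][i].

    social_weights[j] stands for the product lambda_j * W_j, the marginal
    social value of income of individual j.
    """
    return sum(w * sum(p * d for p, d in zip(prices, row))
               for w, row in zip(social_weights, dq))


def dw_market(prices, dq):
    """dW from eq. (5.7): all consumption changes valued at market prices."""
    return sum(p * d for row in dq for p, d in zip(prices, row))


prices = [2.0, 1.0]   # p_1, p_2 (hypothetical)
dq = [[1.0, 0.0],     # individual 1 gains one unit of good 1
      [0.0, -1.5]]    # individual 2 loses 1.5 units of good 2

print(dw_market(prices, dq))                # 0.5
print(dw_weighted(prices, dq, [0.5, 1.0]))  # -0.5
```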
Returning to CBA, the relation 5.7 coincides with the computation of the
NPSV when time is not an issue and the effects (costs or benefits) of a project
can be expressed in terms of consumption of goods exchanged on markets. The
general formula for computing the NPSV may be seen as an extension of 5.7
without these restrictions.
In spite of all its limitations, our model allows us to understand, through the
simple derivation of equation 5.7, the rationale for trying to price out all effects of
a project in order to assess its contribution to social welfare.
A detailed treatment of the foundations of CBA without our simplifying
hypotheses can be found in Drèze and Stern (1987). Although we shall not enter
into details, it should be emphasised that the theoretical foundations of CBA are
controversial on some important points. The appropriateness of equation 5.7 and
of related formulas is particularly clear in situations that are fairly different from
the ones in which CBA is currently used as an evaluation tool. These are often
characterised by:
the presence of numerous public goods for which no market price is available
(think of health services or education),
effects that are highly complex and may concern a very long period of time
(think of a policy for storing used nuclear fuel),
effects that are very unevenly distributed among individuals and raise im-
portant equity concerns (think of your reaction if a new airport were to be
built close to your second residence in the middle of the countryside),
In spite of these difficulties, CBA still mainly rests on the use of the NPSV (or
some of its extensions) to evaluate projects. Economists have indeed developed an
incredible variety of tools in order to use the NPSV even in situations in which
it would a priori seem difficult to do so. It is impossible to review here the immense
literature that these efforts have generated. It includes: the determination of
prices for goods without markets, e.g. contingent valuation techniques or hedonic
prices (see Scotchmer 1985, Loomis, Peterson, Champ, Brown and Lucero 1998),
the determination of an appropriate social discounting rate (useful references on
this controversial topic include Harvey 1992, Harvey 1994, Harvey 1995, Keeler
and Cretin 1983, Weitzman 1994), the inclusion of equity considerations in the
calculation of the NPSV (Brent 1984), the treatment of uncertainty, and the
consideration of irreversible effects (e.g. through the use of option values). An overview
of this literature may be found in Sugden and Williams (1983) and in Zerbe and
Dively (1994). We will simply illustrate some of these points in section 5.3.
is one minute equal to one minute? Such a question may not be as silly as
it seems. In most models time gains are evaluated on the basis of what is
called generalised time i.e. a measure of time that accounts for elements of
(dis)comfort of the journey (e.g. temperature, stairs to be climbed, a more
or less crowded environment). Although this seems reasonable, much less
effort has been devoted to the study of models converting time into generalised
time than to the price of time that will be used afterwards,
5.3. SOME EXAMPLES IN TRANSPORTATION STUDIES 81
is one hour worth 60 times one minute? Most models evaluating and pricing
out time gains are strictly linear. This is dubious since some gains (e.g. 10
seconds per user-day) might well be considered insignificant. Furthermore,
the loss of one hour daily for some users may have a much greater impact
than 60 losses of 1 minute,
what is the value of time and how should time gains be converted into mon-
etary units? Should we take the fact that people have different salaries into
account? Should we rather use a price based on stated preferences? Should
we take into account the fact that most surveys using stated preferences have
shown that the value of time highly depends on the motive of the journey
(being much lower for journeys not connected to work)?
The present practice in the Paris region is to linearly evaluate all (generalised)
time gains using the average hourly net salary in the Region (74 FRF/hour in 1994
or approximately 13 USD/hour or 13 €/hour). In view of the major uncertainties
surrounding traffic forecasts that are used to compute the time gains and the
arbitrariness of the price of time that is used, it does not seem unfair to consider
that such evaluations give, at best, interesting indications.
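The linearity issue can be illustrated with the price of time quoted above. In this sketch, `linear_value` mimics linear pricing at 74 FRF per (generalised) hour; `thresholded_value` and its one-minute threshold are a hypothetical alternative, introduced only to show how strongly the result depends on whether tiny gains are treated as significant. The traffic figures are invented.

```python
PRICE_OF_TIME = 74.0  # FRF per (generalised) hour, Paris region practice in 1994


def linear_value(gains_minutes):
    """Linear pricing: every minute gained is priced identically."""
    return sum(gains_minutes) / 60.0 * PRICE_OF_TIME


def thresholded_value(gains_minutes, threshold_minutes=1.0):
    """Hypothetical alternative: individual gains below a threshold count
    for nothing, reflecting the idea that tiny gains may be insignificant."""
    significant = [g for g in gains_minutes if g >= threshold_minutes]
    return sum(significant) / 60.0 * PRICE_OF_TIME


# 3 600 user-days, each saving 10 seconds: 600 minutes in total.
tiny_gains = [10 / 60] * 3600
print(linear_value(tiny_gains))       # about 740 FRF
print(thresholded_value(tiny_gains))  # 0.0

# Linear pricing cannot tell one lost hour from sixty lost minutes.
assert linear_value([60.0]) == linear_value([1.0] * 60)
```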
these figures being based on several stated preference studies (it is not without
interest to note that these figures were quite different before 1993, human life
being, at that time, valued at 1 866 000 FRF). Combining these figures with
statistical information on the occurrence and severity of car accidents leads to
safety benefits amounting to 0.08 FRF per vehicle-km avoided in the Paris region.
Although this might not appear as a very pleasant subject of study, econo-
mists have developed many different methods for evaluating the value of human
life, including methods based on human capital, the value of life insurance con-
tracts, sums granted by courts following accidents, stated preference approaches,
revealed preference approaches including smoking and driving behaviour, wages
for activities involving risk (Viscusi 1992). Besides raising serious ethical
difficulties (Broome 1985), these studies exhibit incredible variations across
techniques and across seemingly similar countries (this explains why, in many medical
studies in which benefits mainly include lives saved, cost-effectiveness analysis
is often preferred to CBA, since it does not require pricing out human life; see
Johannesson 1995, Weinstein and Stason 1977). We reproduce below some
significant figures for the value of life used in several European countries (this table
is adapted from Syndicat des Transports Parisiens 1998); all figures are in 1993
European Currency Units (ECU), one 1993 ECU being approximately one 1993
USD):
all other effects should be described verbally. Monetarised effects and non
monetarised ones should not be included in a common table that would
give the same status and, implicitly, the same importance to all. A multiple criteria
presentation would furthermore attribute an unwarranted scientific value to
such tables,
In view of:
the conclusion that CBA remains the best method seems unwarranted. CBA
has often been criticised on purely ideological grounds, which seems ridiculous.
However, the insistence on presenting CBA as a "scientific", "rational" and "objective"
evaluation model (all words frequently found in texts on CBA, e.g. Boiteux
1994) seems no more convincing.
5.4 Conclusions
CBA is an important decision/evaluation method. We would like to note in par-
ticular that:
CBA emphasises the fact that decision and/or evaluation methods are not
context-free. Having emerged from economics, it is not surprising that mar-
kets and prices are viewed as the essential parts of the environment in CBA.
More generally, any decision/evaluation method that would claim to be
context-free would seem of limited interest to us,
a decision/evaluation tool will be all the more useful the more easily it lends
itself to insertion into a decision process. Decision processes involving
public sector projects are usually extremely complex. They last for years
and involve many stakeholders generally having conflicting objectives. CBA
tries to summarise the effects of complex projects into a single number. The
complex calculations leading to the NPSV use a huge amount of data with
varying levels of credibility. Merging rather uncontroversial information (e.g.
the number of deaths per vehicle-km in a given area) with much more sensitive
and debatable information (e.g. the price of human life) from the start might
not give stakeholders many opportunities for reaching partial agreements
and/or for starting negotiations. This might also result in a model that does
not appear transparent enough to be really convincing (Nyborg 1998),
the additive linear structure of the implicit aggregation rule used in CBA
can be subjected to the familiar criticisms already mentioned in chapters 3
and 4. Probably all users of CBA would agree that an accident killing 10 000
people might result in a dramatic situation in which the costs incurred
bear little relation to the costs of 10 000 accidents each resulting in one
loss of life (think of a serious nuclear accident compared to ordinary car
accidents). Similarly, they might be prepared to accept that there may exist
air pollution levels above which all mammal life on earth could be endangered
and that, although these levels are multiples of those currently considered
in the evaluation of transportation projects, they may have to be priced out
quite differently. If there are limits to linearity, CBA offers almost no clue
as to where to place these limits. It would seem to be a heroic hypothesis to
suppose that such limits are simply never reached in practice,
the use of a simple social discounting rate as a surrogate for taking a clear
position on inter-generational equity issues is open to discussion. Even
accepting the rather optimistic view of a continuous increase of welfare and of
technical innovation, taking decisions today that will have important
consequences in 1000 years (think of the storage of used nuclear fuel) while using a
method that gives almost no weight to what will happen 60 years from now
($1/1.08^{60} \approx 1\%$) seems debatable (see Harvey 1992, Harvey 1994, Weitzman
1994),
the very idea that social preferences exist is open to question. We showed
in chapter 2 that elections were not likely to give rise to such a concept.
It seems hard to think of other forms of social co-ordination that could do
much better. We doubt that markets are such special institutions that
they always make it possible to solve or bypass the problem in an indisputable way. But
if social preferences are ill-defined, the meaning of the NPSV of a project
is far from being obvious. We would argue that it gives, at best, a partial
and highly conventional view of the desirability of the project,
decision/evaluation models can hardly lead to convincing conclusions if the
elements of uncertainty and imprecision entering the model are
not explicitly dealt with. This is especially true in the context of the eval-
uation of public sector projects. Practical texts on CBA always insist on
the need for sensitivity analysis before coming to conclusions and recom-
mendations. Due to the amount of data of varying quality included in the
computation of the NPSV, sensitivity analysis is often restricted to studying
the impact of the variation of a few parameters on the NPSV, one parameter
varying at a time. This is rather far from what we could expect in such situ-
ations; a true robustness analysis should combine simultaneous variations
of all parameters in a given domain,
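Two of the points above, the weight left to the distant future and the gap between one-at-a-time sensitivity analysis and a true robustness analysis, can be made concrete with a purely hypothetical sketch. The first lines check the 1% figure quoted for an 8% rate over 60 years. The rest compares varying one parameter at a time with varying all of them simultaneously; the project model, nominal values and ranges are invented. Checking only the corners of the domain is enough here because this particular NPSV is monotone in each parameter; in general a grid or random sampling of the domain would be needed.

```python
from itertools import product


def discount_factor(r, years):
    """Present value of 1 m.u. received `years` periods from now."""
    return 1.0 / (1.0 + r) ** years


print(f"{discount_factor(0.08, 60):.4f}")    # 0.0099, i.e. about 1%
print(f"{discount_factor(0.08, 1000):.1e}")  # practically zero


def project_npsv(r, price, hours, investment=1000.0, years=30):
    """Hypothetical project: an investment now, then constant yearly
    benefits of price * hours during `years` periods."""
    flows = [-investment] + [price * hours] * years
    return sum(f * discount_factor(r, i) for i, f in enumerate(flows))


NOMINAL = {"r": 0.05, "price": 10.0, "hours": 10.0}
RANGES = {"r": (0.03, 0.08), "price": (8.0, 12.0), "hours": (8.0, 12.0)}


def one_at_a_time_min():
    """Worst NPSV when a single parameter moves to an end of its range."""
    values = []
    for name, (low, high) in RANGES.items():
        for v in (low, high):
            values.append(project_npsv(**dict(NOMINAL, **{name: v})))
    return min(values)


def joint_min():
    """Worst NPSV over all corners of the parameter domain."""
    names = list(RANGES)
    return min(project_npsv(**dict(zip(names, corner)))
               for corner in product(*(RANGES[n] for n in names)))


print(round(one_at_a_time_min(), 1))  # positive: the project looks robust
print(round(joint_min(), 1))          # negative: it is not
```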
These limitations should not be interpreted as implying a condemnation of
CBA. We consider them as arguments showing that, in spite of its many qual-
ities, CBA is far from exhausting the activity of supporting decision/evaluation
processes (Watson 1981). We are afraid to say that if you disagree on this point,
you might find the rest of this book of extremely limited interest. On the other
hand, if you expect to discover in the next chapters formal decision/evaluation
tools and methodologies that would solve all problems and avoid all difficulties
you should also realise that your chances of being disappointed are very high.
6 COMPARING ON THE BASIS OF SEVERAL ATTRIBUTES: THE EXAMPLE OF MULTIPLE CRITERIA DECISION ANALYSIS
How to choose a car is probably the multiple criteria problem example that has
been most frequently used to illustrate the virtues and possible pitfalls of multiple
criteria decision aiding methods. The main advantage of this example is that the
problem is familiar to most of us (except for one of the authors of this book who is
definitely opposed to owning a car) and it is especially appealing for male decision-
makers and analysts for some psychological reason. However, one can object that
in many illustrations, the problem is too roughly stated to be meaningful; the
motivations, needs, desires and/or fantasies of the potential buyer of a new or
second-hand car can be so diversified that it will be very difficult to establish a list
of relevant points of view and build criteria on which everybody would agree; the
price for instance is a very delicate criterion since the amount of money the buyer
is ready to spend clearly depends on his social condition. The relative importance
of the criteria also very much depends on the personal characteristics of the buyer:
there are various ideal types of car buyers, for instance people who like sporty
driving, or large comfortable cars, or reliable cars, or cars that are cheap to run.
One point should be made very clear: it is unlikely that a car could be universally
recognised as the best, even if one restricts oneself to a segment of the market;
this is a consequence of the existence of decision-makers with many different value
systems.
Despite these facts, we have chosen to use the "Choosing a car" example,
in a properly defined context, for illustrating the hypotheses underlying various
elementary methods for modelling and aggregating evaluations in a decision aiding
process. The case is simple enough to allow for a short but complete description;
it also offers sufficient potential for reasoning on quite general problems raised by
the treatment of multi-dimensional data in view of decision and evaluation. We
describe the context of the case below and will invoke it throughout this chapter
for illustrating a sample of decision aiding methods.
88 CHAPTER 6. COMPARING ON SEVERAL ATTRIBUTES
(Keeney and Raiffa (1976), Saaty (1980)). A thorough analysis of the properties
required of the family of criteria selected in any particular context (consistent
family, i.e. exhaustive, non-redundant and monotonic) can be found in Roy and
Bouyssou (1993) (see also Bouyssou (1990), for a survey).
We shall not emphasise the process of selecting viewpoints in this chapter,
although it is a matter of importance. It is sufficient to say that Thierry's concerns
are very particular and that he accordingly selected five viewpoints related to cost
(criterion 1), performance of the engine (criteria 2 and 3) and safety (criteria 4
and 5).
Evaluations of the cars on these viewpoints have been obtained from monthly
journals specialised in the benchmarking of cars. The official quotation of
second-hand vehicles of various ages is also published in such journals.
estimate the actual consumption for everyday use); when provided by specialised
journalists in magazines, the procedures for measuring are generally unspecified
and might vary since the cars are not all evaluated by the same person.
The third criterion that Thierry took into consideration is linked with the
pick up or suppleness of the engine in urban traffic; this dimension is considered
important since Thierry also intends to use his car in normal traffic. The indicator
selected to measure this dimension (Pick up in Table 6.2) is the time (in seconds)
needed for covering one kilometre when starting in fifth gear at 40 km/h. Again
other indicators could have been chosen (e.g. the torque). This dimension is not
independent of the second criterion, since they are generally positively correlated
(powerful engines generally lead to quick response times on both criteria); cars
that are specially prepared for competition may, however, lack suppleness at low
engine speeds, which is quite unpleasant in urban traffic. So, from the point
of view of the user, i.e. in terms of preferences, criteria 2 and 3 reflect different
requirements and are thus both necessary. For a short discussion about the notions
of independence and interaction, the reader is referred to Section 6.2.4.
In the magazine's evaluation report, several other dimensions are investigated
such as comfort, brakes, road-holding behaviour, equipment, body, boot, finish,
maintenance, etc. For each of these, a number of aspects are considered: 10 for
comfort, 3 for brakes, 4 for road-holding, . . . . In view of Thierry's particular
motivations, only the qualities of braking and of road-holding are of concern to
him and lead to the building of criteria 4 and 5 (resp. "Brakes" and "Road-h"
in Table 6.2). The 3 or 4 partial aspects of each viewpoint are evaluated on an
ordinal scale the levels of which are labelled "serious deficiency", "below average",
"average", "above average", "exceptional". To get an overall indicator of braking
quality (and also for road-holding), Thierry re-codes the ordinal levels with integers
6.1. THIERRYS CHOICE 91
from 0 to 4 and takes the arithmetic mean of the 3 or 4 numbers; this results in
the figures with 2 decimals provided in the last two columns of Table 6.2. Obviously
these numbers are also imprecise, not necessarily because of imprecision in the
evaluations but because of the arbitrary character of the cardinal re-coding of
the ordinal information and its aggregation via an arithmetic mean (postulating
implicitly that, in some sense, the 3 components of each viewpoint are equally
important and the levels of each of the three scales are equally spaced). We shall
however consider that these figures reflect, in some way, the behaviour of each car
from the corresponding viewpoint; it is clear however that not too much confidence
should be placed in the precision of these evaluations.
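The arbitrariness of the cardinal re-coding can be shown with a small sketch. The two cars and `OTHER_CODING` are hypothetical; `OTHER_CODING` preserves the order of the five levels exactly as Thierry's 0 to 4 coding does, yet the comparison of the two means is reversed.

```python
from statistics import mean

LEVELS = ["serious deficiency", "below average", "average",
          "above average", "exceptional"]

# Thierry's re-coding: the five levels become 0, 1, 2, 3, 4 (equally spaced).
EQUAL_SPACING = dict(zip(LEVELS, [0, 1, 2, 3, 4]))
# Another re-coding respecting the same order (chosen for illustration only).
OTHER_CODING = dict(zip(LEVELS, [0, 1, 3, 3.5, 4]))


def aggregate(aspects, coding):
    """Re-code each ordinal evaluation and take the arithmetic mean."""
    return mean(coding[level] for level in aspects)


car_a = ["average", "average", "average"]            # hypothetical car
car_b = ["below average", "average", "exceptional"]  # hypothetical car

print(aggregate(car_a, EQUAL_SPACING), aggregate(car_b, EQUAL_SPACING))  # b ahead
print(aggregate(car_a, OTHER_CODING), aggregate(car_b, OTHER_CODING))    # a ahead
```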
Note that the first 3 criteria have to be minimised while the last 2 must be
maximised.
This completes the description of the data which, obviously, are not given
but selected and elaborated on the basis of the available information. An
appreciation (more or less explicit) of their degree of precision and of their
reliability is intrinsically part of these data.
Figure 6.1: Performance diagram of all cars along the first three criteria (above;
to be minimised) and the last two (below; to be maximised). [Upper panel: Cost,
Accel, Supple; lower panel: Brakes, Roadh; cars: Fiat, Alfa, Nissan, Mazda,
MitsuColt, Toyota, Honda, Opel, Ford, R19, Peu16, Peu, MitsuGal, R21.]
and
criterion 5 2
with at least one strict inequality.
Looking at the performances of the remaining cars, those labelled 1, 2, 10 are
further discarded. The set of remaining cars is restated for instance by the rule:
criterion 2 < 30
Finally, the car labelled 14 is eliminated since it is dominated by car number 11.
"Dominated by car 11" means that car 11 is at least as good on all criteria and
better on at least one criterion (here, on all of them!). Notice that car number 14
would not have been dominated if other criteria had been taken into consideration
such as comfort or size: this car is indeed bigger and more classy than the other
cars in the sample.
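The dominance test can be written down directly. In this sketch `dominates` is a hypothetical helper; the values are the normalised evaluations of cars 11 and 14 listed in the weighted-sum table later in this chapter (they appear to be each criterion divided by its maximum over the 14 cars). Dividing a criterion by a positive constant does not change dominance, so the raw data would give the same answer.

```python
def dominates(a, b, minimise):
    """True if a is at least as good as b on every criterion and strictly
    better on at least one; minimise[i] tells whether criterion i is to be
    minimised (True) or maximised (False)."""
    def at_least_as_good(x, y, m):
        return x <= y if m else x >= y

    def strictly_better(x, y, m):
        return x < y if m else x > y

    triples = list(zip(a, b, minimise))
    return (all(at_least_as_good(x, y, m) for x, y, m in triples)
            and any(strictly_better(x, y, m) for x, y, m in triples))


# Cost, Accel, Pick up (minimised); Brakes, Road-holding (maximised).
MINIMISE = [True, True, True, False, False]
car_11 = [0.82, 0.92, 0.84, 0.88, 0.85]
car_14 = [1.00, 0.94, 0.88, 0.75, 0.69]

print(dominates(car_11, car_14, MINIMISE))  # True
print(dominates(car_14, car_11, MINIMISE))  # False
```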
The cars left after the above elimination process are those labelled 3, 7, 11, 12;
their performances are shown on Figure 6.2. In these star diagrams each car is
represented by a pentagon; their values on each criterion have all been linearly
re-scaled, being mapped onto the [1, 3] interval. The choice of interval [1, 3] instead
of interval [0, 2] is dictated by the mode of representation: the value 0 plays a
special role since it is common to all axes; if an alternative were to receive a 0 value
on several criteria, those evaluations would all be represented by the origin, which
makes the graph less readable. On each axis, the value 1 corresponds to the lowest
value for one of the cars in the initial set of 14 alternatives on each criterion; the
value 3 corresponds to the highest value for one of the 14 cars. In interpreting the
diagrams, remember that criteria 1, 2 and 3 are to be minimised while the others
have to be maximised.
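The linear re-scaling used for the star diagrams can be sketched as below, assuming, as the text states, that 1 and 3 correspond to the lowest and highest values over the initial 14 cars, which is why the reference set is passed separately from the values being re-scaled. `rescale_to_1_3` is a hypothetical helper and the numbers are invented. For criteria 1 to 3, a higher re-scaled value still means a worse car.

```python
def rescale_to_1_3(values, reference):
    """Map values linearly so that the lowest value in `reference` goes to 1
    and the highest goes to 3 (reference = the evaluations of all 14 cars)."""
    low, high = min(reference), max(reference)
    if high == low:
        raise ValueError("all reference values are equal; the scale is undefined")
    return [1.0 + 2.0 * (v - low) / (high - low) for v in values]


reference = [10.0, 12.5, 15.0, 20.0]  # hypothetical evaluations of the whole sample
selected = [12.5, 15.0]               # hypothetical evaluations of two selected cars
print(rescale_to_1_3(selected, reference))  # [1.5, 2.0]
```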
Thierry did not use the latter diagram (Figure 6.2); he drew the same diagram
as in Figure 6.1 instead after reordering the cars; the 4 candidate cars were all
put on the right of the diagram as shown in Figure 6.3; in this way Thierry was
still able to compare the difference in the performances of two candidate cars for a
criterion to typical differences for that criterion in the initial sample. This suggests
that the evaluations of the selected cars should not be transformed independently
of the values of the cars in the initial set; these still constitute reference points in
relation to which the selected cars are evaluated. On Figure 6.4, for the reader's
convenience, we show a close-up of Figure 6.3 that is focused on the 4 selected
cars only.
Thierry first eliminates car number 12 on the basis of its relative weakness
on the second criterion (acceleration). Among the 3 remaining cars the one he
chooses is number 11. Here are the reasons for this decision.
1. Comparing cars 3 and 11, Thierry considers that the price difference (about
500 €) is worth the gain (0.7 second) on the acceleration criterion.
2. Comparing cars 7 and 11, he considers that the cost difference (car 7 being about
1 500 € more expensive) is not balanced by the small advantage on acceleration
(0.3 second) coupled with a definite disadvantage (0.8 second) on suppleness.
Figure 6.2: Star graph of the performances of the 4 cars left after the elimination
process [axes: crit 1 (cost), crit 2 (accel), crit 3 (supple), crit 4 (brakes),
crit 5 (road-h); scale from 0 to 3]
Figure 6.3: Performance diagram of all cars; the 4 candidate cars stand on the
right [series: Cost (min), Accel (min), Pick up (min), Brakes (Max), Roadh (Max);
cars: Fiat (1), Alfa (2), Mazda (4), Mitsu Colt (5), Toyota (6), Opel (8), Ford (9),
R19 (10), Mitsu Gal (13), R21 (14), Nissan (3), Honda (7), Peu16 (11), Peu (12)]
Figure 6.4: Detail of Figure 6.3: the 4 cars remaining after initial screening
Comments
Thierry's reasoning process can be analysed as being composed of two steps. The
first one is a screening process in which a number of alternatives are discarded
on the basis of the fact that they do not reach aspiration levels on some criteria.
Notice that these levels have not been set a priori as minimal levels of
satisfaction; they have been set after having examined the whole set of alternatives, to
a value that could be described as both desirable and accessible. The rules that
have been used for eliminating certain alternatives have exclusively been combined
in conjunctive mode, since an alternative is discarded as soon as it fails to fulfil
any one of the rules.
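Conjunctive screening, and the disjunctive mode mentioned next, can be sketched with rules written as predicates. Only the rule on criterion 2 (acceleration below 30 seconds) comes from the text; the cost rule, the cost figures and the three cars are hypothetical and are not the data of Table 6.2.

```python
# Each rule is a predicate on a car's evaluations. "accel < 30" is the rule
# quoted earlier in the text (criterion 2 < 30); the cost rule is hypothetical.
RULES = [
    lambda car: car["accel"] < 30,
    lambda car: car["cost"] <= 18000,
]


def screen_conjunctive(cars, rules):
    """Keep a car only if it satisfies every rule: failing a single rule is
    enough to be discarded."""
    return sorted(name for name, ev in cars.items() if all(rule(ev) for rule in rules))


def screen_disjunctive(cars, rules):
    """For contrast: keep a car as soon as it satisfies at least one rule."""
    return sorted(name for name, ev in cars.items() if any(rule(ev) for rule in rules))


cars = {  # hypothetical evaluations
    "A": {"cost": 17000, "accel": 29.5},
    "B": {"cost": 16000, "accel": 30.8},
    "C": {"cost": 21000, "accel": 28.9},
}
print(screen_conjunctive(cars, RULES))  # ['A']
print(screen_disjunctive(cars, RULES))  # ['A', 'B', 'C']
```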
More sophisticated modes of combination may be envisaged, for instance mixing
conjunctive and disjunctive modes with aspiration levels defined for subsets
of criteria (see Fishburn (1978) and Roy and Bouyssou (1993), pp. 264-266).
Another elementary method that has been used is the elimination of dominated
alternatives (car 11 dominates car 14).
In the second step of Thierry's reasoning,
1. Criteria 4 and 5 were not invoked; there are several possible reasons for this:
criteria 4 and 5 might be of minor importance or considered satisfactory
once a certain level is reached; they could be insufficiently discriminating for
the considered subset of cars (this is certainly the case for criterion 4): the
values of the differences for the set of candidate cars could be such that they
are not large enough to balance the differences on other criteria.
3. The reasoning is not made on the basis of re-coded values like those used
in the graphics; more intuition is needed, which is better supported by the
original scales. Since criteria 4 and 5 are aggregates and, thus, are not
expressed in directly interpretable units, this might also have been a reason
for not exploiting them in the final selection.
Suppose, without loss of generality, that all criteria are to be maximised, i.e. the
larger the value $g_i(a)$, the better the alternative a on criterion i (if, on the contrary,
$g_i$ were to be minimised, substitute $-g_i$ for $g_i$ or use a negative weight $k_i$). Once the
weights $k_i$ have been determined, choosing an alternative becomes straightforward:
the best alternative is the one associated with the largest value of f. Similarly,
a ranking of the alternatives is obtained by ordering them in decreasing order of
the value of f.
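A minimal sketch of the weighted sum $f(a) = \sum_i k_i g_i(a)$ and of the ranking it induces. The two alternatives and the weights are hypothetical; a criterion to be minimised is handled with a negative weight, as suggested above. Doubling the weight of the minimised criterion is enough to reverse the ranking, which already hints at how much rests on the choice of the $k_i$.

```python
def weighted_sum(evaluations, weights):
    """f(a) = sum over i of k_i * g_i(a)."""
    return sum(k * g for k, g in zip(weights, evaluations))


def rank(alternatives, weights):
    """Alternatives ordered by decreasing value of f (best first)."""
    return sorted(alternatives,
                  key=lambda name: weighted_sum(alternatives[name], weights),
                  reverse=True)


# Hypothetical data: criterion 1 (cost) is to be minimised, criterion 2
# (quality) maximised; the minimised criterion gets a negative weight.
alternatives = {"x": [0.5, 0.4], "y": [0.9, 0.9]}
print(rank(alternatives, [-1.0, 1.0]))  # ['y', 'x']
print(rank(alternatives, [-2.0, 1.0]))  # ['x', 'y']
```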
This simple and most commonly used procedure relies, however, on very strong
hypotheses that can seldom be considered satisfied. These problems
appear very clearly when trying to use the weighted sum approach on the car
example.
$$g'_i(a) = \frac{g_i(a)}{g_{i,\max}} \qquad (6.2)$$
$$g''_i(a) = \frac{g_i(a) - g_{i,\min}}{g_{i,\max} - g_{i,\min}} \qquad (6.3)$$
For simplicity, we suppose here that the $g_i$ are positive. In the former case the maximal
value of $g'_i$ will be 1 while the value 0 is kept fixed, which means that the ratio of the
evaluations of any pair a, b of alternatives remains unaltered:
$$\frac{g'_i(a)}{g'_i(b)} = \frac{g_i(a)}{g_i(b)}$$
This transformation is appropriate for ratio scales, in which the value 0 plays a special role. Statements such as "alternative a is twice as good as b on criterion i" remain valid after the transformation.
In the case of $g''_i$, the top evaluation is mapped onto 1 and the bottom one onto 0. Ratios are not preserved, but ratios of differences of evaluations are:
$$\frac{g''_i(a) - g''_i(b)}{g''_i(c) - g''_i(d)} = \frac{g_i(a) - g_i(b)}{g_i(c) - g_i(d)}.$$
6.2. THE WEIGHTED SUM 99
Such a transformation is appropriate for interval scales; it does not alter the validity of statements like "the difference between a and b on criterion i is twice the difference between c and d".
Note that the above are not the only possible options for transforming the data;
note also that these transformations depend on the set of alternatives: considering
the 14 cars of the initial sample or the 4 cars retained after the first elimination
would yield substantially different results, since the values $g_{i,\min}$ and $g_{i,\max}$ depend on the set of alternatives.
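The two normalisations can be sketched as follows; the function names are ours. The acceleration times 28.3, 29.0 and 29.6 seconds for cars 11, 3 and 12 are those quoted later in this section. The values 28.0 and 30.8 seconds are the best and worst times of the sample, given later in this chapter for the ideal and anti-ideal cars, and are attached here to two placeholder entries. The sketch checks three things: (6.2) preserves ratios, (6.3) preserves ratios of differences, and the result of (6.3) depends on which alternatives are included.

```python
def normalise_by_max(values):
    """Equation (6.2): g'(a) = g(a) / g_max (all values assumed positive)."""
    top = max(values.values())
    return {a: v / top for a, v in values.items()}


def normalise_min_max(values):
    """Equation (6.3): g''(a) = (g(a) - g_min) / (g_max - g_min)."""
    lo, hi = min(values.values()), max(values.values())
    return {a: (v - lo) / (hi - lo) for a, v in values.items()}


# Acceleration times in seconds for cars 11, 3 and 12.
small = {"car 11": 28.3, "car 3": 29.0, "car 12": 29.6}
# Same cars plus the best and worst times of the sample (placeholder labels).
large = {**small, "best": 28.0, "worst": 30.8}

by_max = normalise_by_max(small)
print(by_max["car 11"] / by_max["car 3"], 28.3 / 29.0)  # equal ratios

s, l = normalise_min_max(small), normalise_min_max(large)
print((s["car 12"] - s["car 3"]) / (s["car 3"] - s["car 11"]))  # 0.6 / 0.7
print(s["car 3"], l["car 3"])  # about 0.54 versus about 0.36
```

With only three cars, car 3 lands slightly above the middle of the scale. With the full range of the sample, it lands in the lower half.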
Weights k_i:         -1     -2     -1     0.5    0.5
Nr  Name of car      Cost   Accel  Pick   Brak   Road   Value f
3   Nissan Sunny     0.80   0.94   0.84   1.00   0.77   -2.63
11  Peugeot 16V      0.82   0.92   0.84   0.88   0.85   -2.64
12  Peugeot          0.75   0.96   0.85   0.88   0.85   -2.66
10  Renault 19       0.80   0.97   0.91   0.88   1.00   -2.71
7   Honda Civic      0.89   0.91   0.86   0.88   0.62   -2.82
1   Fiat Tipo        0.86   1.00   0.89   0.88   0.92   -2.85
5   Mitsu Colt       0.71   0.96   0.86   0.62   0.54   -2.91
2   Alfa 33          0.72   0.98   1.00   0.75   0.77   -2.92
8   Opel Astra       0.86   0.94   0.85   0.62   0.62   -2.96
6   Toyota           0.65   1.00   0.88   0.50   0.62   -2.97
4   Mazda 323        0.72   0.99   0.86   0.62   0.46   -3.02
9   Ford Escort      0.93   0.95   0.83   0.75   0.54   -3.03
14  Renault 21       1.00   0.94   0.88   0.75   0.69   -3.04
13  Mitsu Galant     0.81   0.98   0.89   0.62   0.38   -3.15
Weights k_i:         -1     -2     -1     0.5    0.5
Nr  Name of car      Cost   Accel  Pick   Brak   Road   Value f
11  Peugeot 16V      0.92   0.96   0.98   0.88   1.00   -2.876
3   Nissan Sunny     0.89   0.98   0.98   1.00   0.91   -2.890
12  Peugeot          0.84   1.00   0.99   0.88   1.00   -2.896
7   Honda Civic      1.00   0.95   1.00   0.88   0.73   -3.090
set of alternatives, for instance the worst acceptable value (minimal requirement
for a performance to be maximised; maximal level of a variable to be minimised,
a cost, for instance) on each criterion; with such an option, the source of the lack
of stability would be the imprecision in the determination of the worst acceptable
value. Notice that the above problem has already been discussed in Chapter 4,
Section 4.1.1.
Conventional codings
Another comment concerns the figures used for evaluating the performances of the
cars on criteria 4 and 5. Recall that those were obtained by averaging equally
spaced numerical codings of an ordinal scale of evaluation. The obtained figures
presumably convey a less quantitative and more conventional meaning than for
instance acceleration performances measured in seconds in standardisable (if not
standardised) trials. These figures however are treated in the weighted sum just
like the more quantitative ones associated with the first three criteria. Moreover, other codings of the ordinal scale might have been envisaged, for instance codings with unequal intervals between the levels of the ordinal scale. Some of these codings could obviously have changed the ranking.
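The effect of the coding can be illustrated with a small hypothetical example. Two cars, X and Y, are each rated on three sub-aspects with the ordinal levels poor < fair < good < very good, and the ratings are averaged as for criteria 4 and 5. All labels, ratings and codings below are invented for illustration. Both codings respect the order of the levels, yet they rank X and Y differently.

```python
LEVELS = ["poor", "fair", "good", "very good"]  # ordinal scale, worst first
# Hypothetical ratings of two cars on three sub-aspects.
RATINGS = {"X": ["poor", "very good", "very good"],
           "Y": ["fair", "good", "good"]}
EQUAL = {"poor": 1, "fair": 2, "good": 3, "very good": 4}
UNEQUAL = {"poor": 0, "fair": 2, "good": 3, "very good": 3.5}


def average_code(labels, coding):
    """Average of the numerical codes attached to the ordinal labels."""
    return sum(coding[label] for label in labels) / len(labels)


def preferred(coding):
    """Car with the higher average code under the given coding."""
    if not all(coding[lo] < coding[hi] for lo, hi in zip(LEVELS, LEVELS[1:])):
        raise ValueError("coding does not respect the order of the levels")
    x, y = (average_code(RATINGS[car], coding) for car in ("X", "Y"))
    return "X" if x > y else "Y"


print(preferred(EQUAL), preferred(UNEQUAL))  # X Y
```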
meaningful manner through naive questions about the relative importance of the
criteria; reference to the underlying scale is essential.
Up to this point we have considered the influence on the weights of multiplying
the evaluations by a positive constant. Note that translating the origin of a scale
has no influence on the ranking of the alternatives provided by the weighted sum
since it results in adding a (positive or negative) constant to f , the same for all
alternatives. There is still a very important observation that has to be made:
all scales used in the model are implicitly considered linear in the sense that
equal differences in values on a criterion result in equal differences in the overall
evaluation function f and this does not depend on the position of the interval
of values corresponding to that difference on the scale. For instance, in the car example, car number 12 is finally eliminated because it accelerates too slowly. The difference between car 12 and car 3 with respect to acceleration is 0.6 seconds, between 29 and 29.6 seconds. Does Thierry perceive this difference as almost as important as the difference of 0.7 seconds between cars 11 and 3, the latter being positioned between 28.3 and 29 seconds on the acceleration scale? It seems rather clear from Thierry's motivations that coming close to a performance of 28 seconds is what matters to him, while cars above 29 seconds are unworthy. This means that the gain in passing from 29.6 to 29 seconds has definitely less value than a gain of similar amplitude, say from 29 to 28.3 seconds. As will be confirmed in the sequel (see Section 6.3 below), it is very unlikely that Thierry's preferences are correctly modelled by a linear function of the current scales of performance.
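A minimal sketch of the remark on translations reuses four rows of the normalised table above with weights -1, -2, -1, 0.5, 0.5. Translating the acceleration scale by a constant c changes every value of f by the same amount, k_2 c, so the ranking cannot change. The helper names are ours.

```python
CARS = {
    "Peugeot 309": (0.75, 0.96, 0.85, 0.88, 0.85),
    "Renault 19": (0.80, 0.97, 0.91, 0.88, 1.00),
    "Honda Civic": (0.89, 0.91, 0.86, 0.88, 0.62),
    "Fiat Tipo": (0.86, 1.00, 0.89, 0.88, 0.92),
}
WEIGHTS = (-1.0, -2.0, -1.0, 0.5, 0.5)


def f(profile, weights):
    """Weighted sum of the evaluations."""
    return sum(k * g for k, g in zip(weights, profile))


def ranking(alternatives, weights):
    """Alternatives in decreasing order of f."""
    return sorted(alternatives, key=lambda a: f(alternatives[a], weights), reverse=True)


def translate(alternatives, criterion, c):
    """Move the origin of one criterion: g_j(a) becomes g_j(a) + c for every a."""
    return {a: tuple(g + c if i == criterion else g for i, g in enumerate(p))
            for a, p in alternatives.items()}


shifted = translate(CARS, 1, 5.0)  # acceleration scale shifted by 5
for car in CARS:
    # Every value of f moves by the same amount k_2 * c = -10.
    print(car, round(f(shifted[car], WEIGHTS) - f(CARS[car], WEIGHTS), 9))
print(ranking(shifted, WEIGHTS) == ranking(CARS, WEIGHTS))  # True
```

The same linearity is what the acceleration example questions. On a linear scale, a gain of 0.6 seconds is always worth 6/7 of a gain of 0.7 seconds, wherever the two intervals lie.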
Independence or interaction
The next issue is more subtle. Evaluations of the alternatives for the various points
of view taken into consideration by the decision-maker often show correlations; this
is because the attributes that are used to reflect these viewpoints are often linked
by logical or factual interdependencies. For instance, indicators of cost, comfort
and equipment, which may be used as attributes for assessing the alternatives for
those viewpoints, are likely to be positively correlated. This does not mean that
the corresponding points of view are redundant and that one should eliminate
some of them. One is perfectly entitled to work with attributes that are (even
strongly) correlated. That is the first point.
A second point is about independence. In order to use a weighted sum, the
viewpoints should be independent, but not in the statistical sense implying that
the evaluations of the alternatives should be uncorrelated! They should be in-
dependent with respect to preferences. In other words, if two alternatives that
share the same profile on a subset of criteria compare in a certain way in terms of
overall preferences, their relative position should not be altered when the profile
they share on a subset of criteria is substituted by any other common profile. On
the contrary, a famous example of dependence in the sense of preferences in a
gastronomic context is the following: the preference for white wine or red wine
usually depends on whether you are eating fish or meat. There are relatively sim-
ple tests for independence in the sense of preferences, which consist in asking the
decision-maker about his preferences on pairs of alternatives that share the same
profile for a subset of attributes; varying the common profile should not reverse
the preferences when the points of view are independent. Independence is a nec-
essary condition for the representation of preferences by a weighted sum; it is not
a sufficient one of course.
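The test just described can be sketched for two criteria with finite sets of levels, a simplification of ours. For each pair of levels on one criterion, the preference between them must not reverse when the level shared on the other criterion changes. The wine-and-dish preferences below are a toy encoding of the example in the text, and the additive values are hypothetical. Passing the test is, as stated above, only a necessary condition for a weighted sum.

```python
from itertools import product


def reverses(value, levels_a, levels_b):
    """True if the preference between two levels of the first criterion
    reverses when the level shared on the second criterion changes."""
    for x1, x2 in product(levels_a, repeat=2):
        diffs = [value(x1, y) - value(x2, y) for y in levels_b]
        if any(d > 0 for d in diffs) and any(d < 0 for d in diffs):
            return True
    return False


def preferentially_independent(value, levels_a, levels_b):
    """Check both directions: criterion 1 w.r.t. criterion 2 and vice versa."""
    swapped = lambda y, x: value(x, y)
    return (not reverses(value, levels_a, levels_b)
            and not reverses(swapped, levels_b, levels_a))


WINES, DISHES = ["white", "red"], ["fish", "meat"]


def meal(wine, dish):
    """Toy encoding: white goes with fish, red with meat."""
    return 1.0 if (wine, dish) in {("white", "fish"), ("red", "meat")} else 0.0


def additive(wine, dish):
    """Hypothetical additive values u1(wine) + u2(dish)."""
    return {"white": 0.4, "red": 0.6}[wine] + {"fish": 0.5, "meat": 0.3}[dish]


print(preferentially_independent(meal, WINES, DISHES))      # False
print(preferentially_independent(additive, WINES, DISHES))  # True
```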
There is a different concept that has been recently implemented for modelling
preferences. It is the concept of interacting criteria that was already discussed
in example 2 of Chapter 3. Suppose that in the process of modelling the prefer-
ences of the decision-maker, he declares that the influence of positively correlated
aspects should be dimmed and that conjoint good performances for negatively
correlated aspects should be emphasised. In our case for instance, criteria 2 and
3, respectively acceleration and suppleness, may be thought of as being positively
correlated. It may then prove impossible to model some preferences by means of a
weighted sum of the evaluations such as those in Table 6.2 (and even of transfor-
mations thereof such as obtained through formulae like 6.3). This does not mean
that no additive model would be suitable and it does not imply that the prefer-
ences are not independent (in the above-defined sense). In the next section we
shall study an additive model, more general than the weighted average, in which the evaluations $g_i$ may be re-coded by means of value functions $u_i$. With appropriate choices of $u_2$ and $u_3$, it may be possible to take the decision-maker's preferences about positively and negatively correlated aspects into account, provided they satisfy the independence property. If no re-coding is allowed (as in
the assessment of students, see Chapter 3) there is a non-additive variant of the
weighted average that could help modelling interactions among the criteria; in
such a model the weight of a coalition of criteria may be larger or smaller than
the sum of the weights of its components (see Grabisch (1996), for more detail on
non-additive averages).
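To make the idea of a non-additive average concrete, here is a sketch of one well-known instance, the discrete Choquet integral. It is our choice of illustration; see Grabisch (1996) for the general framework. The capacity gives a weight to every coalition of criteria. In the hypothetical capacity below, the coalition {acceleration, pick-up} weighs 0.45, less than the 0.6 sum of its members, which dims jointly good performances on these two positively correlated criteria. With an additive capacity, the integral reduces to the weighted sum. All numbers are invented and assumed already re-scaled to [0, 1].

```python
from itertools import combinations


def choquet(scores, capacity):
    """Discrete Choquet integral of `scores` (criterion -> value in [0, 1])
    with respect to `capacity` (frozenset of criteria -> weight)."""
    order = sorted(scores, key=scores.get)  # criteria by increasing score
    total, previous = 0.0, 0.0
    for i, crit in enumerate(order):
        coalition = frozenset(order[i:])  # criteria scoring at least scores[crit]
        total += (scores[crit] - previous) * capacity[coalition]
        previous = scores[crit]
    return total


def additive_capacity(weights):
    """Capacity of a coalition = sum of its members' weights (plain weighted sum)."""
    crits = list(weights)
    return {frozenset(s): sum(weights[c] for c in s)
            for r in range(1, len(crits) + 1) for s in combinations(crits, r)}


WEIGHTS = {"cost": 0.4, "accel": 0.3, "pickup": 0.3}
additive = additive_capacity(WEIGHTS)
interacting = dict(additive)
interacting[frozenset({"accel", "pickup"})] = 0.45  # redundancy: 0.45 < 0.3 + 0.3

a = {"cost": 0.2, "accel": 0.9, "pickup": 0.9}  # strong only on the correlated pair
c = {"cost": 0.6, "accel": 0.6, "pickup": 0.5}  # balanced profile

for name, cap in [("additive", additive), ("interacting", interacting)]:
    print(name, round(choquet(a, cap), 3), round(choquet(c, cap), 3))
```

Under the additive capacity, a is ahead of c (0.62 against 0.57). Once the correlated pair is dimmed, c comes first (0.57 against 0.515), without any re-coding of the evaluations.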
1. Uncertainty in the evaluation of the cost: neither the buying price nor the lifetime of a second-hand car is known. This uncertainty can be considered stochastic in nature, and statistical data could help to master such a source of uncertainty to some extent. In practice, however, it will generally be very difficult to get sufficient relevant and reliable statistical information for this kind of problem.
Making a decision
All these sources of imprecision have an effect on the precision of the determination
of the value of f that is almost impossible to quantify; contrary to what can (often)
be done in physics, there is generally little information on the size of the impre-
cisions; quite often, there is not even probabilistic information on the accuracy
of the evaluations. As a consequence, the apparently straightforward decision, "choose the alternative with the highest value of f" or "rank the alternatives in decreasing order of the values of f", might be ill-considered, as illustrated above.
The usual way out is extensive sensitivity analysis, which could be described as
part of the validation of the model. This part of the job is seldom carried out
with the required exhaustivity because it is a delicate task at least in two respects.
On the one hand, there are many possible strategies for varying the values of the imprecisely determined parameters. Usually, parameters are varied one at a time, which is not sufficient but is at least tractable. Moreover, as suggested above, it is not even clear over which range the parameters should be varied. On the other hand, once
the sensitivity analysis has been performed, one is likely to be faced with several
almost equally valuable alternatives; in the car problem for instance, the simple
remarks made above strongly suggest that it will be very difficult to discriminate
between cars 3 and 11.
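A minimal one-at-a-time sensitivity analysis can be sketched with the normalised data of the first table for the four leading cars and the weights -1, -2, -1, 0.5, 0.5. Each weight in turn is multiplied by 0.9 and by 1.1, and the best car is recorded. The 10% range is an arbitrary assumption of ours. With the two-decimal values of the table, cars 3 and 11 come out practically tied. The winner flips between them depending on which weight is perturbed, which is exactly the difficulty described above.

```python
CARS = {
    "Nissan Sunny": (0.80, 0.94, 0.84, 1.00, 0.77),  # car 3
    "Peugeot 16V": (0.82, 0.92, 0.84, 0.88, 0.85),   # car 11
    "Peugeot 309": (0.75, 0.96, 0.85, 0.88, 0.85),   # car 12
    "Renault 19": (0.80, 0.97, 0.91, 0.88, 1.00),    # car 10
}
CRITERIA = ("cost", "acceleration", "pick-up", "braking", "road-holding")
WEIGHTS = (-1.0, -2.0, -1.0, 0.5, 0.5)


def f(profile, weights):
    """Weighted sum of the evaluations."""
    return sum(k * g for k, g in zip(weights, profile))


def winner(weights):
    """Car with the largest value of f."""
    return max(CARS, key=lambda car: f(CARS[car], weights))


def one_at_a_time(factors=(0.9, 1.1)):
    """Multiply one weight at a time by each factor and record the best car."""
    result = {}
    for j, name in enumerate(CRITERIA):
        for m in factors:
            w = list(WEIGHTS)
            w[j] *= m
            result[(name, m)] = winner(w)
    return result


for (name, m), car in one_at_a_time().items():
    print(f"{name:13s} x {m}: {car}")
```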
In view of the previous discussion, there are two main approaches to solve the
difficulties raised by the weighted sum:
1. Either one tries to prepare the inputs of the model (linearised evaluations and
trade-offs) as carefully as possible, paying permanent attention to reducing
imprecision and finishing with extensive sensitivity analysis;
2. Or one takes imprecision into account from the start. This means avoiding precise values that are known to be unreliable and working instead with classes of values and ordered categories. Note that imprecision may well
lie in the link between evaluations and preferences rather than in the eval-
uations themselves; detailed preferential information, even extracted from
perfectly precise evaluations, may prove rather difficult to elicit.
6.2.5 Conclusion
The weighted sum is useful for obtaining a quick and rough draft of an overall
evaluation of the alternatives. One should however keep in mind that there are
rather restrictive assumptions underlying a proper use of the weighted sum. As a
conclusion to this section we summarise these conditions.
3. The weights are trade-offs. Weights depend on the scaling of the cri-
teria; transforming the (linearised) scales results in a related transformation
of the weights. Weights tell how many units on the scale of criterion i are
needed to compensate one unit of criterion j.
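The third condition can be illustrated with two hypothetical criteria, both to be maximised, and hypothetical weights 2.0 and 0.5. The ratio k_1/k_2 = 4 says that four units of criterion 2 compensate one unit of criterion 1. Suppose criterion 1 is re-expressed in tenths of its unit, so that its evaluations are multiplied by 10. Its weight must then be divided by 10 to keep the same ranking; keeping the old weight reverses the ranking here.

```python
def f(profile, weights):
    """Weighted sum of the evaluations."""
    return sum(k * g for k, g in zip(weights, profile))


weights = (2.0, 0.5)            # hypothetical trade-offs
a, b = (3.0, 10.0), (4.0, 5.0)  # hypothetical evaluations, both maximised

# Units of criterion 2 needed to compensate one unit of criterion 1.
print(weights[0] / weights[1])  # 4.0

alpha = 10.0  # criterion 1 now expressed in tenths of its unit
a10, b10 = (alpha * a[0], a[1]), (alpha * b[0], b[1])
kept = weights                               # weight left unchanged
adjusted = (weights[0] / alpha, weights[1])  # weight divided by alpha

print(f(a, weights) > f(b, weights))        # True
print(f(a10, kept) > f(b10, kept))          # False: ranking reversed
print(f(a10, adjusted) > f(b10, adjusted))  # True: same ranking as before
```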
Figure 6.5: Single-attribute value function for acceleration criterion (half range) [value against acceleration (sec), from 28 to 29.5]
indifferences:
(16 500, 29.5) ∼ (17 500, 29.2)
(16 500, 29.2) ∼ (17 500, 28.9)
(16 500, 28.9) ∼ (17 500, 28.7)
(16 500, 28.7) ∼ (17 500, 28.5)
(16 500, 28.5) ∼ (17 500, 28.3)
(16 500, 28.3) ∼ (17 500, 28.1)
Such a sequence gives the analyst an approximation of the single-attribute value function $u_2$ on the half range from 28 to 29.5 seconds. It is easy to devise a similar procedure for the other half range, from 29.5 to 31. Figure 6.5 shows the re-coding $u_2$ of the evaluations $g_2$ on the interval [28, 29.5]. There are two linear parts in the graph: one ranging from 28 to 28.9, where the slope is proportional to $1/0.2$, and the other, valid between 28.9 and 29.5, with a slope proportional to $1/0.3$.
From there, using the same idea, one is able to re-code the scale of the cost cri-
terion into the single-attribute value function u1 . Then, considering (for instance)
the cost criterion with criteria 3, 4 and 5 in turn, one obtains a re-coding of each
gi into a single-attribute value function ui .
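The construction can be sketched as follows, under the usual reading of a standard sequence; the details are our assumption. Each indifference says that the gain from one acceleration time to the next is worth the same 1 000 € of extra cost. Consecutive times therefore receive consecutive value levels 0, 1, 2 and so on, and u_2 is interpolated linearly in between. The sketch covers only the times appearing in the sequence, 28.1 to 29.5 seconds; Figure 6.5 extends the steeper piece down to 28 seconds. The two computed slopes match the ones stated above: 1/0.3 per second on [28.9, 29.5] and 1/0.2 per second on [28.1, 28.9].

```python
from bisect import bisect_left

# Acceleration times (seconds) of the standard sequence, from worst to best.
TIMES = [29.5, 29.2, 28.9, 28.7, 28.5, 28.3, 28.1]
# Each step is worth the same 1 000 euros of cost: one unit of value.
LEVELS = list(range(len(TIMES)))

XS, YS = TIMES[::-1], LEVELS[::-1]  # times in increasing order


def u2(t):
    """Piecewise-linear value of acceleration time t, for 28.1 <= t <= 29.5."""
    if not XS[0] <= t <= XS[-1]:
        raise ValueError("time outside the range covered by the standard sequence")
    k = bisect_left(XS, t)
    if XS[k] == t:
        return float(YS[k])
    x0, x1, y0, y1 = XS[k - 1], XS[k], YS[k - 1], YS[k]
    return y0 + (t - x0) * (y1 - y0) / (x1 - x0)


print(u2(29.0))                     # about 1.67
print((u2(29.2) - u2(29.5)) / 0.3)  # about 3.33 = 1/0.3 per second
print((u2(28.5) - u2(28.7)) / 0.2)  # about 5 = 1/0.2 per second
```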
The trade-off between $u_1$ and $u_2$ is easily determined by solving the following equation. It simply expresses the initial indifference (16 500, 29.5) ∼ (17 500, 29.2) of the standard sequence:
$$k_1 u_1(16\,500) + k_2 u_2(29.5) = k_1 u_1(17\,500) + k_2 u_2(29.2).$$
If we set k1 to 1, this formula yields k2 and the trade-offs k3 , k4 and k5 are obtained
similarly. Notice that the re-coding process of the original evaluations into value
functions results in a formulation in which all criteria have to be maximised (in
value).
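As a worked step (our addition), consider the indifference (16 500, 29.5) ∼ (17 500, 29.2). The two alternatives share the same levels on criteria 3, 4 and 5, so those terms cancel in the additive model and k_2 follows directly:

```latex
k_2\,\bigl[u_2(29.2) - u_2(29.5)\bigr] = k_1\,\bigl[u_1(16\,500) - u_1(17\,500)\bigr]
\quad\Longrightarrow\quad
k_2 = k_1\,\frac{u_1(16\,500) - u_1(17\,500)}{u_2(29.2) - u_2(29.5)} .
```

Both brackets are positive, since a lower cost and a shorter time are preferred, so k_2 > 0. With k_1 = 1, the ratio gives k_2 directly.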
The above procedure, although rather intuitive and systematic, is also quite complex. The questions are far from easy to answer, and starting from one reference point or another (the worst point instead of the central point) may result in variations in the assessments. There are, however, many possibilities for checking for inconsistencies. Assume, for instance, that a single-attribute value function has been assessed by means of a standard sequence that links its scale to the cost criterion. One may validate this assessment by building a standard sequence that links its scale to another criterion and comparing the two assessments of the same value function obtained in this way. Hopefully they will be consistent; otherwise, the assessments have to be revised.
Note finally that such methods cannot be used when the scale on which the assessments are made has only a finite number of degrees instead of being the set of real numbers. At the very least, numerous and densely spaced degrees are needed.
Figure 6.6: Value function for acceleration criterion: (a) initial sketch; (b) final, with initial sketch in dotted line [both panels: value, from 0 to 100, against acceleration (sec), from 28 to 31]
Table 6.6: Conversion of verbal levels into numbers in Saaty's pairwise comparison method; e.g. "Moderate" means 3 times more preferred
and in particular,
1
(6.10) (a, b)
(b, a)
In view of the latter relation, only (roughly) one half of the matrix has to be elicited, which amounts to answering $n(n-1)/2$ questions.
Relation (6.9) implies that all columns of the matrix should be approximately proportional to f. The pairwise comparisons make it possible to:
1. detect departures from the basic hypothesis, in case the columns of the matrix are too far from proportional;
2. correct errors made in the estimation of the ratios; some sort of averaging of the columns is performed, yielding an estimate of f.
A test based on statistical considerations allows the user to determine whether the assessments in the pairwise comparison matrix agree sufficiently with the hypothesis that they are approximations of $f(a)/f(b)$, for an unknown f. If the test conclusion is negative, it is recommended either to revise the assessments or to choose another approach, better suited to the type of data.
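A widely used version of this test is Saaty's consistency ratio; we assume here that it is the test meant. The largest eigenvalue λ_max of an n × n positive reciprocal matrix equals n exactly when the matrix is consistent. The consistency index CI = (λ_max - n)/(n - 1) therefore measures the departure from consistency. It is compared with the average index RI of random reciprocal matrices, whose commonly quoted values appear in the dictionary below. A ratio CI/RI above about 0.1 is usually taken as a signal to revise the assessments. The sketch estimates λ_max by power iteration, and both matrices are hypothetical.

```python
# Average consistency index of random reciprocal matrices (Saaty's values).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def principal_eigen(matrix, iterations=200):
    """Power iteration; returns (lambda_max, priority vector summing to 1)."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # With w summing to 1, the sum of the entries of A w estimates lambda_max.
    lam = sum(sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n))
    return lam, w


def consistency_ratio(matrix):
    """CI / RI with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    lam, _ = principal_eigen(matrix)
    return ((lam - n) / (n - 1)) / RANDOM_INDEX[n]


f = [4.0, 2.0, 1.0]
consistent = [[fa / fb for fb in f] for fa in f]
cyclic = [[1, 2, 0.5], [0.5, 1, 2], [2, 0.5, 1]]  # a > b, b > c, yet c > a

print(consistency_ratio(consistent))          # close to 0
print(round(consistency_ratio(cyclic), 3))    # 0.431, well above 0.1
```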
If one wants to apply AHP in a multiple criteria decision problem, pairwise
comparisons of the alternatives must be performed for each criterion; criteria must
also be compared in a pairwise manner to model their importance. This process
results in functions ui that evaluate the alternatives on each criterion i and in
coefficients of importance ki . Each alternative a is then assigned an overall value
v(a) computed as
$$v(a) = \sum_{i=1}^{n} k_i u_i(a) \qquad (6.11)$$
from 17 500 € to 21 500 € than when lying between 13 500 € and 17 500 €. This corresponds to the fact that Thierry said he is rather insensitive to cost differences up to about 17 500 €, which is the amount of money he had budgeted for his car.
For the sake of concision, we have restricted our comparisons to a subset of cars,
namely the top four cars plus the Renault 19, Mazda 323 and Toyota Corolla.
A major issue in the assessment of pairwise comparisons, for instance of alter-
natives in relation to a criterion, is to determine how many times a is preferred to b
on criterion i from looking at the evaluations gi (a) and gi (b). Of course the (ratio)
scale of preference on i is not, in general, the scale of the evaluations $g_i$. For example, Car 11 costs approximately 17 500 € and Car 12 costs about 16 000 €. The ratio of these costs, 17 500/16 000, is equal to 1.09375. This does not necessarily mean that Car 12 is preferred 1.09375 times more than Car 11 on the cost criterion, because the cost evaluation does not measure the preferences directly. Indeed, a transformation (re-scaling) is usually needed to go from evaluations to preferences. For the cost, according to Thierry himself, the transformation is not linear, since equal ratios of costs located either below or above 17 500 € do not correspond to equal ratios of preference. But even in linear parts, the question is not easily answered. A decision-maker might very well say that Car 12 is 1.5 times more preferred than Car 11 on the cost criterion, or he could say 2 times or 4 times. All depends on what the decision-maker would consider as the minimum possible cost. For instance, supposing that the transformation of cost into preference is linear, if Car 12 is declared to be 1.5 times more preferred than Car 11, the zero x of the cost scale would be such that
$$\frac{17\,500 - x}{16\,000 - x} = 1.5,$$
i.e. x = 13 000 €. The problem is even more crucial for transforming scales such
as those on which braking or road-holding are evaluated. For instance, how many
times is Car 3 preferred to Car 10 with respect to the braking criterion? In other
words, how many times is 2.66 better than (preferred to) 2.33?
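Solving the cost equation above for x makes the dependence on the declared ratio r explicit: x = (17 500 - 16 000 r)/(1 - r). The sketch below evaluates this for the three ratios mentioned in the text; the function name is ours. A declared ratio of 1.5 puts the zero at 13 000 €, a ratio of 2 at 14 500 € and a ratio of 4 at 15 500 €. A modest change in the answer to a hard question therefore moves the implied zero of the scale by thousands of euros.

```python
def implied_zero(ratio, dearer=17_500.0, cheaper=16_000.0):
    """Solve (dearer - x) / (cheaper - x) = ratio for x (ratio != 1)."""
    return (dearer - ratio * cheaper) / (1.0 - ratio)


for r in (1.5, 2.0, 4.0):
    x = implied_zero(r)
    # Check that x satisfies the original equation.
    assert abs((17_500 - x) / (16_000 - x) - r) < 1e-9
    print(r, x)
```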
Similar questions arise for the comparison of importance of criteria. We discuss
the determination of the weights ki of the criteria in formula 6.11 below. For
computing those weights, the relative importance of each criterion with respect
to all others must be assessed. Our assessments are shown in Table 6.7. We
made them directly in numerical terms taking into account a set of weights that
Thierry considered as reflecting his preferences; those weights have been obtained
using the Prefcalc software and a method that is discussed in the next section.
By default, the blanks on the diagonal should be interpreted as 1s; the blanks
below the diagonal are supposed to be 1 over the corresponding value above the
diagonal, according to equation 6.10.
Once the matrix in Table 6.7 has been filled, several algorithms can be proposed
to compute the priority of each criterion with respect to the goal symbolised by
the top node of the hierarchy (under the hypothesis that the elements of the
assessment matrix are approximations of the ratios of those priorities). The most
famous algorithm, which was initially proposed by Saaty, consists in computing
the eigenvector of the matrix corresponding to the largest eigenvalue (see Harker
114 CHAPTER 6. COMPARING ON SEVERAL ATTRIBUTES
Table 6.7: Assessment of the comparison of importance for all pairs of criteria. For instance, the number 2 at the intersection of the 1st row and 3rd column means that Cost is considered twice as important as Pick-up.
Note that only the lowest degrees of the 1 to 9 scale have been used in Table 6.7.
This means that the weights are not perceived as very contrasted; in order to get
the sort of gradation of the weights as above (the ratio of the highest to the lowest
value is about 3), some comparisons have been assessed by non-integer degrees,
which normally are not available on the verbal counterpart of the 1 to 9 scale
described in Table 6.6. When the assessments are made through this verbal scale, approximations have to be made, for instance by saying that cost and acceleration are equally important and replacing 1.5 by 1. Note that the labelling of the degrees on the verbal scale may be misleading. One would quite naturally qualify the degree to which Cost is more important than Acceleration as "Moderate", until it is fully realised that "Moderate" means three times as important. Even the intermediate level between "Equal" and "Moderate" would still mean twice as important.
It should be emphasised that the eigenvalue method is not linear. What
would have changed if we had scaled the importance differently, for instance as-
sessing the comparisons of importance by degrees twice as large as those in Table
6.7 (except for 1s that remain constant)? Would the coefficients of importance
have been twice as large? Not at all! The resulting weights would have been much
more contrasted, namely:
Name of car        Nr   7     11    3     12    10    4     6
Honda Civic        7    1.0   1.0   2.0   4.0   4.0   5.0   5.0
Peugeot 309/16V    11   1.0   1.0   2.0   3.0   4.0   4.0   4.0
Nissan Sunny       3    0.50  0.50  1.0   1.50  2.0   3.0   3.0
Peugeot 309        12   0.25  0.33  0.67  1.0   1.0   2.0   2.0
Renault 19         10   0.25  0.25  0.5   1.0   1.0   1.0   1.5
Mazda 323          4    0.2   0.25  0.33  0.5   1.0   1.0   1.0
Toyota Corolla     6    0.2   0.25  0.33  0.5   0.67  1.0   1.0
Using the latter set of weights instead of the former would substantially change the
values attached to the alternatives through formula 6.11 and might even alter their
ordering. So, contrary to the determination of the trade-offs in an additive value
model (which may be re-scaled through multiplying them by a positive number,
without altering the way in which alternatives are ordered by the multi-attribute
value function), there is no degree of freedom in the assessment of the ratios in
AHP; in other words, these assessments are made on an absolute scale.
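The effect of doubling the degrees can be sketched with a hypothetical 3 × 3 matrix in which the first criterion is judged twice as important as each of the other two; the weights of Table 6.7 are not reproduced here. The principal eigenvector is computed by power iteration. Doubling every degree above 1, and adjusting the reciprocals, does not produce anything like doubled weights. The ratio of the highest to the lowest weight goes from 2 to 4, and the normalised weights change from (0.5, 0.25, 0.25) to about (0.67, 0.17, 0.17).

```python
def eigen_priorities(matrix, iterations=200):
    """Principal eigenvector by power iteration, normalised to sum 1."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w


def double_degrees(matrix):
    """Double every degree above 1; reciprocals follow, 1s stay 1."""
    n = len(matrix)
    out = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if matrix[i][j] > 1:
                out[i][j] = 2.0 * matrix[i][j]
                out[j][i] = 1.0 / (2.0 * matrix[i][j])
    return out


base = [[1, 2, 2], [0.5, 1, 1], [0.5, 1, 1]]  # hypothetical assessments
doubled = double_degrees(base)
w0, w1 = eigen_priorities(base), eigen_priorities(doubled)
print([round(x, 3) for x in w0], max(w0) / min(w0))
print([round(x, 3) for x in w1], max(w1) / min(w1))
```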
As a further example, we now apply the method to determine the evaluation
of the alternatives in terms of preference on the Acceleration criterion. Suppose
the pairwise comparison matrix has been filled as shown in Table 6.8, in a way
that seems consistent with what we know of Thierry's preferences. Applying the
eigenvalue method yields the following priorities attached to each of the cars in
relation to acceleration:
[Figure: priorities (from 0 to 0.3) against acceleration (sec), from 28 to 31]
(dotted line) on the graph of the priorities. There seems to be a good fit of the
two curves but this is only an example from which no general conclusion can be
drawn.
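The principal eigenvector of the acceleration comparison matrix shown above can be approximated by power iteration, as sketched below. Rows and columns are in the order of cars 7, 11, 3, 12, 10, 4, 6. The entries are typed as printed, so the matrix is only approximately reciprocal (0.33 instead of 1/3, and so on). We do not reproduce the priorities reported in the text. The sketch only checks that the priorities sum to 1 and order the cars from Honda Civic down to Toyota Corolla. That order is forced here, because each row of the matrix dominates the next one entry by entry.

```python
CARS = ["Honda Civic", "Peugeot 309/16V", "Nissan Sunny", "Peugeot 309",
        "Renault 19", "Mazda 323", "Toyota Corolla"]  # cars 7, 11, 3, 12, 10, 4, 6
MATRIX = [
    [1.0, 1.0, 2.0, 4.0, 4.0, 5.0, 5.0],
    [1.0, 1.0, 2.0, 3.0, 4.0, 4.0, 4.0],
    [0.50, 0.50, 1.0, 1.50, 2.0, 3.0, 3.0],
    [0.25, 0.33, 0.67, 1.0, 1.0, 2.0, 2.0],
    [0.25, 0.25, 0.5, 1.0, 1.0, 1.0, 1.5],
    [0.2, 0.25, 0.33, 0.5, 1.0, 1.0, 1.0],
    [0.2, 0.25, 0.33, 0.5, 0.67, 1.0, 1.0],
]


def eigen_priorities(matrix, iterations=200):
    """Principal eigenvector by power iteration, normalised to sum 1."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w


priorities = eigen_priorities(MATRIX)
for car, p in sorted(zip(CARS, priorities), key=lambda cp: -cp[1]):
    print(f"{car:16s} {p:.3f}")
```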
Comments on AHP
Although the models for describing the overall preferences of the decision-maker
are identical in multi-attribute value theory and in AHP, this does not mean that
applying the respective methodologies of these theories normally yields the same
overall evaluation of the alternatives. There are striking differences between the
two approaches from the methodological point of view. The ambition of AHP is
to help construct evaluations of the alternatives for each viewpoint (in terms of
preferences) and of the viewpoints with regard to the overall goal (in terms of
importance); these evaluations are claimed to belong to a ratio scale, i.e. to be
determined up to a positive multiplicative constant. Since the eigenvalue method
yields a particular determination of this constant and this determination is not
taken into account when assessing the relative importance of the various criteria,
the evaluations in terms of preference must be considered as if they were made on
an absolute scale, which has been repeatedly criticised in the literature (see for
instance Belton (1986) and Dyer (1990)). This weakness (that can also be blamed
on direct rating techniques, as mentioned above) could be corrected by asking the
decision-maker about the relative importance of the viewpoints in terms of passing
from the least preferred value to the most preferred value on criterion i compared
6.3. THE ADDITIVE VALUE MODEL 117
to a similar change on criterion j (Dyer 1990). Taking this suggestion into account
would however go against one of the basic principles of Saaty's methodology, i.e.
the assumption that the assessments at all levels of the hierarchy can be made
along the same procedure and independently of the other levels. That is probably
why the original method, although seriously attacked, has remained unchanged.
AHP has been criticised in the literature in several other respects. Besides the
fact already mentioned that it may be difficult to reliably assess comparisons of
preferences or of importance on the standard scale described in Table 6.6, there
is an issue about AHP that has been discussed quite a lot, namely the possibility
of rank reversal. Suppose alternative x is removed from the current set and
nothing is changed in the pairwise assessments of the remaining alternatives. It may then happen that an alternative a among the remaining ones is ranked below another alternative b, whereas it was ahead of b in the initial situation. This
phenomenon was discussed in Belton and Gear (1983) and Dyer (1990) (see also
Harker and Vargas (1987) for a defense of AHP).
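Rank reversal can be reproduced with a hypothetical example; all numbers are invented. There are two equally weighted criteria, three alternatives a, b and x, and perfectly consistent comparison matrices derived from ratio-scale scores. For such matrices, the eigenvector is simply the scores divided by their sum. Removing x leaves every remaining pairwise assessment unchanged. Yet a, which was ahead of b, falls behind it, because the normalisation of each criterion's priorities to sum 1 changes.

```python
def ahp_values(scores, weights):
    """Overall values: per criterion, priorities = scores / their sum
    (the eigenvector of a consistent matrix), then a weighted sum over criteria."""
    values = {}
    for crit, crit_scores in scores.items():
        total = sum(crit_scores.values())
        for alt, s in crit_scores.items():
            values[alt] = values.get(alt, 0.0) + weights[crit] * s / total
    return values


weights = {"c1": 0.5, "c2": 0.5}
scores = {"c1": {"a": 1, "b": 2, "x": 7},
          "c2": {"a": 3, "b": 2, "x": 1}}
without_x = {c: {alt: s for alt, s in sc.items() if alt != "x"}
             for c, sc in scores.items()}

before, after = ahp_values(scores, weights), ahp_values(without_x, weights)
print(before["a"] > before["b"], after["a"] > after["b"])  # True False
```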
such that $a \succsim b \Leftrightarrow u(a) \geq u(b)$. Without loss of generality, the lowest (resp. highest) value of u is conventionally set to 0 (resp. 1). 0 (resp. 1) is the value of a (fictitious) alternative whose evaluation on each criterion is the worst (resp. best) evaluation attained for that criterion on the current set of alternatives. This fictitious alternative is sometimes called the anti-ideal (resp. ideal) point.
In our example, the anti-ideal car costs 21 334 €, needs 30.8 seconds to cover 1 km starting from rest and 41.6 seconds starting in fifth gear at 40 km/h; its performances regarding brakes and road-holding are respectively 1.33 and 1.25. The ideal car, at the opposite end of the range, costs 13 841 €, needs 28 seconds to cover 1 km starting from rest and 34.7 seconds starting in fifth gear at 40 km/h; its performances regarding brakes and road-holding are respectively 2.66 and 3.25.
The shape of the single-attribute value function for the cost criterion for in-
stance is modelled as follows. The user fixes the number of linear pieces; suppose
that you decide to set it to 2 (which is a parsimonious option and the default
value proposed in Prefcalc); the single-attribute value function of the cost could
for instance be represented as in Figure 6.8. Note that the maximal value of the
utility (reached for a cost of 13 841 €) is scaled in such a way that it corresponds to the value of the trade-off associated with the cost criterion, i.e. .43 in the example shown in Figure 6.8. Note also that with two linear pieces, one for each half of the cost range, the single-attribute value function is completely determined by two numbers: the utility value at mid-range and the maximal utility. Those values, say $u_{1,1}$ and $u_{1,2}$, are variables of the linear program that Prefcalc writes and solves. The information on which the formulation of the linear program relies is obtained from the user. The user is asked to select a few alternatives that he is familiar with and feels able to rank-order according to his overall preferences. The ordering of these alternatives, which include the fictitious ideal and anti-ideal ones, induces the corresponding order on their overall values and hence generates constraints of the linear program. Prefcalc then tries to find levels $u_{i,1}$, $u_{i,2}$ for each criterion i that make the additive value function compatible with the declared information. The linear program may be feasible, i.e. an additive value function (with two-piece piecewise-linear single-attribute value functions) may prove compatible with the preferences. In that case, the system looks, among all feasible solutions, for one that maximises the discrimination between
the selected alternatives. If no feasible solution can be found, the system proposes
to increase the number of variables of the model, for instance by using a higher
number of linear pieces in the description of the single-attribute value functions.
This method could be described as a learning process: the system fits the parameters of the model on the basis of partial information about the user's preferences, and the set of alternatives on which the user declares his global preferences may be viewed as a learning set. For more details on the method, the reader is referred to Vincke (1992) and Jacquet-Lagrèze and Siskos (1982).
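The principle can be sketched in a few lines under strong simplifications of ours:

- two criteria, already rescaled to [0, 1] (0 for the anti-ideal level, 1 for the ideal level);
- two-piece value functions, fixed by their values at mid-range and at the top;
- the normalisation that the ideal alternative is worth 1;
- three hypothetical reference alternatives with a hypothetical declared ranking.

Instead of solving the linear program, the sketch runs a brute-force grid search for the parameters that maximise the smallest gap between consecutive alternatives of the declared ranking. This is an illustrative substitute, not what Prefcalc does. A non-positive best gap would mean that no such two-piece additive model reproduces the ranking.

```python
from itertools import product

# Hypothetical reference alternatives (criterion levels rescaled to [0, 1])
# and a hypothetical declared ranking, best first.
REFERENCE = {"A": (1.0, 0.2), "B": (0.6, 0.6), "C": (0.0, 1.0)}
DECLARED = ["B", "A", "C"]


def two_piece(t, mid, top):
    """Value function on [0, 1] with u(0) = 0, u(0.5) = mid, u(1) = top."""
    return 2 * t * mid if t <= 0.5 else mid + (2 * t - 1) * (top - mid)


def overall(profile, params):
    """Additive value: sum of the single-attribute values."""
    return sum(two_piece(t, mid, top) for t, (mid, top) in zip(profile, params))


def fit(reference, declared, step=0.05):
    """Grid search standing in for the linear program: maximise the smallest
    gap between consecutive alternatives of the declared ranking."""
    grid = [k * step for k in range(round(1 / step) + 1)]
    best_params, best_gap = None, float("-inf")
    for top1 in grid:
        top2 = 1.0 - top1  # the ideal alternative gets value 1
        for mid1, mid2 in product(grid, grid):
            if mid1 > top1 + 1e-12 or mid2 > top2 + 1e-12:
                continue  # value functions must be non-decreasing
            params = [(mid1, top1), (mid2, top2)]
            vals = [overall(reference[a], params) for a in declared]
            gap = min(x - y for x, y in zip(vals, vals[1:]))
            if gap > best_gap:
                best_params, best_gap = params, gap
    return best_params, best_gap


params, gap = fit(REFERENCE, DECLARED)
print(params, round(gap, 3))  # a positive gap: the declared ranking is reproduced
```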
In his ex post study, Thierry selects five cars, besides the ideal and anti-ideal ones, and ranks them in the following order:
1. the lack of sensitivity in the price in the range from 13 841 € to 17 576 € (he was a priori estimating his budget at about 17 500 €);
However, Thierry disagrees with the modelling of the braking criterion, which
he considers as important as road-holding. He believes that the relative
importance of the fourth and fifth criteria should be revised. Thierry then looks
at the ranking of the cars according to the computed value function. The ranking
as well as the multi-attribute value assigned to each car are given in Table 6.9.
Table 6.9: Ranking obtained using Prefcalc. The cars ranked by Thierry are those
marked with a *
Thierry feels that Car 10 (Renault 19) is ranked too high while Car 7 (Honda
Civic) should be in a better position.
In view of these observations, Thierry modifies the single-attribute value func-
tions for criteria 4 and 5. For the braking criterion, the utility (0.01) associated
with 2 remains unchanged while the utility of the level 2.7 is raised to 0.1 instead
of 0.01. The road-holding criterion is also modified; the value (0.2) associated with
the level 3.2 is lowered to 0.1 (see Figure 6.9). Note that Prefcalc normalises the
value function so that the ideal alternative is always assigned the value 1. Of course, since numbers are displayed with two decimal places, the sum of the maximal values of the single-attribute value functions may be only approximately equal to 1. Running Prefcalc with the altered value functions returns the ranking in Table 6.10, with the revised multi-attribute value after each car name.
After he sees the modified ranking yielded by Prefcalc, Thierry feels that the
new ranking is fully satisfactory. He observes that if he had used Prefcalc a few
years earlier, he would have made the same choice as he actually did; he considers
this as a good point as far as Prefcalc is concerned. He finally makes the following comments: "Using Prefcalc has enhanced my understanding of both the data and my own preferences; in particular, I am more conscious of the relative importance I give to the various criteria."
Figure 6.9: Modified single-attribute value functions for the braking and road-
holding criteria
Table 6.10: Modified ranking using Prefcalc. The cars ranked by Thierry are those
marked with *
Observe that the user may well have a very vague understanding of the method
itself; he simply validates the method by using it to reproduce results that he has
confidence in. After such a successful empirical validation step he will be more
prone to use the method in new situations that he does not master that well.
What are the drawbacks and traps of Prefcalc? Obviously Prefcalc can only be
used in cases where the overall preference of the decision-maker can be represented
by an additive multi-attribute value function (as described by Equation 6.8). In
particular, this is not the case when preferences are not transitive or not complete
(for arguments supporting the possible observation of non-transitive preferences,
see the survey by Fishburn (1991)). There are some additional restrictions due
to the fact that the single-attribute value functions that Prefcalc can model are
limited to piecewise-linear functions. This is hardly a restriction when dealing
with a finite set of alternatives: by adapting the number of linear pieces, one can
approximate any continuous curve as accurately as desired. When the number of
pieces is small, however, the restriction may be more serious.
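To make the point about piecewise-linear shapes concrete, here is a minimal Python sketch (not Prefcalc's code) of a single-attribute value function defined by breakpoints, together with a helper that measures how closely such a function approximates a smooth curve as the number of pieces grows. The concave test curve 1 − (1 − x)² on [0, 1] and the equal-width breakpoints are illustrative assumptions.

```python
from bisect import bisect_right


def piecewise_linear(breakpoints, values):
    """Value function interpolating linearly between (breakpoint, value) pairs.

    `breakpoints` must be strictly increasing; outside their range the
    function is constant, equal to the first or last value.
    """
    if len(breakpoints) != len(values) or len(breakpoints) < 2:
        raise ValueError("need at least two breakpoints and one value per breakpoint")

    def u(x):
        if x <= breakpoints[0]:
            return values[0]
        if x >= breakpoints[-1]:
            return values[-1]
        j = bisect_right(breakpoints, x)  # breakpoints[j - 1] <= x < breakpoints[j]
        x0, x1 = breakpoints[j - 1], breakpoints[j]
        y0, y1 = values[j - 1], values[j]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return u


def max_error(f, lo, hi, pieces, samples=1000):
    """Largest sampled gap between f and its interpolant with `pieces` equal pieces."""
    xs = [lo + (hi - lo) * k / pieces for k in range(pieces + 1)]
    u = piecewise_linear(xs, [f(x) for x in xs])
    grid = [lo + (hi - lo) * k / samples for k in range(samples + 1)]
    return max(abs(f(x) - u(x)) for x in grid)


def concave(x):
    # hypothetical smooth single-attribute value curve on [0, 1]
    return 1 - (1 - x) ** 2


for n in (1, 2, 4, 8):
    print(n, round(max_error(concave, 0.0, 1.0, n), 4))
```

For this quadratic curve the maximal error is one quarter of the squared piece width, so it shrinks roughly fourfold each time the number of pieces doubles.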
Stability of ranking
The main problem raised by the use of such a tool is the indeterminacy of the
estimated single-attribute value functions (including the estimation of the
trade-offs). Usually, if the preferences declared on the set of well-known
alternatives are compatible with an additive value model, there will be several
value functions that can represent these preferences. Prefcalc chooses one such
representation according to the principles outlined above, i.e. the most
discriminating one (in a certain sense). Other choices of model, although
compatible with the declared preferences on the learning set, may lead to
variations in the rankings of the remaining alternatives. Slight variations in the
trade-off values can yield rank reversals. For instance, changes already occur with
all trade-offs kept within .02 of their values in Figure 6.9. Passing from the set of
trade-offs (.43, .23, .13, .10, .10) to (.45, .21, .11, .12, .10) exchanges the
positions of Honda Civic and Peugeot 309, which are ranked 3rd and 4th
respectively after the change. This rank reversal is obtained by putting slightly
more emphasis on cost (and on braking) and slightly less on performance. Note
that such a slight change in the trade-offs affects the ranking of the top 4 cars,
those on which Thierry focused after his preliminary analysis (see Table 6.3). It
should thus be very clear that, in practice, determining the trade-offs with
sufficient accuracy can be both crucial and challenging. It is therefore of prime
importance to carry out extensive sensitivity analyses in order to identify which
parts of the result remain reasonably stable.
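Part of such a sensitivity analysis can be automated. The Python sketch below (hypothetical data, not a Prefcalc feature) scores each alternative as the sum over criteria of w_i · v_i(a), where the v_i(a) are single-attribute values rescaled to [0, 1] so that the trade-off w_i acts as the weight of criterion i. It then shifts each trade-off by −δ, 0 or +δ and collects the perturbed vectors that change the ranking.

```python
from itertools import product


def ranking(values, weights):
    """Alternatives sorted by decreasing additive value sum_i w_i * v_i(a)."""
    total = {a: sum(w * v for w, v in zip(weights, vs)) for a, vs in values.items()}
    return sorted(total, key=lambda a: -total[a])


def rank_reversals(values, weights, delta):
    """Perturbed trade-off vectors (each w_i shifted by -delta, 0 or +delta)
    whose ranking differs from the one obtained with `weights`.

    Renormalising the weights to sum to 1 would not change any ranking,
    so it is skipped here.
    """
    base = ranking(values, weights)
    changed = []
    for shifts in product((-delta, 0.0, delta), repeat=len(weights)):
        w = [max(0.0, wi + s) for wi, s in zip(weights, shifts)]
        r = ranking(values, w)
        if r != base:
            changed.append((w, r))
    return changed


# Hypothetical rescaled values of two alternatives on two criteria.
values = {"X": [1.0, 0.0], "Y": [0.0, 0.98]}
print(ranking(values, [0.5, 0.5]))                     # base ranking
print(len(rank_reversals(values, [0.5, 0.5], 0.02)))   # perturbations that reverse it
```

Exhaustive enumeration grows as 3 to the power of the number of criteria, which is still only 243 combinations for five criteria.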
correct order in the output of Prefcalc. What would have happened if the learning
set had been different?
Let us take another subset of 5 cars and declare preferences that agree with
the ranking validated by Thierry (Table 6.10). When the top 2 cars (Peugeot
309/16V, Nissan Sunny) are replaced by Renault 19 and Mitsubishi Colt, two cars
from the middle of the ranking, the vector of trade-offs becomes (.53, .06, .08, .08, .25)
and the top four in the new ranking are Renault 19 (1), Peugeot 309 (2), Peugeot
309/16V (3) and Nissan Sunny (4); Honda Civic is relegated to 12th position.
With this learning set, stronger emphasis is put on cost and safety (brakes and
road-holding taken together) and much less on performance (acceleration and
pick-up); three of the former top four cars remain in the top four; Honda recedes
because of its higher cost and its weakness on road-holding; Renault 19 heads the
ranking mainly because of its excellent road-holding.
6.3.4 Conclusion
This section has been devoted to the construction of a formal model that represents
preferences on a numerical scale. Such a model can only be expected to exist when
preferences satisfy rather demanding hypotheses; it thus relies on firm theoretical
bases, which is undoubtedly part of the intellectual appeal of the method. There
is at least one additional advantage to theoretically well-founded decision models:
such models can be used to justify a decision to people who have not been
involved in the decision-making process. Once the hypotheses of the model have
been accepted or proved valid in a decision context, and provided the elicitation
of the various parameters of the model has been conducted correctly, the decision
becomes transparent.
The additive multi-attribute value model, once established and accepted by the
stake-holders, is rewarding, since it is directly interpretable in terms of
decisions: the best decision is the one the model values most (provided the
imprecision in the establishment of the model and the uncertainty in the
evaluation information allow discrimination at least between the top alternatives).
The counterpart of the clear-cut character of the conclusions that can be drawn
from the model is that establishing it requires a great deal of information, of a
very precise and particular type. This means that the model may be inadequate not
only because its hypotheses might not be fulfilled but also because the
respondents might feel unable to answer the questions or because their answers
might not be reliable. Indirect methods based on exploiting partial information
and extrapolating it (in a recursive validation process) may help when the
information is not available in explicit form; nevertheless, the quality of the
information remains crucial and a great deal of it is needed. In conclusion, direct
assessment of multi-attribute value functions is a narrow road between the
practical problem of obtaining reliable answers to difficult questions and the risks
involved in building a model on answers to simpler but ambiguous questions.
In the next section we shall explore a very different formal approach that may
be less demanding with regard to the precision of the information, but also provides
less conclusive outputs.
Cars 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 5 3 1 2 2 3 3 2 3 2 2 2 2 3
2 2 5 2 4 2 3 2 3 3 1 1 1 4 3
3 4 4 5 4 4 4 4 4 4 3 2 3 5 4
4 3 1 1 5 1 3 1 2 1 2 1 1 4 2
5 3 3 1 5 5 3 2 2 2 3 1 1 5 2
6 2 2 1 2 2 5 2 2 2 2 1 1 3 2
7 3 3 1 4 4 4 5 3 4 3 2 2 4 4
8 3 2 1 4 4 4 3 5 3 2 0 2 4 3
9 2 3 1 4 4 3 1 2 5 2 1 2 4 3
10 4 4 2 3 2 3 2 3 3 5 3 2 4 3
11 4 4 3 4 4 4 4 5 4 3 5 4 4 5
12 4 4 2 4 4 4 4 4 3 4 3 5 5 4
13 3 2 0 2 1 2 1 2 1 1 1 0 5 1
14 2 3 1 3 3 3 1 3 3 2 0 1 4 5
Table 6.11: Number of criteria in favour of a when compared to b for all pairs of
cars a, b in the Choosing a car problem
the smallest Borda score. This method can be seen as a way of constructing a
synthetic evaluation of the alternatives in multiple criteria decision analysis, the
points of view corresponding to the voters and the alternatives to the candidates;
all criteria-voters have equal weight, and coding each candidate by the rank number
of its position in a voter's preference looks like a form of evaluation.
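A Borda count over criteria-voters can be sketched in a few lines of Python. The rankings below are hypothetical and assumed complete and without ties: each criterion ranks the alternatives, rank 1 being best, and the alternatives with the smallest total are selected.

```python
def borda_scores(rankings):
    """Sum over voters (criteria) of each candidate's rank position, 1 = best."""
    scores = {}
    for order in rankings:
        for position, candidate in enumerate(order, start=1):
            scores[candidate] = scores.get(candidate, 0) + position
    return scores


def borda_winners(rankings):
    """Candidates with the smallest Borda score (all tied winners are returned)."""
    scores = borda_scores(rankings)
    best = min(scores.values())
    return sorted(c for c, s in scores.items() if s == best)


# Three hypothetical criteria, each ranking three cars from best to worst.
criteria_rankings = [["a", "b", "c"], ["b", "c", "a"], ["a", "c", "b"]]
print(borda_scores(criteria_rankings))
print(borda_winners(criteria_rankings))
```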
Condorcet's method consists of a kind of tournament in which all candidates are
compared in pairwise contests. A candidate is declared to be preferred to another
according to a majority rule, i.e. if more voters rank the former before the latter
than the converse. The result of such a procedure is a preference relation on the
set of candidates that, in general, is neither transitive nor acyclic. A further step is
thus needed in order to exploit this relation with a view to selecting one or
several candidates or ranking all the candidates. This idea can of course be
transposed to the multiple criteria decision context. We do this below, using
Thierry's case again for illustrative purposes; we show how the problems raised by
a direct transposition lead rather naturally to elementary outranking methods.
For each pair of cars a and b, we count the number of criteria according to
which a is at least as good as b. This yields the matrix given in Table 6.11; the
elements of the matrix are integers ranging from 0 to 5. Note that we might have
alternatively decided to count the criteria for which a is better than b, not taking
into account criteria for which a and b are tied.
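The counting step that produces a matrix like Table 6.11 can be sketched in Python as follows. The evaluations are hypothetical, not Thierry's data, and each criterion carries an assumed flag saying whether it is to be maximised or minimised.

```python
def count_matrix(evals, maximise):
    """m[a][b] = number of criteria on which alternative a is at least as good as b.

    evals[a][i] is the evaluation of alternative a on criterion i;
    maximise[i] is True when larger values are better on criterion i.
    """
    n = len(evals)
    return [
        [
            sum((x >= y) if up else (x <= y)
                for x, y, up in zip(evals[a], evals[b], maximise))
            for b in range(n)
        ]
        for a in range(n)
    ]


# Hypothetical data: criterion 0 is a cost (minimised), criterion 1 a score (maximised).
evals = [[10, 5], [12, 5], [8, 7]]
m = count_matrix(evals, maximise=[False, True])
for row in m:
    print(row)
```

As in Table 6.11, the diagonal equals the number of criteria, and m[a][b] + m[b][a] exceeds that number by exactly the number of criteria on which a and b are tied.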
What we could call the Condorcet preference relation is obtained by determining,
for each pair of alternatives a, b, whether or not there is a (simple) majority of
criteria for which a is at least as good as b. Since there are 5 criteria, the majority
is reached as soon as at least 3 criteria favour alternative a when compared to b.
The preference matrix is thus obtained by replacing every number greater than or
equal to 3 in Table 6.11 by 1 and every number smaller than 3 by 0, yielding the
Cars 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 1 1 0 0 0 1 1 0 1 0 0 0 0 1
2 0 1 0 1 0 1 0 1 1 0 0 0 1 1
3 1 1 1 1 1 1 1 1 1 1 0 1 1 1
4 1 0 0 1 0 1 0 0 0 0 0 0 1 0
5 1 1 0 1 1 1 0 0 0 1 0 0 1 0
6 0 0 0 0 0 1 0 0 0 0 0 0 1 0
7 1 1 0 1 1 1 1 1 1 1 0 0 1 1
8 1 0 0 1 1 1 1 1 1 0 0 0 1 1
9 0 1 0 1 1 1 0 0 1 0 0 0 1 1
10 1 1 0 1 0 1 1 1 1 1 1 0 1 1
11 1 1 1 1 1 1 1 1 1 1 1 1 1 1
12 1 1 0 1 1 1 1 1 1 1 0 1 1 1
13 1 0 0 0 0 0 0 0 0 0 0 0 1 0
14 0 1 0 1 1 1 0 1 1 0 0 0 1 1
relation described by the 0-1 matrix in Table 6.12. Note that a criterion counts
both in favour of a and in favour of b only if a and b are tied on that criterion;
the relation is reflexive since any alternative is at least as good as itself along all
criteria.
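The thresholding step is equally short. In this Python sketch the function name and its `quota` parameter are ours; the two input rows are copied from Table 6.11 (cars 1 and 3). With a quota of 3 they give back rows 1 and 3 of Table 6.12, and with a quota of 4 the corresponding rows of Table 6.13.

```python
def majority_relation(counts, quota):
    """0-1 relation: 1 where a is at least as good as b on `quota` criteria or more."""
    return [[1 if c >= quota else 0 for c in row] for row in counts]


# Rows for cars 1 and 3 of Table 6.11 (columns are cars 1 to 14).
counts = [
    [5, 3, 1, 2, 2, 3, 3, 2, 3, 2, 2, 2, 2, 3],
    [4, 4, 5, 4, 4, 4, 4, 4, 4, 3, 2, 3, 5, 4],
]
simple = majority_relation(counts, quota=3)   # simple majority of 5 criteria
strict = majority_relation(counts, quota=4)   # at least 4 criteria out of 5
for row in simple + strict:
    print(row)
```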
Cars 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 1 0 1 0 0 0 0 0 0 0 0 1 0
3 1 1 1 1 1 1 1 1 1 0 0 0 1 1
4 0 0 0 1 0 0 0 0 0 0 0 0 1 0
5 0 0 0 1 1 0 0 0 0 0 0 0 1 0
6 0 0 0 0 0 1 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 1 0 1 0 0 0 1 1
8 0 0 0 1 1 0 0 1 0 0 0 0 1 0
9 0 0 0 0 0 0 0 0 1 0 0 0 1 0
10 1 1 0 0 0 0 0 0 0 1 0 0 1 0
11 1 1 0 1 1 0 1 1 1 0 1 1 1 1
12 1 1 0 1 1 1 1 1 0 1 0 1 1 1
13 0 0 0 0 0 0 0 0 0 0 0 0 1 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 1
Table 6.13: Condorcet preference relation for the Choosing a car problem. A
1 at the intersection of the a row and the b column means that a is rated not
lower than b on at least 4 criteria
by the number of cars they beat (i.e. the cars with respect to which they are rated
not lower on at least 4 criteria), one sees that 3, 11 and 12 come in first position
(each is preferred to 10 other cars); then there is a big gap, after which come 7, 8
and 10, which beat only 3 other cars. Conversely, there are two unbeaten cars, 3
and 11; then come 10 and 12 (each beaten by one car); 7 is beaten by 3 cars.
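These counts can be read off mechanically. The Python sketch below copies Table 6.13 as printed and counts, for each car, how many other cars it beats (row sums) and by how many it is beaten (column sums), ignoring the reflexive diagonal; it reproduces the figures quoted above.

```python
# Table 6.13 as printed: T[a][b] == 1 when car a+1 is rated not lower than
# car b+1 on at least 4 of the 5 criteria.
T = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
]


def beats(rel):
    """Car number -> how many other cars it beats (diagonal ignored)."""
    n = len(rel)
    return {a + 1: sum(rel[a][b] for b in range(n) if b != a) for a in range(n)}


def beaten_by(rel):
    """Car number -> how many other cars beat it (diagonal ignored)."""
    n = len(rel)
    return {b + 1: sum(rel[a][b] for a in range(n) if a != b) for b in range(n)}


print(sorted(beats(T).items(), key=lambda kv: -kv[1]))
print(sorted(beaten_by(T).items(), key=lambda kv: kv[1]))
```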
In the present case, we see that the simple approach used here essentially makes
the same cars emerge as the methods used so far. There are nevertheless at least
two radical differences between this approach and those based on the weighted
sum or on some more sophisticated way of assessing each alternative by a single
number that synthesises all the criteria values. One is that all criteria have been
considered equally important; it is possible, however, to take information on the
relative importance of the criteria into account, as will be seen in section 6.4.3.
The second difference concerns the nature of the approach itself; the most
striking point is that the size of the differences between the evaluations of a and b
on the criteria does not matter; only the signs of those differences do. In other
words, had the available information been rankings of the cars with respect to
each criterion (instead of numeric evaluations), the result of the Condorcet
procedure would have been exactly the same. More precisely, suppose that all that
we know (or that Thierry considers relevant in terms of preferences) about the cost
criterion is the ordering of the cars according to the estimated cost, i.e.
the assessments for the cars on the cost criterion are rather rough estimates
of an expected cost (see section 6.1.1); in particular, it is presumed that on
average the lifetimes of all alternatives are equal; is it reasonable in those
circumstances to rely on the precise values of the differences between these
estimates to select the best alternative?
estimates of cost, even reliable ones, are not necessarily related to preferences
on the cost criterion in a simple way.
Such issues were discussed extensively in section 6.2.4. The whole analysis carried
out there was aimed at the construction of a multiple criteria value function,
which implies making any difference in evaluations on a criterion equivalent to
some uniquely defined difference on any other criterion. The many methods that
can be used to build a value function by questioning a decision-maker about his
preferences may, however, fail; let us list a few reasons for the possible failure
of these methods:
time pressure may be so intense that there is not enough time available to
engage in the lengthy elicitation process of a multiple criteria value function;
it may be that the importance of the decision to be made does not justify
such an effort;
the decision-maker might not know how to answer the questions or might
try to answer but prove inconsistent or might feel discomfort in being forced
to give precise answers where things are vague to him;
in case of group decision, the analyst may be unable to make the various
decision-makers agree on the answers to be given to some of the questions
raised in the elicitation process.
by having experienced the braking behaviour of specific cars rated at the various
levels of the scale, but such knowledge cannot be expected from a decision-maker
(otherwise there would be no room on the marketplace for all the magazines that
evaluate goods in order to help consumers spend their money while making the
best choice). Also remember that braking performance has been described by the
average of 3 indices evaluating aspects of the car's braking behaviour; this does
not favour a deep intuitive perception of what the levels on that scale may really
mean. So one has to admit that, in many cases, the definition of the levels on
scales is far from precise in quantitative terms and that it may be healthy to avoid
relying on the fallacious power of numbers. This is definitely the option chosen in
the methods discussed in the present section. These methods are not purely
ordinal, but differences between levels on a scale are carefully categorised, usually
in a coarse-grained fashion, so as not to take into account differences that are due
only to the irrelevant precision of numbers.
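The claim that only the signs of the differences matter can be checked numerically. In this Python sketch (hypothetical evaluations, all criteria maximised, equal weights) each criterion is re-coded by a different strictly increasing function: the matrix of pairwise counts, and hence the Condorcet relation, is unchanged, whereas the alternative with the largest plain sum of evaluations changes.

```python
import math


def count_matrix(evals):
    """m[a][b] = number of criteria (all maximised) on which a is at least as good as b."""
    return [[sum(x >= y for x, y in zip(ra, rb)) for rb in evals] for ra in evals]


def best_by_sum(evals):
    """Index of the alternative with the largest unweighted sum of evaluations."""
    sums = [sum(row) for row in evals]
    return sums.index(max(sums))


# Hypothetical evaluations: 3 alternatives, 3 criteria, larger is better.
evals = [[3.0, 10.0, 0.2], [5.0, 9.5, 0.9], [4.0, 12.0, 0.5]]

# Re-code each criterion with a different strictly increasing function.
recode = [math.exp, math.log, lambda x: x ** 3]
recoded = [[f(x) for f, x in zip(recode, row)] for row in evals]

print(count_matrix(evals) == count_matrix(recoded))   # pairwise counts unchanged
print(best_by_sum(evals), best_by_sum(recoded))       # sum-based choice differs
```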
The Condorcet idea for a voting procedure has been transposed to decision analysis
under the name of outranking methods. Such a transposition takes the peculiarities
of the decision analysis context into account, in particular the fact that criteria
may be perceived as unequally important; additional elements, such as the notion
of discordance, have also been added. The principle of these methods is as
follows. Each pair of alternatives is considered in turn, independently of third-party
alternatives; when looking at alternatives a and b, it is claimed that "a outranks
b if there are enough arguments to decide that a is at least as good as b, while
there is no essential reason to refute that statement" (Roy (1974), cited by Vincke
(1992), p. 58). Taking into account strong arguments against declaring a
preference is what is called discordance; it is an original feature with respect to
the simple Condorcet rule. Such an approach has been operationalised through
various procedures, in particular the family of ELECTRE methods associated
with the name of B. Roy. (For an overview of outranking methods, the reader is
referred to the books by Vincke (1992) and Roy and Bouyssou (1993).) Below, we
discuss an application of the simplest of these methods, ELECTRE I, to Thierry's
case; ELECTRE I is a tool designed to be used in the context of a choice decision
problem; it builds a set of which the best alternative (according to the
decision-maker's preferences) should be a member. Let us emphasise that this set
cannot be described as the set of best alternatives, nor even as a set of good
alternatives, but only as a set that contains the best alternatives. We shall then
show how the fundamental ideas of ELECTRE I can be refined, in particular with
a view to helping to rank the alternatives. Our goal is not to survey all
outranking methods; we just want to present the basic ideas of such methods and
illustrate some problems they may raise.
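The set that ELECTRE I proposes is classically a kernel of the outranking relation: a subset in which no alternative outranks another (independence) and such that every alternative outside it is outranked by some member (absorption). When the relation has no cycles, the kernel exists, is unique and can be found by the peeling procedure sketched below in Python. The four-alternative relation in the example is hypothetical, and cycles, which need separate treatment, are simply reported as an error here.

```python
def kernel_acyclic(n, outranks):
    """Kernel of an acyclic outranking relation on alternatives 0..n-1.

    `outranks` is a set of pairs (a, b), a != b, meaning "a outranks b".
    """
    remaining = set(range(n))
    kernel = set()
    while remaining:
        # Alternatives that no other remaining alternative outranks.
        top = {b for b in remaining
               if not any((a, b) in outranks for a in remaining if a != b)}
        if not top:
            raise ValueError("the relation contains a cycle")
        kernel |= top
        # Everything outranked by a new kernel member is absorbed and removed.
        absorbed = {b for (a, b) in outranks if a in top}
        remaining -= top | absorbed
    return kernel


# Hypothetical chain: 0 outranks 1, 1 outranks 2, 2 outranks 3.
print(sorted(kernel_acyclic(4, {(0, 1), (1, 2), (2, 3)})))
```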
where the p_i's are normalised weights that reflect the relative importance of the
criteria; g_i(a) denotes, as usual, the evaluation of alternative a on criterion i
(which is assumed to be maximised; if it were to be minimised, the weight p_i would
be added when the converse inequality holds, i.e. g_i(a) ≤ g_i(b)). So, whenever
the evaluation of a exceeds or equals that of b on a criterion, the weight of that
criterion enters (additively) into the weight of the coalition in favour of a. A
criterion can count both for a against b and the opposite if and only if g_i(a) = g_i(b).
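The concordance index just described can be sketched in Python as follows. The function divides by the sum of the weights, so raw weights such as (10, 8, 6, 6, 6) may be passed directly. The two evaluation vectors and the direction of each criterion (cost, acceleration and pick-up minimised; braking and road-holding maximised) are assumptions made for the example.

```python
def concordance(ga, gb, weights, maximise):
    """c(a, b): share of the total weight carried by the criteria on which
    a is at least as good as b (>= for maximised, <= for minimised criteria)."""
    favourable = sum(
        p for x, y, p, up in zip(ga, gb, weights, maximise)
        if (x >= y if up else x <= y)
    )
    return favourable / sum(weights)


weights = [10, 8, 6, 6, 6]
maximise = [False, False, False, True, True]   # assumed directions
car_a = [16.0, 29.0, 41.0, 2.5, 3.0]           # hypothetical evaluations
car_b = [17.0, 28.0, 41.0, 2.0, 3.2]
print(round(concordance(car_a, car_b, weights, maximise), 3))
print(round(concordance(car_b, car_a, weights, maximise), 3))
```

The two values add up to more than 1 because the cars are tied on the third criterion, whose weight therefore counts for both.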
In the context of outranking, the weights are not trade-offs; they are completely
independent of the scales for the criteria. A practical consequence is that one
may question the decision-maker in terms of relative importance of the criteria
without reference to the scales on which the evaluations for the various viewpoints
are expressed. This does not mean, however, that they are independent of the
method, nor that one could carelessly use values given spontaneously by the
decision-maker, or obtained by questioning him in terms of importance without
reference to the evaluations, as is done in Saaty's procedure. It is important to
bear in mind how the weights will be used, in this case to measure the strength
of coalitions in pairwise comparisons and to decide on the preference only on the
basis of the coalitions.
To be more specific, and to contrast the meaning of these weights with those used
in weighted sums, let us first consider those suggested by Thierry in section 6.2.2,
i.e. (1, 2, 1, 0.5, 0.5). Note that these were not obtained through questioning on the
relative importance of criteria but in the context of the weighted sum, with Thierry
bearing re-scaled evaluations in mind: the evaluations on each criterion had been
divided by the maximal value g_i,max attained for that criterion. Dividing the
weights by their sum (= 5) yields the normalised weights (.2, .4, .2, .1, .1). Using
these weights in outranking methods would lead to an overwhelming predominance
of criteria 2 (Acceleration) and 3 (Pick-up), which are moreover linked since they
are both facets of the car's performance. With such weights and a concordance
threshold of at least .5, it is impossible for a car to be outranked when it is better
on criteria 2 and 3, even if all the other criteria are in favour of an opponent (the
opponent's concordance index is then at most .4). It was never Thierry's intention
that, once a car is better on criteria 2 and 3, there is no need to look at the other
criteria; on the contrary, the whole initial analysis shows that a fast and powerful
car is useless if, for instance, it is bad on the braking or road-holding
criterion. Such a feature of the preference structure could indeed be reflected
through the use of vetoes, but only in a negative manner, i.e. by removing the
outranking of a safe car by a powerful one, not by allowing a safe car to outrank
a powerful one. Note that the above weights may nevertheless be appropriate
for a weighted sum because, in such a method, the weights are multiplied by the
evaluations (or re-coded evaluations). To make this clearer, consider the following
reformulation of the condition under which a is preferred to b in the weighted sum
model (a similar formulation is straightforward in the additive value model):
$$ a \succsim b \iff \sum_{i=1}^{n} k_i \bigl(g_i(a) - g_i(b)\bigr) \geq 0 \qquad (6.13) $$
If a is slightly better than b on a point of view i, the influence of this fact on the
comparison between a and b is reflected by the term k_i(g_i(a) − g_i(b)), which
is presumably small. Hence, important criteria count for little in pairwise
comparisons when the differences between the evaluations of the alternatives are
small enough. In outranking methods, on the contrary, weights are not scaled by
the size of the differences: when a is better than b on some criterion, the full
weight of the criterion counts in favour of a, whether a is only slightly or by far
better than b.
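A two-criterion example makes the contrast concrete. In this Python sketch the evaluations (already on a common [0, 1] scale) and the weights (7, 3) are hypothetical; the same numbers are used both as trade-offs in condition (6.13) and as importance coefficients in a concordance index.

```python
def weighted_sum_prefers(ga, gb, k):
    """Condition (6.13): a is at least as good as b iff sum_i k_i (g_i(a) - g_i(b)) >= 0."""
    return sum(ki * (x - y) for ki, x, y in zip(k, ga, gb)) >= 0


def concordance(ga, gb, p):
    """All-or-nothing counting: the full weight p_i goes to a whenever g_i(a) >= g_i(b)."""
    return sum(pi for pi, x, y in zip(p, ga, gb) if x >= y) / sum(p)


w = [7, 3]
a = [0.51, 0.2]   # a is only marginally better on the important criterion
b = [0.50, 0.9]   # b is much better on the less important one

print(weighted_sum_prefers(a, b, w), weighted_sum_prefers(b, a, w))
print(concordance(a, b, w), concordance(b, a, w))
```

The weighted sum prefers b, while with a concordance threshold of, say, .6 the coalition in favour of a would be strong enough for a to outrank b, but not conversely.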
Since the weights in a weighted sum depend on the scaling of each criterion and
there is no acknowledged standard scaling, it makes no sense in principle to use
the weights initially provided by Thierry as coefficients measuring the importance
of the criteria in an outranking method. If we nevertheless try to use them, we
might consider the weights used with the normalised criteria of Table 6.4. We see
that the importance of the safety coalition (Criteria 4 and 5) would be negligible
(weight = .20), while the importance of the performance coalition (Criteria 2
and 3) would be overwhelming (weight = .60). There is another reasonable nor-
malisation of the criteria that does not fix the zero of the scale but rather maps
the smallest attained value gi,min onto 0 and the largest gi,max onto 1. Transform-
ing the weights accordingly (i.e. multiplying them by the inverse of the range of
the values for the corresponding criterion prior to the transformation) one would
obtain (.28, .14, .13, .20, .25) as a weight vector. With these values as coefficients of
importance, the safety coalition (criteria 4 and 5; weight = .45) becomes more
important than the performance coalition (criteria 2 and 3; weight = .27), which
Thierry may consider unfair. An additional conclusion is that the values of the
weights vary tremendously depending on the type of normalisation applied.
Now look at the weights (.35, .24, .17, .12, .12) obtained through Saaty's
questioning procedure in terms of importance (see section 6.3.2). Using these weights
to measure the strength of coalitions does not seem appropriate, since the
predominance of criteria 1 and 2 is too strong (joint weight = .35 + .24 = .59).
Because of the all-or-nothing character of the weights in ELECTRE I, one is
inclined to choose less contrasted weights than those examined above. Although
procedures have been proposed to elicit such weights (see Mousseau (1993), Roy
and Bouyssou (1993)), we will simply choose a set of weights intuitively; let us
take weights proportional to (10, 8, 6, 6, 6) as reflecting the relative importance
of the criteria. At least the ordering of the values seems to be
Cars 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 1 .5 .17 .33 .33 .56 .61 .33 .61 .33 .33 .33 .33 .61
2 .49 1 .44 .83 .33 .56 .44 .61 .61 .28 .28 .28 .83 .61
3 .83 .73 1 .73 .73 .73 .78 .78 .83 .56 .44 .56 1 .78
4 .66 .17 .28 1 .17 .56 .28 .44 .28 .44 .28 .28 .78 .44
5 .66 .66 .28 1 1 .56 .44 .44 .44 .66 .28 .28 1 .44
6 .44 .44 .28 .44 .44 1 .44 .44 .44 .44 .28 .28 .61 .44
7 .56 .56 .22 .73 .73 .73 1 .56 .83 .56 .39 .39 .73 .83
8 .66 .39 .22 .73 .73 .73 .61 1 .66 .39 0 .39 .73 .66
9 .39 .56 .17 .73 .73 .56 .17 .33 1 .39 .17 .39 .73 .61
10 .83 .73 .44 .56 .33 .56 .61 .61 .61 1 .61 .33 .83 .61
11 .83 .73 .56 .73 .73 .73 .78 1 .83 .56 1 .73 .73 1
12 .83 .73 .44 .73 .73 .73 .78 .78 .61 .83 .61 1 1 .78
13 .66 .39 0 .39 .17 .39 .28 .44 .28 .17 .28 0 1 .28
14 .39 .56 .22 .56 .56 .56 .17 .56 .56 .39 0 .22 .73 1
Table 6.14: Concordance index (rounded to two decimals) for the Choosing a
car problem
concentrate on two values of the concordance threshold, .60 and .65, which lie
on either side of the borderline separating concordance relations with and without
cycles; above these values, the concordance relations become increasingly poor
(they contain fewer and fewer pairs); below them, they are less and less discriminating.
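Checking on which side of that borderline a given threshold lies amounts to cutting the valued relation and searching for a cycle. The Python sketch below does this with a depth-first search; the 3 × 3 concordance matrix is a hypothetical stand-in, not Table 6.14.

```python
def cut(c, threshold):
    """Crisp relation {(a, b)}: a R b iff c[a][b] >= threshold (diagonal ignored)."""
    n = len(c)
    return {(a, b) for a in range(n) for b in range(n)
            if a != b and c[a][b] >= threshold}


def has_cycle(n, rel):
    """Depth-first search for a directed cycle in a relation on 0..n-1."""
    succ = {a: [b for (x, b) in rel if x == a] for a in range(n)}
    state = [0] * n  # 0 = unvisited, 1 = on the current path, 2 = finished

    def visit(a):
        state[a] = 1
        for b in succ[a]:
            if state[b] == 1 or (state[b] == 0 and visit(b)):
                return True
        state[a] = 2
        return False

    return any(state[a] == 0 and visit(a) for a in range(n))


c = [[1.00, 0.62, 0.70],
     [0.40, 1.00, 0.61],
     [0.61, 0.45, 1.00]]
for lam in (0.60, 0.65):
    print(lam, sorted(cut(c, lam)), has_cycle(3, cut(c, lam)))
```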
In the above presentation the weights sum to 1. Note that multiplying all the
weights by a positive number would yield the same concordance relations, provided
the concordance threshold is multiplied by the same factor; the weights in
ELECTRE I may thus be considered as being assessed on a ratio scale, i.e. up to a
positive scaling factor.
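This invariance is easy to confirm exhaustively. The Python sketch below uses exact fractions, the weights proportional to (10, 8, 6, 6, 6) chosen above and the threshold .65. Since a concordance index depends only on which criteria favour a, it suffices to let hypothetical alternatives range over all 0-1 profiles on 5 maximised criteria.

```python
from fractions import Fraction
from itertools import product


def concordant_weight(ga, gb, weights):
    """Total weight of the (maximised) criteria on which a is at least as good as b."""
    return sum(w for w, x, y in zip(weights, ga, gb) if x >= y)


raw = [10, 8, 6, 6, 6]
normalised = [Fraction(w, sum(raw)) for w in raw]   # 10/36, 8/36, 6/36, ...
threshold = Fraction(65, 100)
scale = sum(raw)                                     # factor relating the two weight vectors

profiles = list(product((0, 1), repeat=5))
same = all(
    (concordant_weight(a, b, normalised) >= threshold)
    == (concordant_weight(a, b, raw) >= threshold * scale)
    for a in profiles for b in profiles
)
print(same)
```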
Cars 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 1 0 1 0 0 0 0 0 0 0 0 1 0
3 1 1 1 1 1 1 1 1 1 0 0 0 1 1
4 1 0 0 1 0 0 0 0 0 0 0 0 1 0
5 1 1 0 1 1 0 0 0 0 1 0 0 1 0
6 0 0 0 0 0 1 0 0 0 0 0 0 0 0
7 0 0 0 1 1 1 1 0 1 0 0 0 1 1
8 1 0 0 1 1 1 0 1 1 0 0 0 1 1
9 0 0 0 1 1 0 0 0 1 0 0 0 1 0
10 1 1 0 0 0 0 0 0 0 1 0 0 1 0
11 1 1 0 1 1 1 1 1 1 0 1 1 1 1
12 1 1 0 1 1 1 1 1 0 1 0 1 1 1
13 1 0 0 0 0 0 0 0 0 0 0 0 1 0
14 0 0 0 0 0 0 0 0 0 0 0 0 1 1
Table 6.15: Concordance relation for the Choosing a car problem with weights
.28, .22, .17, .17, .17 and concordance threshold .65
Class   1      2      3      4      5      6         7      8       9
A       11     3, 12  8      7      5      9, 10     2, 4   13, 14  1, 6
        (11)   (10)   (7)    (6)    (5)    (3)       (2)    (1)     (0)
B       3, 11  12     10     7, 8   9      2, 6, 14  5      1, 4    13
        (0)    (1)    (2)    (3)    (4)    (5)       (6)    (8)     (11)
Table 6.16: Rankings obtained by counting, for each alternative, how many
alternatives it beats (ranking A) or is beaten by (ranking B) in the concordance
relation (threshold .65); the numbers between parentheses in the second row of
ranking A (resp. ranking B) are the numbers of beaten (resp. beating) alternatives
for each alternative of the same column in the first row
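Both rankings of Table 6.16 can be recomputed from Table 6.15. The Python sketch below copies Table 6.15 as printed, counts beaten and beating cars (ignoring the reflexive diagonal) and groups cars with equal counts into classes; the helper names are ours.

```python
# Table 6.15 as printed (weights .28, .22, .17, .17, .17; threshold .65):
# C[a][b] == 1 when car a+1 outranks car b+1.
C = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
]


def counts(rel, beating):
    """Car number -> number of other cars it beats (beating=True) or is beaten by."""
    n = len(rel)
    if beating:
        return {a + 1: sum(rel[a][b] for b in range(n) if b != a) for a in range(n)}
    return {b + 1: sum(rel[a][b] for a in range(n) if a != b) for b in range(n)}


def classes(count, descending):
    """Group cars sharing the same count into ordered (count, cars) classes."""
    groups = {}
    for car, c in count.items():
        groups.setdefault(c, []).append(car)
    return [(c, sorted(groups[c])) for c in sorted(groups, reverse=descending)]


ranking_a = classes(counts(C, beating=True), descending=True)    # most cars beaten first
ranking_b = classes(counts(C, beating=False), descending=False)  # fewest beating cars first
print(ranking_a)
print(ranking_b)
```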
above method, the information contained in the other cutting levels has been
entirely ignored, although the rankings obtained from them need not be identical.
They may even differ significantly, as can be seen by deriving a ranking from the
.60 cut with the method we applied to the .65 cut.
Thresholding
Up to this point, both in the Condorcet-like method and in the basic ELECTRE I
method (without veto), we have treated the assessments of the alternatives as if
they were ordinal data, i.e. we could have obtained exactly the same results (kernel
or ranking) by working with the orders induced on the set of alternatives by
their evaluations on the various criteria. Does this mean that outranking methods
are purely ordinal? Not exactly! More sophisticated outranking methods exploit
information that is richer than purely ordinal information but not as demanding
as cardinal information. This is done through what we shall call thresholding.
Thresholding amounts to identifying intervals on the criteria scales, which
represent the minimal difference in evaluation above which a particular property
holds. For instance, suppose that the assessment of b on criterion i, g_i(b), is
given and that criterion i is to be maximised: from which value g_i(b) + t_i(g_i(b))
onwards will an alternative a be said to be preferred to b? Implicitly, we have so far
considered that b was preferred to a on criterion i as soon as g_i(b) > g_i(a), i.e. we
have considered that t_i(g_i(b)) = 0. In view of the imprecision in the assessments,
and since it is not clear for all criteria that there is a marked preference when the
difference |g_i(a) − g_i(b)| is small, one may be led to consider a non-null threshold
to model preference. In our case, for instance, it is not likely that Thierry would
really mark a preference between cars 3 and 10 on the Cost criterion, since their
estimated costs are within 10 € of each other (see Table 6.2).
Thresholding is all the more important since, as mentioned at the end of section
6.4.1, the size of the interval between the evaluations is not taken into account
when deciding that a is overall preferred to b. Hence one should be prudent when
deciding whether or not a criterion is an argument for saying that a is at least as
good as b; it is therefore reasonable to determine a threshold function t_i and to
say that criterion i is such an argument as soon as g_i(a) ≥ g_i(b) + t_i(g_i(b)); since
we examine reasons for saying that a is at least as good as b, not for saying that a
is (strictly) better than b, the function t_i should be negatively valued.
Determining such a threshold function is not necessarily an easy task. One
could ask the decision-maker to state, ideally for each evaluation g_i(a) of each
alternative on each criterion, from which value onwards an evaluation should be
considered at least as good as g_i(a). Things may become simpler if the threshold
can be considered constant or proportional to g_i(a) (e.g. t_i(g_i(a)) = −.05 g_i(a)).
Note that constant thresholds could be used when a scale is linear, in the sense
that equal differences throughout the scale have the same meaning and
consequences (see end of section 6.2.3); however, this is not a necessary condition,
since only some differences, not all, need to be equivalent throughout the scale. In
any case, Definition 6.12 of the concordance index is adapted in a straightforward manner
as follows and the method for building an outranking relation remains unchanged:
$$ c(a,b) = \sum_{i \,:\, g_i(a) \geq g_i(b) + t_i(g_i(b))} p_i \qquad (6.14) $$
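Equation (6.14) only changes the test inside the sum. In this Python sketch all criteria are assumed to be maximised, and each threshold function t_i is passed as a callable returning a non-positive number; the proportional threshold −0.05 g and the evaluations are hypothetical.

```python
def concordance_with_thresholds(ga, gb, weights, thresholds):
    """Equation (6.14): criterion i counts for a when g_i(a) >= g_i(b) + t_i(g_i(b))."""
    favourable = sum(
        p for x, y, p, t in zip(ga, gb, weights, thresholds) if x >= y + t(y)
    )
    return favourable / sum(weights)


def no_threshold(g):
    return 0.0


def proportional(g):
    # hypothetical threshold: a still counts as at least as good as b when it
    # falls short of g_i(b) by at most 5% of g_i(b)
    return -0.05 * g


a = [96.0, 3.0]
b = [100.0, 3.2]
print(concordance_with_thresholds(a, b, [1, 1], [no_threshold, no_threshold]))
print(concordance_with_thresholds(a, b, [1, 1], [proportional, no_threshold]))
```

For a minimised criterion such as cost, the same function can be applied to negated evaluations.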
Note that preference thresholds, which lead to indifference zones, are used in
a variant of the ELECTRE I method called ELECTRE IS (see Roy and Skalka
(1984) or Roy and Bouyssou (1993)).
Thresholding is a key tool in the original outranking methods; it allows one
to bypass the need to transform the original evaluations in order to obtain linear
scales. Thresholds are also invoked on another occasion, namely in the analysis
of discordance.
accepting a constant veto threshold of about 1.5 or 1.6 seconds. If we decide that
there is a veto with a constant threshold on the acceleration criterion for
differences of 1.5 seconds or more, it means that a car that accelerates from 0 to
100 km/h in 29.6 seconds (as Peugeot 309 GTI does) could not conceivably
outrank a car that does so in 28 (as Honda Civic does), whatever the evaluations
on the other criteria might be. Of course, setting the veto threshold to 1.5 implies
that a car needing 30.4 seconds (like Mazda 323) may not outrank a car that
accelerates in 28.9 (like Opel Astra or Renault 21) but might very well outrank
a car that accelerates in 29 (like Nissan Sunny) if its performances on the other
criteria are superior. Using 1.5 as a veto threshold thus implies that differences
of at least 1.5, from 28 to 29.6 or from 28.9 to 30.4, have the same consequences
in terms of preference. Setting the value of the veto threshold obviously involves
some degree of arbitrariness; why not set the threshold at 1.4 seconds, which would
imply that Mazda 323 may not outrank Nissan Sunny? In such cases, it must be
checked that small variations around the chosen value of a parameter (such as
a veto threshold) do not influence the conclusions dramatically; if small
variations do have a strong influence, detailed investigation is needed in order to
decide which setting of the parameter's value is most appropriate. A related facet
of using thresholds is that growing differences that are initially not significant
abruptly crystallise into significant ones as soon as a crisp threshold is passed;
methods using thresholds may obviously show discontinuities in their consequences,
and that is why sensitivity analysis is even more crucial here than with more
classical methods. However, the underlying logic is quite similar to that on which
statistical tests are based; there too, conventional levels of significance (like the
famous 5% rejection intervals) are widely used to decide whether or not a
hypothesis must be rejected. We will allude in the next section to more gradual
methods that can be designed on the basis of concordance-discordance principles
similar to those outlined above.
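The veto rule discussed above can be written as an extra condition on top of the concordance test. In this Python sketch the acceleration times are those quoted in the text, and the veto applies to differences of 1.5 seconds or more (the reading under which 30.4 against 28.9 is vetoed). The concordance value and threshold passed to the hypothetical `outranks` helper are illustrative, and the small tolerance absorbs floating-point rounding in differences such as 30.4 − 28.9.

```python
EPS = 1e-9  # tolerance for floating-point rounding in time differences


def vetoed(time_a, time_b, veto=1.5):
    """True when a is slower than b by at least `veto` seconds on the
    (minimised) acceleration criterion, so a may not outrank b."""
    return time_a - time_b >= veto - EPS


def outranks(c_ab, threshold, time_a, time_b, veto=1.5):
    """Concordance test plus veto on acceleration (other vetoes omitted)."""
    return c_ab >= threshold and not vetoed(time_a, time_b, veto)


print(vetoed(29.6, 28.0))             # Peugeot 309 GTI against Honda Civic
print(vetoed(30.4, 28.9))             # Mazda 323 against Opel Astra or Renault 21
print(vetoed(30.4, 29.0))             # Mazda 323 against Nissan Sunny
print(vetoed(30.4, 29.0, veto=1.4))   # same pair with a 1.4 s threshold
```

Re-running such checks for several nearby values of the veto threshold is a cheap first step in the sensitivity analysis recommended above.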
For the sake of brevity, we do not develop the consequences of introducing
veto thresholds in our example. Suffice it to say that the outranking relation, its
kernel and the derived rankings are not dramatically modified in the present case.
if the profile of the pair (a, b) is one of those associated with a particular
value of credibility of outranking, then the outranking of b by a is assigned
this value of credibility index; there are of course rationality requirements
for the sets of profiles associated with the various values of the credibility
index; this credibility index is to be interpreted in logical terms; it models
the degree to which it is true that there are enough arguments in favour of
saying that a is better than b while there is no strong reason of refuting this
statement (see the definition of outranking in Section 6.4.2);
thresholds may be used to determine classes of differences in preference
on each criterion, provided differences gi(a) − gi(b) equal to such
thresholds have the same meaning independently of their location on the scale of
criterion i (linearity property);
the rules for determining whether a outranks b (possibly to some degree of
a credibility index) generally involve weights that describe the relative
importance of the criteria; these weights are typically used additively to measure
the importance of coalitions of criteria independently of the evaluations of
the alternatives.
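How weights and thresholds enter such rules can be illustrated with a crisp concordance index. The sketch below rests on hypothetical assumptions (three criteria, weights 0.5, 0.3 and 0.2 summing to 1, an indifference threshold of 2 on each criterion, the function name `concordance`) and is not the exact index of any particular ELECTRE method. It adds the weights of the criteria on which a is at least as good as b up to the threshold, so the index jumps by a whole weight when a difference crosses the threshold, while a large advantage on the other criteria does not compensate.

```python
def concordance(ga, gb, weights, q):
    """Crisp concordance index c(a, b) (illustrative sketch).

    Sums the weights of the criteria j on which a is at least as good as b
    up to the indifference threshold q[j], i.e. g_j(a) >= g_j(b) - q[j].
    The weights are assumed to be non-negative and to sum to 1.
    """
    return sum(w for x, y, w, qj in zip(ga, gb, weights, q) if x >= y - qj)


# Hypothetical data: three criteria, higher evaluations are better.
weights = [0.5, 0.3, 0.2]
q = [2.0, 2.0, 2.0]
b = [10.0, 10.0, 10.0]

# Move a's first evaluation across g_1(b) - q_1 = 8: the index drops from 1.0 to
# 0.5 at once, however small the step and however good a is on criteria 2 and 3.
for x in (8.1, 8.0, 7.9):
    print(x, round(concordance([x, 12.0, 12.0], b, weights, q), 3))
print(round(concordance([7.9, 100.0, 100.0], b, weights, q), 3))
```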
The result of the construction, i.e. the outranking relation (possibly qualified
with a degree of a credibility index), is then exploited in view of a specific type of
decision problem (choice, ranking, ...). It is supposed to include all the relevant
and sure information about preference that could be extracted from the data and
the questions answered by the decision-maker.
Since the outranking relation is in general neither transitive nor acyclic, procedures
are needed to derive a ranking or a choice set from it. In the process of deriving
a complete ranking from the outranking relation, the property of independence
of irrelevant alternatives (see Chapter 2, where this property is discussed) is lost;
this property was satisfied in the construction of the outranking relation since
outranking is decided by looking in turn at the profiles of each pair of alternatives,
independently of the rest. Since this is a hypothesis of Arrow's theorem and it is
violated, the conclusion of the theorem is not necessarily valid and one may hope
that there is no criterion playing the role of dictator.
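The lack of transitivity mentioned above is easy to reproduce. In the hypothetical sketch below, three alternatives are evaluated on three equally weighted criteria, and a is taken to outrank b when the criteria on which a is at least as good as b carry at least 60% of the total weight (a cutoff chosen for illustration, with no discordance test). The resulting relation is a cycle, which is why some exploitation procedure is needed before a ranking can be proposed.

```python
from itertools import permutations

# Hypothetical evaluations (higher is better) on three equally weighted criteria.
EVALS = {"a": (3, 1, 2), "b": (2, 3, 1), "c": (1, 2, 3)}
WEIGHTS = (1 / 3, 1 / 3, 1 / 3)
CUTOFF = 0.6  # illustrative concordance level required for outranking


def concordance(x, y):
    """Total weight of the criteria on which x is at least as good as y."""
    return sum(w for gx, gy, w in zip(EVALS[x], EVALS[y], WEIGHTS) if gx >= gy)


# x S y whenever the concordance of x over y reaches the cutoff.
relation = {(x, y) for x, y in permutations(EVALS, 2) if concordance(x, y) >= CUTOFF}


def is_transitive(rel):
    """True if x S y and y S z always imply x S z (for x != z)."""
    return all((x, z) in rel for (x, y) in rel for (y2, z) in rel if y == y2 and x != z)


print(sorted(relation))         # a S b, b S c and c S a: a cycle
print(is_transitive(relation))  # False, since a S c does not hold
```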
The various procedures that have been proposed for exploiting the outrank-
ing relation (for instance transforming it into a complete ranking) are not above
criticism; it is especially difficult to justify them rigorously since they operate on
an object that has been constructed, the outranking relation. Since the decision-
maker has no direct intuition of this object, one can hardly expect to get reliable
answers when questioning him about the properties of this relation. On the other
hand, a direct characterisation of the ranking produced by the exploitation of an
outranking relation seems out of reach.
Non-compensation
The weights count entirely or not at all in the comparison of two alternatives; the
smaller or larger difference in evaluations between alternatives does not matter
once a certain threshold is passed. This fact, which was discussed in the second
$$
S(a, b) =
\begin{cases}
c(a, b) & \text{if } D_j(a, b) \le c(a, b) \text{ for all } j,\\[4pt]
c(a, b) \displaystyle\prod_{j:\, D_j(a, b) > c(a, b)} \frac{1 - D_j(a, b)}{1 - c(a, b)} & \text{otherwise.}
\end{cases}
$$
attention, comparable to what was needed to build value functions from the eval-
uations, was paid to building concordance and discordance indices; in particular,
nothing guarantees that these indices can be combined by means of arithmetic
operations and produce an overall index S representative of a degree of credibility
of an outranking. For instance, consider the following two cases which lead to an
outranking degree of .4:
the concordance index c(a, b) is equal to .40 and there is no discordance (i.e.
Dj (a, b) = 0 for all j);
the concordant coalition weighs .80 but there is a strong discordance on
criterion 1: D1(a, b) = .90 while Dj(a, b) = 0 for all j ≠ 1.
For both, the formula yields a degree of outranking of .40. Obviously another
formula with similar heuristic behaviour might have resulted in quite different
outputs. Consider for instance the following:

$$S(a, b) = \min\Big\{c(a, b),\ \min_{j}\big(1 - D_j(a, b)\big)\Big\}$$

In the first case, it yields an outranking degree of .40 as well, but in the second
case, the degree falls to .10. It is likely that in some circumstances a decision-maker
might find the latter model more appropriate. Note also that the latter formula
does not involve arithmetic operations on c(a, b) and the 1 − Dj(a, b)'s but only
ordinal operations, namely taking the minimum. This means that transforming
c(a, b) and the 1 − Dj(a, b)'s by an increasing transformation of the [0, 1] interval
would just amount to transforming the original value of S(a, b) by the same
transformation. This is not the case with the former formula. Hence, if the information
content of c(a, b) and of the 1 − Dj(a, b)'s just consists of the ordering of their
values in the [0, 1] interval, then the former formula is not suitable. For a survey
of possible ways of aggregating preferences into a valued relation, the reader is
referred to chapters 2 and 3 of the book edited by Slowinski (1998).
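The two formulas can be compared numerically. The sketch below assumes the discordance indices are given as a list `d` of values Dj(a, b) in [0, 1]; `s_electre3` implements the first formula above and `s_min` the purely ordinal one (the function names are ours). It reproduces the .40/.40 and .40/.10 values of the two cases, then checks the ordinal argument with the increasing transformation x ↦ x² of [0, 1], chosen only for illustration.

```python
import math


def s_electre3(c, d):
    """First formula: c if no D_j exceeds c, otherwise
    c * product over {j : D_j > c} of (1 - D_j) / (1 - c)."""
    strong = [dj for dj in d if dj > c]
    if not strong:
        return c
    return c * math.prod((1 - dj) / (1 - c) for dj in strong)


def s_min(c, d):
    """Ordinal alternative: the minimum of c and of all the 1 - D_j."""
    return min([c] + [1 - dj for dj in d])


def phi(x):
    """An increasing transformation of [0, 1], chosen for illustration."""
    return x ** 2


def transformed(c, d):
    """Apply phi to c(a, b) and to every 1 - D_j(a, b)."""
    return phi(c), [1 - phi(1 - dj) for dj in d]


case1 = (0.40, [0.0, 0.0, 0.0])   # c = .40, no discordance
case2 = (0.80, [0.90, 0.0, 0.0])  # c = .80, strong discordance on criterion 1
for c, d in (case1, case2):
    print(round(s_electre3(c, d), 2), round(s_min(c, d), 2))

# The min formula commutes with phi; the first formula does not.
c2, d2 = transformed(*case2)
print(round(s_min(c2, d2), 4), round(phi(s_min(*case2)), 4))
print(round(s_electre3(c2, d2), 4), round(phi(s_electre3(*case2)), 4))
```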
The fact that the value obtained for the outranking degree may involve some
degree of arbitrariness did not escape Roy and Bouyssou (1993), who explain (p. 417)
that the value of the degree of outranking obtained by a formula like the above
should be handled with care; they advocate that thresholds be used when com-
paring two such values: the outranking of b by a can be considered to be more
credible than the outranking of d by c only if S(a, b) is significantly larger than
S(c, d). We agree with this statement but unfortunately it seems quite difficult to
assign a value to a threshold above which the difference S(a, b) − S(c, d) could be
claimed to be significant.
There are thus two directions that can be followed for taking the objections to
the formula of ELECTRE III into account. In the first option, one considers that
the meaning of the concordance and discordance degrees is ordinal and one tries to
determine a family of aggregation formulae that fulfil basic requirements including
compatibility with the ordinal character of concordance and discordance. The
other option consists in revising the way concordance and discordance indices are
constructed so that they have a quantitative meaning that allows arithmetic
operations to be used for aggregating them. This is, at least tentatively, the option
followed in the PROMETHEE methods (see Brans and Vincke (1985) or Vincke (1992));
these methods may be interpreted as aiming towards building a value function
on the pairs of alternatives; this function would represent the overall difference in
preference between any two alternatives. The way this function is constructed
in practice, however, leaves the door open to remarks analogous to those addressed
to the weighted sum in Section 6.2.
Besides the above points that are specific to multiple criteria preference models,
more general lessons can also be drawn.
If we consider our trip from the weighted sum to the additive multi-attribute
value model in retrospect, we see that much self-confidence, and hence
much convincing power, can be gained by eliciting conditions under which
an approach such as the weighted sum would be legitimate. The analysis
is worth the effort because precise concepts (like trade-offs and values) are
shaped through an analysis that also yields methods for eliciting the
parameters of the model. Another advantage of theory is to provide us
with limits, i.e. conditions under which a model is valid and a method is
applicable. From this viewpoint, and although the outranking methods have
not been fully characterised, it is worth noticing that their study has recently
made theoretical progress (see e.g. Arrow and Raynaud (1986), Bouyssou
and Perny (1992), Vincke (1992), Fodor and Roubens (1994), Tsoukias and
Vincke (1995), Bouyssou (1996), Marchant (1996), Bouyssou and Pirlot
(1997), Pirlot (1997)).
An advantage of formal models that cannot be overemphasised is that
they favour communication. In the course of the decision process, the
construction of the model requires that pieces of information, knowledge and
priorities that are usually implicit or hidden be brought to light and taken
into account; also, the choice of the model reflects the type of available
information (more or less certain, precise, quantitative). The result is often
a synthesis of what is known and what has been learnt about the decision
problem in the process of elaborating the model. The fact that a model is
formal also allows for some sort of calculation; in particular, formal models
make it possible to test to what extent the conclusions are stable when the
evaluations of imprecise data are varied. Once a decision has been made, the
model does not lose its utility. It can provide grounds for arguing in favour
of or against a decision. It can be adapted to make later decisions in similar
contexts.
The decisiveness of the output depends on the richness of the infor-
mation available. If the knowledge is uncertain, imprecise or simply non-
quantitative in nature, it may be difficult to build a very strong model;
by "strong", we mean a model that clearly suggests a decision, as do, for
instance, those that produce a ranking of the alternatives. Other models
(and especially those based on pairwise comparisons of alternatives and
satisfying the independence of irrelevant alternatives property) are
structurally unable to produce a ranking; they may nevertheless be the best
possible synthesis of the relevant information in particular decision situations. In
any case, even if the model leads to a ranking, the decision is to be taken by
the decision-maker and is not in general an automatic consequence of the
model (due for instance to imprecision in the data, which calls for relativising
the model's prescription). As will be illustrated in greater detail in
Chapter 9, the construction of a model is not all of the decision process.
7
DECIDING AUTOMATICALLY:
THE EXAMPLE OF RULE BASED
CONTROL
7.1 Introduction
The increasing development of automatic systems in most sectors of human ac-
tivities (e.g. manufacturing, management, medicine, etc.) has progressively led
to involving computers in many tasks traditionally reserved for humans, even the
more strategic ones such as control, evaluation and decision-making. The main
function of automatic decision systems is to act as a substitute for humans (deci-
sion makers, experts) in the execution of repetitive decision tasks. Such systems
can be in charge of all or part of the decision process. The main tasks to be per-
formed by automatic decision systems are collecting information (e.g. by sensors),
making a diagnosis of the current situation, selecting relevant actions, executing
and controlling these actions. Automatisation of these tasks requires the elabo-
ration of computational models able to simulate human reasoning. Such models
are, in many respects, comparable to those involved in the scientific preparation
of human decisions. Indeed, deciding automatically is also a matter of representa-
tion, evaluation and comparison. For this reason, we introduce and discuss some
very simple techniques used to design rule-based decision/control systems. This is
one more opportunity for us to address some important issues linked to descrip-
tive, normative and constructive aspects of mathematical modelling for decision
support:
These three points show how the use of formal models and the analysis of the
mathematical properties of the models are crucial in automatic decision-making.
In this respect, the modelling exercise discussed here is comparable to those treated
in the previous chapters, concerning human decision-making, but includes
special features due to automatisation (stable pre-existing knowledge and
preferences, real-time decision-making, completely autonomous closed systems, etc.).
We present a critical introduction to the use of simple formal tools such as fuzzy
sets and rule-based systems to model human knowledge and decision rules. We also
make explicit multiple criteria aggregation problems arising in the implementation
of these rules and discuss some important issues linked to rule aggregation.
For the sake of illustration, we consider two types of automatic decision systems
in this chapter:
decision systems based on explicit decision rules: such systems are used in
practical situations where the decision-maker or the expert is able to make
explicit the principles and rules he uses to make a decision. It is also assumed
that these rules constitute a consistent body of knowledge, sufficiently ex-
haustive to reproduce, predict and explain human decisions. Such systems
are illustrated in section 7.2 where the control of an automatic watering sys-
tem is discussed, and in section 7.4 where a decision problem in the context
of the automatic control of a food process is briefly presented. In the first
case, the decision problem concerns the choice of an appropriate duration for
watering, whereas in the second case, it concerns the determination of oven
settings aimed at preserving the quality of biscuits.
decision systems based on implicit decision rules: such systems are used in
practical applications for which it is not possible to obtain explicit decision
rules. This is very frequent in practice. The main possible reasons for it are
the following:
Such systems are illustrated in section 7.3, also in the context of the auto-
matic control of food processes. We will use the problem of controlling the
biscuit quality during baking as an illustrative case where numerical deci-
sion models based on pattern matching procedures can be used to perform
a diagnosis of dysfunction and a regulation of the oven, without any explicit
rule.
if X is A and Y is B then Z is C
where the X and Y variables are used to describe the current decision context
(input variables) and Z is a variable representing the decision (output variable).
Whenever X and Y can be automatically observed by the decision system (e.g.
using sensors), human skill and experience in problem solving can be approximated
and simulated using the fuzzy control approach (see e.g. Nguyen and Sugeno
1998). Such an approach is based on the use of fuzzy sets and multiple criteria
aggregation functions. Our purpose is to emphasise the interest as well as the
difficulty of resorting to such formal notions on real practical examples.
the gardener both in diagnosis steps (evaluation of the current situation) and in
decision-making steps (choice of an appropriate action). A common way to achieve
this task is to elicit decision rules from the gardener using a very simple language,
as close as possible to the natural language used by the gardener to explain his
decision. For instance, we can use propositional logic and define rules of the
following form:
If T is A and M is B then W is C
where T and M are descriptive variables used for temperature and soil moisture, W
is an output variable used to represent the decision and A, B, C are linguistic values
(labels) used to describe temperature, moisture and watering time respectively.
For example, suppose the gardener is able to formulate the following empirical
decision rules:
Notice that the elicitation of such rules is usually not straightforward, even if it
is the result of a close collaboration with experts in that domain. Indeed, general
rules used by experts may appear to be partially inconsistent and must often
include explicit exceptions to be fully operational. Even without any inconsistency,
the individual acceptance of each rule is not sufficient to validate the whole set
of rules. In some situations, unsuitable conclusions may appear, resulting from
several inferences due to the coexistence of apparently reasonable rules. This
makes the validation of a set of rules particularly difficult. Even in the case of
control rules where there is no need for chaining inferences (we assume here that
the rules directly link inputs (observations) to outputs (decisions)), structuring
the expert knowledge so as to obtain a synthesis of the expert rules in the form
of a decision table (table linking outputs to inputs) requires a significant effort.
We will show alternative approaches that do not require the explicit formulation
of decision rules in Section 7.3.
Now, assuming that the above set of decision rules has been obtained, the
problem is the following: if the current air temperature and soil moisture
are known, how can a watering time be computed from these rules? In other
words, how can f be defined so as to properly reflect the strategy underlying these
rules? Some partial answers could be obtained if we could define a formal relation
linking the various labels occurring in the decision rules and the physical quantities
observable by the system. We can observe that the decision rules are expressed
using only three variables, i.e. the air temperature T , the soil moisture M , and
the watering time W . Moreover, they all take the following form:
either if T is Ti then W is Wk
or if T is Ti and M is Mj then W is Wk
The possible labels Ti , Mj and Wk for temperature, moisture and watering
time are given by the sets Tlabels, Mlabels and Wlabels respectively:
Tlabels = {Cold, Cool, Warm, Hot}. These labels can be seen as different
words used to specify different areas on the temperature scale.
Mlabels = {Low, Medium, High}. These labels can be seen as words used to
specify different areas on the moisture scale.
Wlabels = {Zero, VeryShort, Short, Medium, Long, VeryLong}. These labels
can be seen as different words used to specify different areas on the time
scale.
Using these labels, the rules can be synthesised by the following decision table
(see Table 7.1):
This decision table represents a symbolic function F linking Tlabels and Mla-
bels to Wlabels (Wk = F (Ti , Mj )). Now, we need to produce a numerical trans-
lation of function F in order to construct a numerical function f called transfer
function, whose role is to compute a watering time w from any input (t, m). To
build such a function, the standard process consists of the following stages:
1. describe the current state of the system with the labels used in the rules (diagnosis),
2. activate the relevant decision rules for the current state (inference),
3. synthesise the recommendations induced from the rules and derive a numerical
output (decision).
The diagnosis stage consists of identifying the current state of the system using
numerical measures and describing this state in the language used by the expert
to express his decision rules. The inference stage consists of an activation of the
rules whose premises match the description of the current state. The decision
stage consists of a synthesis of the various conclusions derived from the rules and
the selection of the most appropriate action (at this stage, the selected action is
precisely defined by numerical output values). Thus, the definition of the decision
function f relies on a symbolic translation of the initial numerical information in
the diagnosis stage, a purely symbolic inference implementing the usual decision-
making reasoning and then a numerical translation of the conclusions derived
from the rules. The symbolic/numerical translation may include the subjectivity
of the decision-maker (perceptions, beliefs, etc.), both in the diagnosis and
decision stages. For instance, in the gardener example, the subjectivity of the
decision-maker is expressed not only in the choice of particular decision rules, but
also in the way input labels (Tlabels and Mlabels) are linked to values on the
temperature and moisture scales. In the decision stage, the expert's or
decision-maker's subjectivity can also be expressed by linking output labels
(Wlabels) with elements of the time scale. There are several ways of establishing the
symbolic/numerical translation, first in the diagnosis stage and then in the decision
stage. In both stages, symbols can be linked to scalars, intervals or fuzzy sets,
depending on the level of sophistication of the model. In the following subsections,
we present the main basic possibilities and discuss the associated representation
and aggregation problems.
t (°C)    30  25  20  30  25  20  30  25  20  10  10  10
m         10  10  10  20  20  20  30  30  30  10  20  30
w (min)   60  35  35  35  20  20  20  10   5   0   0   0
Hence, the transfer function f linking watering time w to the pair (t, m) is
known for a finite list of cases and must be extrapolated to the entire range of
possible inputs (t, m). This leads to a well-known mathematical problem since
function f must be defined so as to interpolate points of type (t, m, w) where
w = f (t, m). Of course, the solution is not unique and some additional assumptions
are necessary to define precisely the surface we are looking for. There is no space
in this chapter to discuss the relative interest of the various possible interpolation
methods that could be used to obtain f. The simplest method is to perform a linear
interpolation from the reference points given in Table 7.5. This amounts to averaging
the outputs associated with the reference points located in the neighbourhood of the
observed parameters (t, m). For instance, if the observation is (t, m) = (29, 16) the
neighbourhood is given by 4 reference points obtained from rules R1 , R2 , R4 , and
R5. This yields points P1 = (30, 10), P2 = (25, 10), P4 = (30, 20) and P5 = (25, 20)
with the respective weights 0.32, 0.08, 0.48 and 0.12, the weight αij of point
(xi, yj) being defined by:

$$\alpha_{ij} = \left(1 - \frac{|29 - x_i|}{30 - 25}\right)\left(1 - \frac{|16 - y_j|}{20 - 10}\right) \tag{7.1}$$
The watering times associated with points P1, P2, P4 and P5 are 60, 35, 35 and 20
minutes; therefore, the final time obtained by a weighted linear aggregation is
0.32 × 60 + 0.08 × 35 + 0.48 × 35 + 0.12 × 20 = 41.2 minutes, i.e. 41 minutes
and 12 seconds. Performing the same approach for any possible input (t, m) leads
to the following piecewise linear approximation of function f (see Figure 7.2).
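The interpolation can be written as a short function. The sketch below is a minimal bilinear interpolation over the grid of reference points of the table above, applying the weights of (7.1) to any input (t, m); clamping inputs that fall outside the grid to its range is our own assumption, and the helper names are illustrative.

```python
from bisect import bisect_right

# Reference points of the table above: W[(t, m)] is the watering time in minutes.
T_GRID = [10, 20, 25, 30]
M_GRID = [10, 20, 30]
W = {
    (30, 10): 60, (25, 10): 35, (20, 10): 35, (10, 10): 0,
    (30, 20): 35, (25, 20): 20, (20, 20): 20, (10, 20): 0,
    (30, 30): 20, (25, 30): 10, (20, 30): 5, (10, 30): 0,
}


def bracket(grid, x):
    """Clamp x to the grid (an assumption) and return the two surrounding grid values."""
    x = min(max(x, grid[0]), grid[-1])
    i = min(bisect_right(grid, x), len(grid) - 1)
    return grid[i - 1], grid[i], x


def f(t, m):
    """Bilinear interpolation: sum of alpha_ij * w(x_i, y_j) over the four
    surrounding reference points, with alpha_ij defined as in (7.1)."""
    x0, x1, t = bracket(T_GRID, t)
    y0, y1, m = bracket(M_GRID, m)
    total = 0.0
    for xi in (x0, x1):
        for yj in (y0, y1):
            alpha = (1 - abs(t - xi) / (x1 - x0)) * (1 - abs(m - yj) / (y1 - y0))
            total += alpha * W[(xi, yj)]
    return total


w = f(29, 16)
print(f"{w:.2f} min = {int(w)} min {round((w - int(w)) * 60)} s")
```

At a reference point the sketch returns the tabulated value exactly, since all the weight falls on that point.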
If (t, m) = (29, 16), then the associated labels are {Hot, Medium} and therefore
the only active rule is R4, whose conclusion is "watering time is long". Thus,
if we keep the interpretation of Long given in Table 7.4, the numerical output is
35 minutes.
This process is simple but has serious drawbacks. The granularity of the lan-
guage used to describe the current state of the system is poor and many signif-
icantly different states are seen as equivalent. This is the case, for example, of
the two inputs (17.5, 15) and (22.4, 24.9), which both translate as (Cool, Medium).
Conversely, for some other pairs of very similar inputs, the translations
diverge. This is the case of (17.4, 14.9) and (17.5, 15), which respectively
give (Cold, Low) and (Cool, Medium). In the first case, rule R10 is activated and
a zero watering time is decided. In the second case, rule R6 is activated and a
medium watering time is recommended, 20 minutes according to Table 7.4. Such
discontinuities cannot really be justified and make the output f (t, m) arbitrarily
sensitive to the inputs (t, m). This is not suitable because such decision systems
are often included in a permanent observation/reaction loop. Suppose for example
that consecutive measurements of temperature and moisture taken in a stable
situation yield slightly different values of t and m due to the imperfection of the
gauges, and that these variations occur around a point of discontinuity of the system.
This can produce alternating sequences of outputs such as Short, Zero, Medium,
Zero, leading to alternate starts and stops of the system, and possibly to
dysfunctions.
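The discontinuity can be reproduced with a small sketch of the crisp-interval model. The interval boundaries below (17.5, 22.5 and 27.5 °C for temperature; 15 and 25 for moisture) are assumptions chosen only so that the translations match the examples in the text, and the decision table and minute values are reconstructed from the reference points and rule conclusions mentioned in the text (an assumption about Tables 7.1 and 7.4).

```python
# Hypothetical crisp intervals, chosen to agree with the examples in the text.
def temperature_label(t):
    if t < 17.5:
        return "Cold"
    if t < 22.5:
        return "Cool"
    if t < 27.5:
        return "Warm"
    return "Hot"


def moisture_label(m):
    if m < 15:
        return "Low"
    if m < 25:
        return "Medium"
    return "High"


# Decision table F(T_i, M_j) and numerical values of the output labels
# (reconstructed from the reference points; an assumption).
F = {
    ("Hot", "Low"): "VeryLong", ("Warm", "Low"): "Long", ("Cool", "Low"): "Long",
    ("Hot", "Medium"): "Long", ("Warm", "Medium"): "Medium", ("Cool", "Medium"): "Medium",
    ("Hot", "High"): "Medium", ("Warm", "High"): "Short", ("Cool", "High"): "VeryShort",
}
MINUTES = {"Zero": 0, "VeryShort": 5, "Short": 10, "Medium": 20, "Long": 35, "VeryLong": 60}


def crisp_time(t, m):
    """Exactly one rule fires: 'if T is Cold then W is Zero' or F(T_i, M_j)."""
    tl, ml = temperature_label(t), moisture_label(m)
    return MINUTES["Zero" if tl == "Cold" else F[(tl, ml)]]


for t, m in [(17.4, 14.9), (17.5, 15), (22.4, 24.9), (29, 16)]:
    print((t, m), temperature_label(t), moisture_label(m), crisp_time(t, m))
```

The first two inputs differ by 0.1 on each variable yet receive 0 and 20 minutes, while the second and third, which are far apart, receive the same time.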
It is true that narrowing the intervals and multiplying the labels would reduce
these drawbacks and refine the granularity of the description, but the number
of rules necessary to characterise f would grow significantly with the number of
labels. Expressing so many labels and rules requires a considerable cognitive
effort that cannot reasonably be expected from the expert. Nevertheless, reducing
discontinuity induced by interval boundaries without multiplying labels is possible.
A first option for this is allowing for overlap between consecutive intervals, as
shown below.
In order to improve on the previous solution, we have to specify the links between
the values of physical variables describing the system and the symbolic labels used
to describe the current state of the system more carefully. Since it is difficult
to separate such intervals with precise boundaries, one can make them partially
overlap. As a consequence, in some intermediary areas of the temperature scale,
two consecutive labels are associated with a given temperature, reflecting the possible
hesitation of the gardener in the choice of a unique label. Typically, if Warm and
Hot are represented by the intervals [20, 30] and [25, +∞) respectively, 29°C becomes
a temperature compatible with both labels. More precisely, from 20°C to
25°C, Warm is a valid label (a possible source of rule activation) but Hot is not;
from 25°C to 30°C both labels are valid; and above 30°C, Hot is valid but Warm
is not. This progressive transition between the two states Warm and Hot refines
the initial sharp transition from Warm to Hot by introducing an intermediary state
corresponding to a hesitation between the two labels. This is more realistic,
especially because there is no reasonable way of separating Warm and Hot
with a precise boundary. Note, however, that measuring a temperature of 29°C
may allow several rules to be active at the same time. This raises a new
problem, since these rules may lead to diverging recommendations
from which a synthesis must be derived. Each output label (labels Wk in the
example) must be translated into numbers, and these numbers must be aggregated
to obtain the numerical output of the system (the value of w in the example). Thus,
the definition of a numerical output can be seen as an aggregation problem, where
aggregation is used to interpolate between conflicting rules. As an illustration, we
assume now that the labels are represented by the intervals given in Tables 7.8
and 7.9:
Hence, each watering time activated by at least one rule receives the weight 1
and any other time receives the weight 0. For example, with the observation
(t, m) = (29, 16), we have seen that the active rules are R1 , R2 , R4 and R5 and
therefore α(R1) = α(R2) = α(R4) = α(R5) = 1 whereas α(R) = 0 for any
other rule R. Let us now present in detail the calculation of π(35). Since 35
(minutes) is the scalar translation of Long, we obtain from the gardener's rules
B(35) = {R2, R3, R4}. Hence π(35) = sup{α(R2), α(R3), α(R4)} = 1. Similarly
Coming back to the example, we now obtain π(35) = α(R2) + α(R3) + α(R4) =
2, whereas the other values π(τ) remain unchanged. This second option gives more
importance to a time τ supported by several rules than to a time τ′ supported by
a single rule. Everything works as if each active rule were voting for a time. The
more a given time is supported by the set of active rules, the more important it
becomes in the calculation of the final watering time. Option (7.3) could
be preferred when the activations of the various rules are independent. On the
contrary, when the activation of a subset of rules necessarily implies that another
subset of rules is also active, one could prefer resorting to (7.2) so as to avoid
possible overweighting due to redundancy in the set of rules. In a practical situation,
one can easily imagine that the choice of one of these options is not easy to justify.
Since there is a finite number of rules, there is only a finite number of times
activated by the rules in a given state. In order to synthesise these different times,
the most popular approach is the centre of gravity method, which amounts to
performing a weighted sum (see also chapter 6) of all possible times τ. Formally,
the final output is defined by:

$$w = \frac{\sum_{\tau} \tau\, \pi(\tau)}{\sum_{\tau} \pi(\tau)} \tag{7.4}$$

From the observation (t, m) = (29, 16), equations (7.2) and (7.4) yield a watering
time of (60 + 35 + 20)/3 ≈ 38.33 minutes, i.e. 38 minutes and 20 seconds, whereas
equation (7.3) yields w = 0.25 × (60 + 35 + 35 + 20) = 37.5 minutes, i.e. 37 minutes and 30
seconds. Note that the choice of a weighted sum as final aggregator in equation
(7.4) is questionable and one could formulate criticisms similar to those addressed
to the weighted average in the previous chapters (especially in chapter 6).
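The two ways of weighting the times and the centre of gravity (7.4) fit in a few lines. In the sketch below, `alpha` maps each active rule to α(R), `combine` is `max` for option (7.2) (a supremum over a finite set) or `sum` for option (7.3), and the rule conclusions and minute values are an assumption reconstructed from the rules cited in the text (R1 VeryLong, R2 to R4 Long, and so on).

```python
# Conclusion of each rule and numerical translation of the output labels
# (reconstructed from the examples in the text; an assumption).
CONCLUSION = {
    "R1": "VeryLong", "R2": "Long", "R3": "Long", "R4": "Long", "R5": "Medium",
    "R6": "Medium", "R7": "Medium", "R8": "Short", "R9": "VeryShort", "R10": "Zero",
}
MINUTES = {"Zero": 0, "VeryShort": 5, "Short": 10, "Medium": 20, "Long": 35, "VeryLong": 60}


def time_weights(alpha, combine):
    """pi(tau) = combine of alpha(R) over the rules R in B(tau), i.e. the rules
    concluding tau; inactive rules have alpha(R) = 0."""
    support = {}
    for rule, label in CONCLUSION.items():
        support.setdefault(MINUTES[label], []).append(alpha.get(rule, 0.0))
    return {tau: combine(values) for tau, values in support.items()}


def centre_of_gravity(pi):
    """Equation (7.4): sum of tau * pi(tau) divided by sum of pi(tau).
    Raises ZeroDivisionError if no rule is active."""
    return sum(tau * p for tau, p in pi.items()) / sum(pi.values())


alpha = {"R1": 1.0, "R2": 1.0, "R4": 1.0, "R5": 1.0}  # active rules for (29, 16)
w_sup = centre_of_gravity(time_weights(alpha, max))   # option (7.2)
w_sum = centre_of_gravity(time_weights(alpha, sum))   # option (7.3)
print(round(w_sup, 2), round(w_sum, 2))
```

With `max`, Long counts once even though R2 and R4 both support it; with `sum`, it counts twice, which lowers the output from about 38.33 to 37.5 minutes.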
This process is perhaps the most elementary way of using a set of symbolic decision
rules to build a numerical decision function. It gives a simple illustration of the
so-called "computing with words" paradigm advocated by Zadeh (see Zadeh 1999).
The main advantages of such a process are the following:
Example (1). Consider two very similar states s1 and s2 characterised by the
observations (t, m) = (25.01, 19.99) and (t, m) = (24.99, 20.01). According to
Tables 7.8 and 7.9, state s1 makes the labels {Warm, Hot} valid for temperature
and {Low, Medium} valid for soil moisture. This activates rules R1, R2, R4 and R5,
whose recommendations are VeryLong, Long, Long and Medium respectively. The
resulting watering time obtained by equation (7.4), with the weights of equation (7.3),
is therefore (60 + 35 + 35 + 20)/4 = 37.5 minutes, i.e. 37 minutes and 30 seconds.
Things are very different for s2, however. The valid labels are
{Cool, Warm} for temperature and {Medium, High} for soil moisture. This
activates rules R5, R6, R8 and R9, whose recommendations are Medium, Medium,
Short and VeryShort respectively. The resulting watering time obtained by equation
(7.4) is therefore (20 + 20 + 10 + 5)/4 = 13.75 minutes, i.e. 13 minutes and 45
seconds. It is worth noting that, despite the
close similarity between states s1 and s2 , there is a significant difference in the wa-
tering times computed from the two input vectors. This is due to the discontinuity
of the transfer function that defines the watering time from the input (t, m) for
(t, m) = (25, 20). In the right neighbourhood of this entry (t > 25 and m < 20),
the decision rules R1 , R2 and R4 are fully active but this is no longer the case in
the left neighbourhood of the point (t < 25 and m > 20) where they are replaced
by rules R6 , R8 and R9 , thus leading to a much shorter time. The activations
and computations performed for s1 and s2 differ significantly. They lead to very
different outputs, despite the similarity of the states.
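Example (1) can be replayed with overlapping intervals. Only Warm = [20, 30] and Hot = [25, +∞) are given in the text; the other intervals below (Cold, Cool, Low, Medium, High) are assumptions chosen to agree with the labels listed for (29, 16), s1 and s2, the decision table is reconstructed from the rules cited in the text, and the time weights follow option (7.3). Under these assumptions the sketch gives 37.5 minutes for s1 and 13.75 minutes for s2, reproducing the jump around (25, 20).

```python
import math

# Closed intervals [lo, hi]; only Warm and Hot come from the text, the rest are assumptions.
T_INTERVALS = {"Cold": (-math.inf, 20), "Cool": (15, 25), "Warm": (20, 30), "Hot": (25, math.inf)}
M_INTERVALS = {"Low": (0, 20), "Medium": (10, 30), "High": (20, math.inf)}

F = {
    ("Hot", "Low"): "VeryLong", ("Warm", "Low"): "Long", ("Cool", "Low"): "Long",
    ("Hot", "Medium"): "Long", ("Warm", "Medium"): "Medium", ("Cool", "Medium"): "Medium",
    ("Hot", "High"): "Medium", ("Warm", "High"): "Short", ("Cool", "High"): "VeryShort",
}
MINUTES = {"Zero": 0, "VeryShort": 5, "Short": 10, "Medium": 20, "Long": 35, "VeryLong": 60}


def valid_labels(x, intervals):
    """Labels whose closed interval contains x."""
    return [name for name, (lo, hi) in intervals.items() if lo <= x <= hi]


def active_rules(t, m):
    """All-or-nothing activation: every rule whose premise labels are valid."""
    rules = set()
    for tl in valid_labels(t, T_INTERVALS):
        if tl == "Cold":
            rules.add(("Cold", None))  # 'if T is Cold then W is Zero'
        else:
            rules.update((tl, ml) for ml in valid_labels(m, M_INTERVALS))
    return rules


def watering_time(t, m):
    """Time weights of option (7.3), then the centre of gravity (7.4)."""
    pi = {}
    for rule in active_rules(t, m):
        tau = MINUTES["Zero" if rule[0] == "Cold" else F[rule]]
        pi[tau] = pi.get(tau, 0) + 1
    return sum(tau * p for tau, p in pi.items()) / sum(pi.values())


for name, (t, m) in {"s1": (25.01, 19.99), "s2": (24.99, 20.01)}.items():
    print(name, sorted(active_rules(t, m)), watering_time(t, m))
```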
This criticism is serious, but the difficulty can partly be overcome. It is true
that, depending on the choice of the numerical encoding of the labels, the numer-
ical outputs resulting from the decision rules may vary significantly. Since the
numerical/symbolic and then symbolic/numerical translations are both sources of
arbitrariness, the following question can be raised: why not use numbers directly?
There are two partial answers: first, in many decision contexts, the possibility
of justifying decisions is a great advantage. Although this is not crucial in our
illustrative example, the ability of automatic decision systems to simulate human
reasoning and explain decisions by rules is generally seen as an important
advantage. This argument often justifies the use of rule-based systems to automatise
decision-making, even if each decision considered separately is of marginal impor-
tance. Second, there are several ways of improving the process proposed above
and of refining the formal relationship between qualitative labels and numerical
values. It is not our purpose to cover all possibilities in detail. We only present and
discuss some very simple and intuitive ideas used to construct more sophisticated
models and tools in this context.
Note that sometimes, the fuzzy labels are defined in such a way that member-
ship adds up to 1 for any possible value of the numerical parameter. This is the
case of labels defined in Figure 7.4 for which we have:
if T is Ti and M is Mj then W is Wk
where Wk = F(Ti, Mj), and for any numerical observation (t, m), the weight (or
activation degree) αij of the rule Rij reflects the importance (or relevance) of the
rule in the current situation. This importance depends on the matching of the
input (t, m) and the premise (Ti, Mj). It is therefore natural to state:

$$\alpha_{ij} = h\big(\mu_{T_i}(t), \mu_{M_j}(m)\big) \tag{7.6}$$

where h is an aggregation function representing the logical and used in the rule,
e.g. h(x, y) = min(x, y).
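Figure 7.4 is not reproduced here, so the sketch below assumes triangular labels whose peaks are the reference values of the earlier table (Cold 10, Cool 20, Warm 25 and Hot 30 °C; Low 10, Medium 20 and High 30 for moisture) and whose memberships add up to 1 everywhere. It uses h = min in (7.6), additive time weights as in (7.3), the centre of gravity (7.4) and the same reconstructed decision table as before (an assumption). Under these assumptions, the states s1 and s2 of Example (1) get about 20.08 and 19.98 minutes respectively, instead of the jump observed with crisp intervals.

```python
# Hypothetical membership functions (Figure 7.4 is not reproduced here).
def down(x, a, b):
    """1 up to a, 0 from b on, linear in between."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def tri(x, a, b, c):
    """Triangle with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def mu_T(t):
    return {"Cold": down(t, 10, 20), "Cool": tri(t, 10, 20, 25),
            "Warm": tri(t, 20, 25, 30), "Hot": 1.0 - down(t, 25, 30)}


def mu_M(m):
    return {"Low": down(m, 10, 20), "Medium": tri(m, 10, 20, 30), "High": 1.0 - down(m, 20, 30)}


F = {
    ("Hot", "Low"): "VeryLong", ("Warm", "Low"): "Long", ("Cool", "Low"): "Long",
    ("Hot", "Medium"): "Long", ("Warm", "Medium"): "Medium", ("Cool", "Medium"): "Medium",
    ("Hot", "High"): "Medium", ("Warm", "High"): "Short", ("Cool", "High"): "VeryShort",
}
MINUTES = {"Zero": 0, "VeryShort": 5, "Short": 10, "Medium": 20, "Long": 35, "VeryLong": 60}


def fuzzy_time(t, m, h=min):
    """alpha_ij = h(mu_Ti(t), mu_Mj(m)) as in (7.6); the rule 'if T is Cold then
    W is Zero' has activation mu_Cold(t); time weights add up as in (7.3);
    the output is the centre of gravity (7.4)."""
    mt, mm = mu_T(t), mu_M(m)
    pi = {MINUTES["Zero"]: mt["Cold"]}
    for (tl, ml), label in F.items():
        tau = MINUTES[label]
        pi[tau] = pi.get(tau, 0.0) + h(mt[tl], mm[ml])
    return sum(tau * p for tau, p in pi.items()) / sum(pi.values())


w1, w2 = fuzzy_time(25.01, 19.99), fuzzy_time(24.99, 20.01)
print(round(w1, 3), round(w2, 3))
```

Because every membership function is continuous and min, sums and the ratio (7.4) are continuous, small changes in (t, m) can only produce small changes in the output, which is the property discussed below.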
Table 7.10: The weights of the rules when (t, m) = (29, 16)
Hence, using equation (7.4) and Table 7.4, we get w(s1) = 20 minutes and 5 seconds as the final output. Similarly, for state s2, the activation degrees of the rules obtained from equation (7.6) are only slightly different from those for s1, and the final output derived from Table 7.12 using equation (7.4) gives w(s2) = 19 minutes and 58 seconds. Here, we notice that the activation of each rule does not vary significantly when passing from state s1 to state s2. This is due to the way activation weights are defined and used in the process. These weights depend continuously on the input parameters t and m, and the membership functions defining the labels vary smoothly. As a consequence, since the aggregation function
164 CHAPTER 7. DECIDING AUTOMATICALLY
used to derive the final watering time w is also a continuous function of the activation weights (see equation (7.4)), w depends continuously on the input parameters t and m. This explains the observed improvement with respect to the previous model based on all-or-nothing activation of rules. Thus, the use of fuzzy labels to interpret the inputs has a significant advantage: it makes it possible to define a continuous transformation of numerical input data (temperature, moisture) into the symbolic variables used in decision rules. The resulting decision system is more realistic and more robust to slight variations of the inputs. This advantage is due to the use of fuzzy sets and has greatly contributed to the practical success of the fuzzy approach in automatic control (fuzzy control; see e.g. Mamdani 1981, Sugeno 1985, Bouchon 1995, Gacogne 1997, Nguyen and Sugeno 1998). However, several criticisms can be levelled at the small fuzzy decision module presented above. Among them, let us mention the following:
the choice h = min in equation (7.6) requires that quantities of type $\mu_{T_i}(t)$ and $\mu_{M_j}(m)$ are commensurate. This assumption, which is rarely made explicit, is very strong because it requires much more than comparing the relative fit of two temperatures (resp. two moistures) to a label $T_i$ (resp. $M_j$). It also requires comparing the fit of any temperature to any label $T_i$ with the fit of any moisture to any label $M_j$. A perfectly sound definition of such membership values would require more information than can easily be obtained in practice. Moreover, the choice of min is often justified by the fact that h is used to evaluate a conjunction between several premises of a given rule (a conjunction of the type "temperature is $T_i$ and moisture is $M_j$"). Note, however, that the idea of conjunction is captured equally well by any other t-norm (see for instance Fodor and Roubens (1994)). Thus, the product could just as well replace the min, and the particular choice of the min is not obvious. This is problematic because this choice is not without consequence on the definition of the watering time.
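The effect of the t-norm can be checked on a small sketch. It reuses hypothetical labels and a hypothetical table of watering times (they are illustrative assumptions, not the book's data) and computes the output for the same observation once with h = min and once with the product t-norm, reading (7.4) as an activation-weighted average.

```python
def interp(v, xs, ys):
    """Piecewise-linear interpolation through (xs, ys), constant beyond the ends."""
    if v <= xs[0]:
        return ys[0]
    if v >= xs[-1]:
        return ys[-1]
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= v <= x1:
            return y0 + (y1 - y0) * (v - x0) / (x1 - x0)


def partition(xs):
    """Triangular labels, one peak per breakpoint: memberships add up to 1."""
    def label(k):
        ys = [1.0 if i == k else 0.0 for i in range(len(xs))]
        return lambda v: interp(v, xs, ys)
    return [label(k) for k in range(len(xs))]


TEMP = partition([10, 20, 30])    # hypothetical labels Cold, Mild, Hot
MOIST = partition([10, 20, 30])   # hypothetical labels Dry, Moist, Wet
W = [[20, 10, 0], [30, 20, 10], [40, 30, 20]]   # hypothetical minutes per rule


def watering_time(t, m, h):
    """Weighted average of rule outputs, weights given by the conjunction h."""
    pairs = [(h(mu_t(t), mu_m(m)), W[i][j])
             for i, mu_t in enumerate(TEMP) for j, mu_m in enumerate(MOIST)]
    return sum(a * w for a, w in pairs) / sum(a for a, _ in pairs)


w_min = watering_time(29, 16, h=min)
w_prod = watering_time(29, 16, h=lambda x, y: x * y)   # product t-norm
print(w_min, w_prod)   # about 32.5 and 33.0: same rules, different outputs
```

Both operators are legitimate conjunctions, yet they give different watering times for the same observation; with the product and labels summing to 1, the activation degrees also sum to 1, which the min does not guarantee.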
represented by fuzzy intervals of the time scale. For the sake of illustration, let us consider the labels represented in Figure 7.5.
For any state (t, m) of the system, the range of relevant watering times is the union of all values compatible with the labels $W_k$ derived from active rules. In the example, the active rules are R1, R2, R4, R5, and therefore the W-labels concerned are Medium, Long and VeryLong. Hence the set of relevant watering times is [10, 70]. However, not all times inside this set are equivalent. Each of them represents a possible numerical translation of a label $W_k$ obtained by the activation of one or several rules. To be fully considered, a time must be perfectly representative of a label $W_k$ obtained from a fully active rule. In more nuanced situations, the weight attached to a possible time depends on how well this time fits the labels activated, to a certain degree, by the rules. For example, by analogy with Mamdani's approach to fuzzy control (Mamdani 1981), the weight of any watering time ω can be defined by:
$$ \pi_{t,m}(\omega) = \max_{R_{ij} \in B} h\big(\mu_{T_i}(t), \mu_{M_j}(m), \mu_{W_k}(\omega)\big) \tag{7.7} $$
where B represents the set of rules (here the gardener's rules) and $R_{ij}$ represents the rule:
If T = $T_i$ and M = $M_j$ then W = $W_k$
and h is a non-decreasing function of its arguments (in Mamdani's approach, h = min). The idea in equation (7.7) is that a watering time ω must receive a high weight when there is at least one rule $R_{ij}$ whose premises ($T_i$, $M_j$) are valid for the observation (t, m) and whose conclusion $W_k$ is compatible with ω. This explains why $\pi_{t,m}(\omega)$ is defined as an increasing function of the quantities $\mu_{T_i}(t)$, $\mu_{M_j}(m)$ and $\mu_{W_k}(\omega)$. Notice that equation (7.7) is a natural extension of equation (7.2). In our example, the observation (t, m) = (29, 16) leads to the function $\pi_{29,16}(\omega)$ represented in Figure 7.6.
In order to obtain a precise watering time, we can use an equation similar to
(7.4). However, this equation must be generalised because there may be an infinity
of times activated by the rules (e.g. a whole interval). The usual extension of the
weighted average to an infinite set of values is given by the following integral:
$$ w = \frac{\int \omega \, \pi_{t,m}(\omega) \, d\omega}{\int \pi_{t,m}(\omega) \, d\omega} \tag{7.8} $$
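Equations (7.7) and (7.8) can be evaluated numerically with a short sketch. Only the label Long reuses the support [20, 55] and core [30, 40] quoted for Figure 7.5; the other labels, the rule table `RULES` and the grid used to approximate the integrals by midpoint sums are assumptions made for illustration. With these hypothetical choices the rules active at (29, 16) point to Medium, Long and VeryLong, so the weight is positive only on ]10, 70].

```python
INF = float("inf")


def trap(a, b, c, d):
    """Trapezoidal membership with support ]a, d[ and core [b, c]."""
    def mu(v):
        if b <= v <= c:
            return 1.0
        if a < v < b:
            return (v - a) / (b - a)
        if c < v < d:
            return (d - v) / (d - c)
        return 0.0
    return mu


TEMP = [trap(-INF, -INF, 10, 20), trap(10, 20, 20, 30), trap(20, 30, INF, INF)]   # Cold, Mild, Hot
MOIST = [trap(-INF, -INF, 10, 20), trap(10, 20, 20, 30), trap(20, 30, INF, INF)]  # Dry, Moist, Wet
W_LABELS = [trap(-INF, -INF, 5, 15),   # Short (hypothetical)
            trap(10, 15, 20, 30),      # Medium (hypothetical)
            trap(20, 30, 40, 55),      # Long: support and core as in the text
            trap(40, 55, 70, 70)]      # VeryLong (hypothetical)
RULES = [[1, 0, 0],   # Cold: Dry -> Medium, Moist -> Short, Wet -> Short
         [2, 1, 0],   # Mild: Dry -> Long, Moist -> Medium, Wet -> Short
         [3, 2, 1]]   # Hot: Dry -> VeryLong, Moist -> Long, Wet -> Medium


def pi_tm(omega, t, m):
    """Weight (7.7) with h = min, maximised over all rules (Mamdani style)."""
    return max(min(mu_t(t), mu_m(m), W_LABELS[RULES[i][j]](omega))
               for i, mu_t in enumerate(TEMP) for j, mu_m in enumerate(MOIST))


def centroid(t, m, lo=0.0, hi=80.0, steps=4000):
    """Midpoint-rule approximation of the ratio of integrals (7.8)."""
    dw = (hi - lo) / steps
    grid = [lo + (k + 0.5) * dw for k in range(steps)]
    weights = [pi_tm(w, t, m) for w in grid]
    return sum(w * p for w, p in zip(grid, weights)) / sum(weights)


print(pi_tm(60, 29, 16))   # 0.4: only VeryLong (rule Hot and Dry) covers 60 minutes
print(centroid(29, 16))
```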
This last refinement meets our objective because it provides a transfer function f with good continuity properties. However, the use of equations (7.7) to (7.9) can be seriously criticised:
In the fields of multi-valued logic and fuzzy set theory, admissible functions used to translate implications are required to be non-increasing with respect to the value of the left-hand side of the implication and non-decreasing with respect to the value of the right-hand side (Fodor and Roubens 1994, Bouchon 1995, Perny and Pomerol 1999). As an example, the value attached to the sentence "A implies B" can be defined by the Łukasiewicz implication min(1 − v(A) + v(B), 1), where v(A) and v(B) are the values of A and B respectively. In our case, the conjoint use of the min operator to interpret the conjunction on the left-hand side and of the Łukasiewicz implication would lead to the following h function:
$$ h(x, y, z) = \min\big(1 - \min(x, y) + z, \; 1\big) $$
Note that this function is not increasing in its arguments, as required above in
the text. However, resorting to implication operators instead of conjunctions
in order to implement an inference via rule Rij also seems legitimate. This
the support, i.e. the interval of all numerical values compatible with the label, whose membership must be strictly positive,
the core, i.e. the interval of all numerical values perfectly representative of the label (the core is a subset of the support), whose membership is equal to 1,
the membership function making a continuous transition from the border of the support to the border of the core.
For example, the label Long in Figure 7.5 is defined by support [20, 55], core [30, 40] and two linear transitions (from membership to non-membership) in the range [20, 30] ∪ [40, 55]. One could expect the decision-maker to be able to specify the support and core of each fuzzy label, as well as the trend of the membership function (increasing from the border of the support to the border of the core). Even with this information, however, the choice of a precise membership function often remains arbitrary: the above information leaves room for infinitely many functions. In practice, the shape of the membership function in the transition area is often chosen as linear or Gaussian (for differentiability) but is rarely justified by questioning the decision-maker. Thus, in many cases, the only reliable information contained in the membership function is the relative adequacy of each temperature, moisture or time to each label. For example, $\mu_{Long}(21) = 0.1$ and $\mu_{Long}(25) = 0.5$ only means that 25 minutes is a better numerical translation of the qualifier Long than 21 minutes. This does not necessarily mean that 25 minutes is more Long than 30 minutes is Medium, even if $\mu_{Medium}(30) = 0.4$, nor that 25 minutes is more Long than 26 °C is Hot, even if $\mu_{Hot}(26) = 0.2$. However, without such assumptions, the definition of the weights $\pi_{t,m}(\omega)$ in equation (7.7) with h = min is difficult to justify.
Bearing in mind that the weights $\pi_{t,m}(\omega)$ are used as cardinal weights in a weighted average ((7.4), generalised by (7.8)), while they are defined from the membership values $\mu_{T_i}(t)$, $\mu_{M_j}(m)$ and $\mu_{W_k}(\omega)$, the membership values should have a cardinal interpretation. This is yet another very strong hypothesis. For example, we need to consider that 25 minutes is 5 times better than 21 minutes at representing Long, because its membership value is 5 times larger. Even when the commensurability assumption on membership scales is realistic, the weights cannot necessarily be interpreted as cardinal values, and the weighted aggregation proposed in equation (7.8) is questionable.
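The cardinal reading criticised above can be made concrete with the label Long of Figure 7.5 (support [20, 55], core [30, 40], linear transitions). The sketch below recomputes the membership values quoted in the text; the cube-root transformation at the end is only an illustrative example of an increasing function that keeps the order of memberships but not their ratio.

```python
def trapezoid(v, support, core):
    """Linear transitions between the borders of the support and of the core."""
    (a, d), (b, c) = support, core
    if b <= v <= c:
        return 1.0
    if a < v < b:
        return (v - a) / (b - a)
    if c < v < d:
        return (d - v) / (d - c)
    return 0.0


def mu_long(v):
    """Label Long of Figure 7.5 (minutes)."""
    return trapezoid(v, support=(20, 55), core=(30, 40))


def phi(x):
    """An increasing transformation of [0, 1], chosen for illustration."""
    return x ** (1 / 3)


print(mu_long(21), mu_long(25))              # 0.1 0.5
print(mu_long(25) / mu_long(21))             # ratio of 5: the "5 times better" reading
print(phi(mu_long(25)) / phi(mu_long(21)))   # about 1.71: same order, different ratio
```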
As an illustration of the latter, consider the following example showing the impact of an increasing transformation of membership values on the output watering time:
Example (2). Consider the following two input vectors i1 = (29, 29) and i2 = (18, 16). These two inputs lead to the activation weights given in Tables 7.13 and 7.14. For the sake of simplicity, we use the non-fuzzy labels given in Table 7.4 to interpret the labels $W_k$. Then, assuming we use equations (7.2) and (7.4) to define the watering time w, we obtain the following result: w(i1) = 19 minutes and 33 seconds and w(i2) = 21 minutes and 40 seconds. Notice that the times are not very different, despite the large difference between inputs i1 and i2. This is easily explained by observing that, in the second case, the temperature is lower, but the soil water content is also lower, and the two aspects compensate each other. Now, we transform all membership functions of the labels by the function $\varphi(x) = \sqrt[3]{x}$. This preserves the support and the core of each label, as well as the direction of variation (increasing or decreasing) of the membership functions. In fact, it represents the same ordinal information about membership degrees. However, the activation tables are altered as shown in Tables 7.15 and 7.16. This gives the following watering times: w(i1) = 20 minutes and 34 seconds, w(i2) = 19 minutes and 42 seconds. Note that we now have w(i1) > w(i2), whereas it was just the opposite before the transformation of membership values.
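The mechanism behind this reversal can be reproduced with a few hypothetical numbers (they are not the values of Tables 7.13 to 7.16). With h = min, transforming every membership by an increasing φ transforms every activation by φ as well, since min(φ(a), φ(b)) = φ(min(a, b)); the sketch therefore applies the cube root directly to hypothetical activation profiles and compares the weighted averages of crisp rule outputs.

```python
def weighted_time(activations):
    """Weighted average of crisp rule outputs; activations maps minutes -> weight."""
    total = sum(activations.values())
    return sum(time * a for time, a in activations.items()) / total


def cube_root(x):
    return x ** (1 / 3)


# Hypothetical activation profiles for two inputs.
i1 = {10: 0.9, 50: 0.1}
i2 = {15: 0.5, 25: 0.5}

before = weighted_time(i1), weighted_time(i2)
after = (weighted_time({k: cube_root(a) for k, a in i1.items()}),
         weighted_time({k: cube_root(a) for k, a in i2.items()}))
print(before)   # (14.0, 20.0): i1 gets the shorter time
print(after)    # about (23.0, 20.0): i1 now gets the longer time
```

The cube root lifts the small weight 0.1 much more than the large weight 0.9, so the long rule output gains influence for i1 while the balanced profile of i2 is unaffected.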
This example shows that the comparison of output values is not invariant under monotonic transformations of membership values, which reveals that the computation of w relies on a more than ordinal interpretation of membership values. Although this reversal of durations is not a crucial problem in the case of the watering system, it could be more problematic in other contexts. For instance, if we use a similar system (based on fuzzy rules) to rank candidates in a competition, the choice of
a particular shape for membership must be well justified because it may really
change the winner.
Another possibility is to resort to other aggregation methods that do not require the same level of information. Several alternatives to the weighted sum are compatible with ordinal weights, e.g. Sugeno integrals (see Sugeno 1977, Dubois and Prade 1987), and could be used advantageously to process ordinal weights. However, they also have some limitations: they are not as discriminating as the weighted sum, and they cannot completely avoid commensurability problems (see Dubois, Prade and Sabbadin 1998, Fargier and Perny 1999).
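The simplest Sugeno integral, the weighted maximum max_i min(w_i, x_i) (the Sugeno integral with respect to a possibility measure, assuming the largest weight equals 1), shows why such aggregators suit ordinal information. The sketch below is an illustration with hypothetical weights and evaluations, not the general Sugeno integral, which uses a capacity on subsets of criteria. Applying the same increasing bijection φ of [0, 1] to weights and values transforms the result by φ, so comparisons are unchanged.

```python
def weighted_max_min(weights, values):
    """Weighted maximum: a Sugeno integral w.r.t. a possibility measure.

    Assumes weights and values in [0, 1], with max(weights) == 1.
    """
    return max(min(w, v) for w, v in zip(weights, values))


def phi(x):
    """An increasing bijection of [0, 1], chosen for illustration."""
    return x ** (1 / 3)


weights = [1.0, 0.6, 0.3]                  # hypothetical ordinal weights
a, b = [0.2, 0.9, 0.8], [0.5, 0.4, 0.1]    # hypothetical evaluations

s_a, s_b = weighted_max_min(weights, a), weighted_max_min(weights, b)
t_a = weighted_max_min([phi(w) for w in weights], [phi(v) for v in a])
t_b = weighted_max_min([phi(w) for w in weights], [phi(v) for v in b])
print(s_a, s_b)                            # 0.6 0.5
print(t_a == phi(s_a), t_b == phi(s_b))    # True True: the ranking a > b is kept
print(weighted_max_min(weights, [0.2, 0.9, 0.1]))   # 0.6 as well: less discriminating
```

The last line illustrates the limitation mentioned above: lowering the third evaluation of a from 0.8 to 0.1 does not change the result.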
a sensor located in the oven measures the air moisture, within the oven, near the biscuit line. The evaluation m is given in cg/g (centigrams per gram of dry matter) in the range [0, 10], with the desired values being around 4 cg/g.
the red-green axis and a level b on the yellow-blue axis. The desired color is
not easy to specify.
each vector x ∈ X the vector ($\mu_{C_1}(x), \dots, \mu_{C_q}(x)$) giving the membership of x to each category (e.g. possible dysfunctions of the oven). One of the most popular classification methods is the so-called Bayes rule, which is known to minimise the expected error rate. However, the rule requires knowing the prior and conditional probability densities of all categories, which is rarely the case in practice. When this information is not available (as in our example), the nearest neighbour algorithm is very useful. The basic principle of the k-Nearest Neighbour assignment rule (kNN), introduced by Fix and Hodges (1951), is to assign an object to the class to which the majority of its k nearest neighbours belong.
More precisely, for any sample S ⊆ X of vectors whose correct assignment is known, if $N_k(x)$ represents the subset of S formed by the k nearest neighbours of x within S, the kNN rule is defined for any k ∈ {1, ..., n}, n being the size of S, by:
$$ \mu_{C_j}(x) = \begin{cases} 1 & \text{if } j = \arg\max_i \sum_{y \in N_k(x)} \mu_{C_i}(y) \\ 0 & \text{otherwise} \end{cases} \tag{7.10} $$
where $\arg\max_i g(i)$ represents the value i for which g(i) is maximal. This supposes that the maximum is reached for a unique i. When this is not the case, one can use a second criterion to discriminate between all g-maximal solutions or, alternatively, choose all of them. In equation (7.10), the function g(i) equals $\sum_{y \in N_k(x)} \mu_{C_i}(y)$ and represents the total number of vectors, among the k nearest neighbours of x, that have been assigned to category i.
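Rule (7.10) can be sketched in a few lines of Python. The sample below is hypothetical, distances are Euclidean (an assumption: the text does not fix the metric), and ties are handled by giving membership 1 to all g-maximal categories, one of the two options mentioned above.

```python
import math


def knn_crisp(x, sample, k):
    """Crisp kNN rule (7.10).

    sample: list of (y, mu) pairs, y a tuple of measurements and mu the 0/1
    memberships of y to the q categories. All g-maximal categories are kept.
    """
    neighbours = sorted(sample, key=lambda s: math.dist(x, s[0]))[:k]
    q = len(sample[0][1])
    g = [sum(mu[i] for _, mu in neighbours) for i in range(q)]   # votes g(i)
    best = max(g)
    return [1 if g[i] == best else 0 for i in range(q)]


# Hypothetical two-category sample in a 2-D measurement space.
SAMPLE = [((0, 0), [1, 0]), ((0, 1), [1, 0]), ((1, 0), [1, 0]),
          ((5, 5), [0, 1]), ((5, 6), [0, 1]), ((6, 5), [0, 1])]
print(knn_crisp((0.5, 0.5), SAMPLE, k=3))   # [1, 0]
print(knn_crisp((5.2, 5.1), SAMPLE, k=3))   # [0, 1]
```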
It has been proved that the error rate of the kNN rule tends towards the optimal Bayes error rate when both k and n tend to infinity while k/n tends to 0 (see Cover and Hart 1967). The main drawback of the kNN procedure is that all elements of $N_k(x)$ are equally weighted. Indeed, in most cases, the neighbours are not equally distant from x, and one may prefer to give less importance to neighbours far from x. For this reason, several weighted extensions of the kNN algorithm have been proposed (see Keller, Gray and Givens 1985, Bezdek, Chuah and Leep 1986, Bereau and Dubuisson 1991). For example, the fuzzy kNN rule proposed by Keller et al. (1985) is defined by:
$$ \mu_{C_j}(x) = \frac{\sum_{y \in N_k(x)} \mu_{C_j}(y) \, / \, \|x - y\|^{2/(m-1)}}{\sum_{y \in N_k(x)} 1 \, / \, \|x - y\|^{2/(m-1)}} \tag{7.11} $$
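The fuzzy kNN rule (7.11) can be sketched as follows. The sample is hypothetical, the norm is taken to be Euclidean, and m = 2 is used by default (so weights are 1/‖x − y‖²); when x coincides with its nearest neighbour, (7.11) is undefined and the sketch returns that neighbour's memberships, which is the limit of (7.11) when a single sample point is at distance 0.

```python
import math


def knn_fuzzy(x, sample, k, m=2.0):
    """Fuzzy kNN rule (7.11): distance-weighted average of neighbour memberships.

    sample: list of (y, mu) pairs; m > 1 controls how fast weights decrease.
    """
    neighbours = sorted(sample, key=lambda s: math.dist(x, s[0]))[:k]
    weights = []
    for y, mu in neighbours:
        d = math.dist(x, y)
        if d == 0.0:
            # x coincides with its nearest neighbour: return its memberships.
            return list(mu)
        weights.append(d ** (-2.0 / (m - 1.0)))
    total = sum(weights)
    q = len(neighbours[0][1])
    return [sum(w * mu[j] for w, (_, mu) in zip(weights, neighbours)) / total
            for j in range(q)]


SAMPLE = [((0, 0), [1, 0]), ((3, 0), [0, 1]), ((10, 0), [0, 1])]   # hypothetical
print(knn_fuzzy((1, 0), SAMPLE, k=2))   # weights 1 and 1/4, giving [0.8, 0.2]
```

Unlike (7.10), the output is graded: the closer neighbour dominates without silencing the other one, and the memberships still add up to 1 when those of the sample do.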
when the weighted arithmetic mean seems convenient, the use of weights linked to distances of type ‖x − y‖ and to the parameter m is not obvious. Indeed, the norm of x − y is not necessarily a good measure of the relative dissimilarity between the two biscuits represented by x and y. This is the case, for instance, when units are different and non-commensurate on the various axes. In order to distinguish between significant and non-significant differences on each dimension, one may include discrimination thresholds (see Chapter 6) in the comparison, which makes it possible to separate the differences that are significant for the expert from those that are negligible. This is particularly suitable in the field of subjective evaluation, in which the preferences and perceptions of the expert (or decision-maker) are not usually linearly related to the observable parameters. For instance, one could define a fuzzy similarity relation σ(x, y) as a function of quantities of type $|x_i - y_i|$ for any attribute i, representing the relative closeness of x and y for the expert. Then, we can use a general aggregation rule of type:
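This is the proposition made in (Henriet 1995), (Henriet and Perny 1996) and (Perny and Zucker 1999), where the membership $\mu_{C_j}(x)$ is defined by:
$$ \mu_{C_j}(x) = 1 - \prod_{i=1}^{k} \big(1 - \sigma(x, y^i) \, \mu_{C_j}(y^i)\big) \tag{7.13} $$
where $y^1, \dots, y^k$ are the k nearest neighbours of x, and where the similarity on each attribute i is defined by:
$$ \sigma_i(x, y) = \begin{cases} 1 & \text{if } |x_i - y_i| \le q_i \\ \dfrac{p_i - |x_i - y_i|}{p_i - q_i} & \text{if } q_i < |x_i - y_i| < p_i \\ 0 & \text{if } |x_i - y_i| \ge p_i \end{cases} \tag{7.14} $$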
In the above formula, $q_i$ and $p_i$ are thresholds (possibly varying with the level $x_i$ or $y_i$) used to define a continuous transition from full similarity to dissimilarity, as shown in the example given in Figure 7.7. It should be noted, however, that the definition of the similarity indices $\sigma_i(x, y)$ is very demanding. It requires assessing two thresholds for each attribute level $x_i$. Moreover, the linear transition from similarity to non-similarity is not easy to justify, and a full justification of the shape of the similarity function $\sigma_i$ would require a lot of information about differences of type $x_i - y_i$. Usually, the construction of such similarity functions is based only on empirical evidence and common-sense principles.
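Equations (7.14) and (7.13) can be combined in a short sketch. One point is an explicit assumption: the text does not say how the per-attribute indices σ_i are merged into σ(x, y), so the sketch uses their minimum over attributes purely for illustration. The thresholds and neighbours in the usage example are hypothetical.

```python
def sigma_i(x_i, y_i, q_i, p_i):
    """Per-attribute similarity (7.14): 1 up to q_i, linear, 0 from p_i on."""
    d = abs(x_i - y_i)
    if d <= q_i:
        return 1.0
    if d >= p_i:
        return 0.0
    return (p_i - d) / (p_i - q_i)


def sigma(x, y, q, p):
    # ASSUMPTION: global similarity taken as the minimum of the sigma_i.
    return min(sigma_i(a, b, qa, pa) for a, b, qa, pa in zip(x, y, q, p))


def membership(x, neighbours, j, q, p):
    """Aggregation (7.13): 1 - product over neighbours of (1 - sigma * mu_Cj)."""
    prod = 1.0
    for y, mu in neighbours:
        prod *= 1.0 - sigma(x, y, q, p) * mu[j]
    return 1.0 - prod


# Hypothetical one-attribute example: thresholds q = 1, p = 3.
NEIGHBOURS = [((7.0,), [1.0, 0.0]), ((3.0,), [1.0, 0.0])]
print(membership((5.0,), NEIGHBOURS, 0, q=(1.0,), p=(3.0,)))   # 0.75
```

Each neighbour here is half similar to x (σ = 0.5), and (7.13) combines them like a probabilistic "or": 1 − 0.5 × 0.5 = 0.75.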
Coming back to the example, the kNN algorithm can be used to compute periodically two coefficients $\mu_{\text{too hot}}(x)$ and $\mu_{\text{not hot enough}}(x)$. These coefficients evaluate the necessity of a regulation action by analysing the measure x of the last biscuit. For instance, $\mu_{\text{too hot}}(x) = 1$ and $\mu_{\text{not hot enough}}(x) = 0$ means that decreasing the oven temperature is necessary.
[Figure 7.7: similarity index $\sigma_i(x, y)$ as a function of $y_i$, equal to 1 between $x_i - q_i$ and $x_i + q_i$ and to 0 beyond $x_i - p_i$ and $x_i + p_i$]
The decision process is improved if we use the fuzzy version of the kNN algorithm in the diagnosis stage. In this case, $\mu_{\text{too hot}}(x)$ and $\mu_{\text{not hot enough}}(x)$ can take any value in the unit interval; these values can be interpreted as indicators of the amplitude of the regulation and help the system choose a soft regulation action. The main drawback of this automatic decision process is the absence of explicit decision rules explaining the regulation actions. This is not a real drawback in this context because the quality of the biscuits is a sufficient argument for validation. However, in many other decision problems involving an automatic system, e.g. the automatic pre-filtering of loan files in a bank, the need for explanations is more crucial, first to validate the system a priori, and second to explain decisions a posteriori to the clients. The use of rules in the context of baking control is discussed in the next section.
[Figure: fuzzy labels for the moisture m (cg/g), with breakpoints 3, 3.8, 4.7 and 5.8, and for t (mm), with breakpoints 28, 32, 35 and 38]
The translation is more difficult for the labels used for the biscuit aspect because the aspect is represented by a fuzzy subset of the 3-dimensional space characterised by the components (L, a, b). This problem has been solved by the fuzzy kNN algorithm. It is indeed sufficient to ask an expert in baking control to qualify, with a label $y_i$, each element i of a representative sample of biscuits, using only the 5 labels introduced to describe the aspect. At the same time, the sensors assess the vector $x_i = (L_i, a_i, b_i)$ describing biscuit i in the physical space. Then the fuzzy kNN algorithm is applied with reference points $(x_i, y_i)$ for all biscuits i in the sample. For any input x = (L, a, b), it gives the membership values $\mu_{y_j}(x)$ for any label $y_j$, j ∈ {1, ..., 5}, used to describe the biscuits' aspect. The fuzzy nearest neighbour algorithm thus provides a representation of the labels $y_j$, j = 1, ..., 5, by fuzzy subsets of the (L, a, b) space. This makes it possible to resort to the fuzzy control approach presented in Section 7.2.
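[Figure: control architecture with a Diagnosis Module, receiving the measures m, t and x = (L, a, b) and producing $\mu_{\text{too hot}}(x)$ and $\mu_{\text{not hot enough}}(x)$, followed by a Decision Module]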
7.5 Conclusion
We have presented simple examples illustrating some basic techniques used to simulate human diagnosis, reasoning and decision-making in the context of repeated decision problems suitable for automation. We have shown the importance of constructing suitable mathematical representations of knowledge and decision rules. The task is difficult because human diagnosis is mainly based on human perception, whereas sensors naturally give numerical measures, and because human reasoning is mainly based on words and propositions drawn from natural language, whereas computers are basically suited to numerical computation. As shown in this chapter, some simple and intuitive formal models have been proposed to establish a formal correspondence between symbolic and numeric information. They are based on the definition of fuzzy sets linking labels to observable numerical measures through membership functions. However, a proper use of these fuzzy sets requires a very careful analysis. Indeed, we have shown that many apparently natural choices in the modelling process may hide strong assumptions that can turn out to be false in practice. For instance, the small numerical examples given in the chapter show that, in the context of rule-based control systems, the output of the system depends heavily on the choice of the numbers used to represent symbolic knowledge. In particular, one must be aware that multiplying arbitrary choices in the construction of membership functions can make the output of the system completely meaningless.
7.5. CONCLUSION 177
8.1 Introduction
In this chapter, we describe an application that was the theme of a research col-
laboration between an academic institution and a large company in charge of the
production and distribution of electricity. We do not give an exhaustive descrip-
tion of the work that was done and of the decision-aiding tool that was developed.
A detailed presentation of the first discussions, of the progressive formulation of
the problem, of the assumptions chosen, of the hesitations and backtrackings,
of the difficulties encountered, of the methodology adopted and of the resulting
software would require nearly a whole book. Our purpose is to point out some characteristics of the problem, especially concerning the modelling of uncertainties. The description was thus deliberately simplified, and some aspects of minor interest in the framework of this book were neglected. The main purpose of this presentation is to show how difficult it is to build (or to improvise) a pragmatic decision model that is consistent and sound. It illustrates the value and the importance of having well-studied formal models at our disposal when we are confronted with a decision problem. Sections 8.2 and 8.3 present the context of the application and the model that was established. Section 8.4 is based on a didactic example: it first illustrates and comments on some traditional approaches that could have been used in the application; then it gives a detailed description of the approach that
was applied in the concrete case. Section 8.5 provides some general comments on
the advantages and drawbacks of this approach.
The company must periodically make some choices for the construction or closure
180 CHAPTER 8. DEALING WITH UNCERTAINTY
of coal, gas and nuclear power stations, in order to ensure the production of elec-
tricity and satisfy demand. Due to the diversity of points of view to be taken into
account, the managers of the production department wanted to develop a multiple
criteria approach for evaluating and comparing potential actions. They considered
that aggregating financial, technical and environmental points of view into a type
of generalised cost (see Chapter 5) was neither possible nor very serious. A collab-
oration was established between the company and an academic department (we
will call it the analyst), which rapidly discovered that, besides the multiple criteria aspect, an enormous set of potential actions, a significant temporal dimension and a very high level of uncertainty on the data needed to be managed. The next section points out these aspects through the description of the model as it was formulated in collaboration with the company's engineers.
Table 8.1: Power and construction delay for the different types of production unit
For simplicity, decisions are only taken at chosen milestones, separated by a time period of about 3 years (this period between two decisions is called a block). At most one unit of each type per year may be ordered, and the choice concerning the downgrade plan (follow, anticipate or delay) is of course exclusive. A decision for a block of 3 years could thus be, for example,
{1N, 1C, 2G, A},
meaning that one nuclear, one coal and two gas production units are planned and that the downgrade plan has to be anticipated.
8.3. THE MODEL 181
Each decision is irrevocable and naturally has consequences for the future, not
only on the production of electricity, as seen in Table 8.1, but also in terms of
investment, exploitation cost, safety, environmental effects, ... (see Section 8.3.2).
An action is a succession of decisions over the whole time period concerned by
the simulation (the horizon), i.e. a period of about 20-25 years or 7 blocks. An
action is thus for example
{1N, 1C, 2G, A}, {1C}, {2G}, {}, {3G}, {1G, 1C}, {1N, 2G} .
The number of possible actions is of course enormous. Even after adding some simple rules (at most one nuclear unit, allowed only in the first and last blocks; anticipation and delay allowed only in the first and second blocks; an anticipation followed by a delay, or the reverse, forbidden), the number of actions is still around $10^8$. Many of these actions are completely unrealistic, for example no new unit for 20 years, or 3G and 3C in every block: they can be eliminated by fixing reasonable limits on the power production of the park. In this problem, the decision-maker only kept the actions for which, in each block, the surplus is less than 1 000 MW and the deficit less than 200 MW. These limitations led to a set of approximately 100 000 potential actions. The temporal dimension of the problem naturally leads to a tree structure for these actions, built on decision nodes (represented by squares in Figure 8.1). Depending on the block considered, there are typically between 3 and 30 branches leaving each decision node.
marginal cost, i.e. the variation of the total cost for a variation of 1 GWh, in BEF, to minimise;
A : {}, {2G}, {3G}, {2G}, {3G}, {}, {}
B : {1N, 2G, 2C}, {2C, 1G}, {3C}, {2C}, {1N}, {}, {}

                        A          B          Unit
Fuel cost               33 500     31 000     MBEF
Exploitation cost       45 000     49 000     MBEF
Investment cost         360 000    770 000    MBEF
Marginal cost           730        620        KBEF/GWh
Deficient power         16.7       10.3       TWh
CO2 emissions           22 000     16 000     Ktons
SO2 + NOx emissions     70         48         Ktons
Sales balance           23 000     30 000     MBEF
The evaluations of the actions on these criteria are of course not known with
certainty, because they depend on many factors that are not or not well known by
the decision-maker. The uncertainties have an impact on the evaluations, which
can be direct (the prices of the raw materials influence their total costs) or indirect
(if the gas price increases more than the coal price, the coal power stations will be
more intensively exploited than the gas ones; this will have an impact on the fuel
costs and the environmental impacts of the production park). Table 8.2 presents
an example of evaluations for two particular actions in a scenario where the fuel
price is low and the demand for electricity is relatively weak. Other scenarios must
be envisaged in order to improve the realism and usefulness of the model.
the value is not known: the value is relative to the past and was not measured,
the value is relative to the present but is technically impossible or very
expensive to obtain, the value is relative to the future for a parameter with
a completely erratic evolution;
the value can be approximated by an interval: the bounds result from the
properties of the system considered, the interval is due to the imprecision of
the measure or to the use of a forecasting method; sometimes, a probability,
a possibility or a confidence index can be associated with each value of the
interval;
the value is not unique: several measures did not yield the same value, several
scenarios are possible; again a probability, a possibility, a confidence index
or the result of a voting process can be associated with each value;
the value is unique but not reliable, with some information on its degree of reliability.
In the particular situation described here, the industrial partner was already using stochastic programming for the management of the production park. The partner wanted another methodology in order to take better account of the number of potential actions and of the multiple criteria aspects. For the uncertainties, however, the company's staff were used to working with probabilities, and the framework of the study did not allow anything else to be suggested. So, scenarios were defined and subjective probabilities were assigned to them by the company's experts. More precisely, two types of uncertainties were distinguished and respectively called aleas and major uncertainties: the difference between them is based on the more or less strong dependence between the past and the future. The industrial partner considered that nuclear availability in the future was completely independent of the knowledge of the past and called this type of uncertainty an alea: this means that the level of nuclear availability was completely open for each period of three years (a breakdown at a given time does not imply that there will be no breakdown in the near future). The selling price of electricity was also considered as an alea in order to capture the deregulation phenomena due to a forthcoming new legislation.
The major uncertainties (for which some dependence can exist between the
values at different moments) were the fuel price (the market presents global ten-
dencies and a high price for the first two blocks reinforces the probability of having
a high price for the third one), the demand for electricity (same reasoning) and
the legislation concerning pollution (in this example, the law may change for the
third block, and the uncertain parameters after this block are thus strongly re-
lated: either the same as for the first blocks, or more severe, but in both cases,
constant over all blocks after block 2).
The major uncertainties allow for a learning process that must be taken into
account in the analysis: each decision, at a given time, may use the previous values
of the uncertain parameters and deduce information from them about the future.
This information may modify the choices of the decision-maker. Suppose for in-
stance that a variable x may be equal to 0 or 1 in the future. The corresponding
probabilities are assessed as follows:
P (x = 0) > 0.5, after past scenario A,
P (x = 0) < 0.5, after past scenario B,
where the past scenario is known at the time of decision. The decision-maker
has to choose between two decisions: a and b. If he prefers a when x = 0 and b
when x = 1, a reasonable decision will be to choose a after scenario A and b after
scenario B.
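The reasoning above can be written as a small sketch. The conditional probabilities and the 0/1 payoffs are hypothetical, chosen only to satisfy the conditions stated in the text, and the expected-payoff rule used to pick a decision is an assumption made for illustration; the text only states the conclusion.

```python
# Hypothetical conditional probabilities satisfying the text:
# P(x = 0) > 0.5 after past scenario A and P(x = 0) < 0.5 after past scenario B.
P_X0 = {"A": 0.7, "B": 0.3}
# Hypothetical 0/1 payoffs encoding "a is preferred when x = 0, b when x = 1".
PAYOFF = {("a", 0): 1.0, ("a", 1): 0.0, ("b", 0): 0.0, ("b", 1): 1.0}


def best_decision(past):
    """Decision with the highest expected payoff, given the observed past scenario."""
    p0 = P_X0[past]
    expected = {d: p0 * PAYOFF[(d, 0)] + (1 - p0) * PAYOFF[(d, 1)] for d in ("a", "b")}
    return max(expected, key=expected.get)


print(best_decision("A"), best_decision("B"))   # a b
```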
The previous explanation is not valid for aleas, because their independence
does not allow for direct inference from the past.
Because of the statistical dependence and of the possible learning process in the major uncertainty case, a complete treatment and a tree structure for these scenarios (a scenario is a succession of observed uncertainties) are necessary. If there are 3 levels for the fuel price, 3 levels for the demand and 2 levels for the legislation, and if the horizon is divided into 7 blocks, there are, a priori, $(3 \times 3 \times 2)^7 \approx 6 \cdot 10^8$ possible scenarios. Fortunately, most of these scenarios are negligible because the probability of a very fluctuating scenario is very small: the major uncertainty scenarios are rather strongly correlated, and a sequence of levels for the fuel price such as HHLMHLH (H for high, M for medium and L for low) is much less probable than a sequence HHHMMMM. In practice, two sequences were retained for legislation (MMMMMMM and MMHHHHH), it was imposed that scenarios could only change after two blocks, and each modification was penalised so that very fluctuating scenarios were hardly possible. The analyst finally retained around 200 representative scenarios, gathered in a tree structure of major uncertainty nodes (represented by circles in Figure 8.1).
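The count of a priori scenarios and the claim that fluctuating sequences are much less probable can be checked with a short sketch. The "sticky" model of the fuel price used here (uniform start, then the level stays the same with probability 0.8 and moves to each other level with probability 0.1) is a hypothetical stand-in for the penalisation actually used in the study, which is not detailed in the text.

```python
from fractions import Fraction

LEVELS = {"fuel": 3, "demand": 3, "legislation": 2}
BLOCKS = 7
per_block = LEVELS["fuel"] * LEVELS["demand"] * LEVELS["legislation"]
print(per_block ** BLOCKS)   # 612220032, i.e. about 6 * 10**8

# Hypothetical sticky transition model for the fuel price level.
STAY, MOVE = Fraction(8, 10), Fraction(1, 10)


def sequence_probability(seq):
    """Probability of a sequence of levels under the hypothetical sticky model."""
    p = Fraction(1, 3)
    for prev, cur in zip(seq, seq[1:]):
        p *= STAY if cur == prev else MOVE
    return p


p_fluct = sequence_probability("HHLMHLH")
p_stable = sequence_probability("HHHMMMM")
print(float(p_stable / p_fluct))   # 4096.0: the stable sequence is far more likely
```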
Of course, the complete scenario for a decision node at time t is not known, but a probability is associated with each complete scenario, which makes it possible to compute the conditional probability of each complete scenario given the partial scenario already observed at time t.
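The conditioning step can be sketched as follows: the probability of each complete scenario compatible with the observed partial scenario is divided by the total probability of all compatible scenarios. The scenarios (fuel-price levels over three blocks) and their probabilities are hypothetical.

```python
# Hypothetical complete scenarios (fuel-price levels over 3 blocks).
SCENARIOS = {
    ("H", "H", "H"): 0.30,
    ("H", "H", "M"): 0.10,
    ("H", "M", "M"): 0.20,
    ("M", "M", "M"): 0.40,
}


def conditional(observed, scenarios):
    """P(complete scenario | observed partial scenario), by conditioning."""
    prefix = tuple(observed)
    compatible = {s: p for s, p in scenarios.items() if s[:len(prefix)] == prefix}
    total = sum(compatible.values())   # probability of the observed prefix
    return {s: p / total for s, p in compatible.items()}


print(conditional(["H", "H"], SCENARIOS))   # HHH about 0.75, HHM about 0.25
```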
By contrast, the aleas are by essence uncorrelated, and there is no reason to neglect any scenario. If there are 3 levels for the selling price and 2 levels for the availability of nuclear units, then the number of scenarios is $(3 \times 2)^7$ = 279 936. Fortunately, the tree structure of the aleas is obvious: each node gives rise to the same possibilities, with the same probability distribution. For these reasons, the aleas are much simpler to handle than the major uncertainties, and it is possible to take the whole set of scenarios into account.
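Because every alea node has the same distribution, the whole set of alea scenarios can be enumerated directly, as in the sketch below. The per-block probabilities for the selling price and the nuclear availability are hypothetical; the sketch only checks the count of 279 936 scenarios and that their probabilities add up to 1.

```python
import itertools
import math

# Hypothetical per-block distributions, identical at every node.
PRICE = {"low": 0.3, "medium": 0.4, "high": 0.3}
NUCLEAR = {"available": 0.9, "outage": 0.1}
BLOCK_OUTCOMES = [((price, nuc), pp * pn)
                  for price, pp in PRICE.items() for nuc, pn in NUCLEAR.items()]


def alea_scenarios(n_blocks=7):
    """Yield every alea scenario with its probability (independent blocks)."""
    for path in itertools.product(BLOCK_OUTCOMES, repeat=n_blocks):
        yield tuple(outcome for outcome, _ in path), math.prod(p for _, p in path)


count, total = 0, 0.0
for _, prob in alea_scenarios():
    count += 1
    total += prob
print(count, round(total, 9))   # 279936 1.0
```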
themselves can be dispersed over rather long periods and vary within these
periods. Fourth, the consequences of a decision can differ according to the
moment at which that decision is taken. It is common in planning models to introduce
a discount rate that decreases the weight of the evaluations of distant
consequences (see Chapter 5), and the industrial partner did so here. However, for a
long-term decision problem with important consequences for future generations,
such an approach may not be the best one, and the decision-maker could be more
confident in the flexible approach and the richness of the scenarios. That is why
the analyst kept the option of introducing discounting or not.
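Keeping discounting optional, as the analyst did, can be sketched as below. The per-block evaluations, the 5% rate and the convention that the first block is not discounted are hypothetical choices made only for illustration.

```python
def total_evaluation(evaluations, rate=None):
    """Aggregate per-block evaluations of one action.

    evaluations[t] is the evaluation of block t (t = 0 is the first block).
    rate=None leaves the evaluations undiscounted; otherwise block t is
    weighted by 1 / (1 + rate) ** t, so distant blocks count less.
    """
    if rate is None:
        return sum(evaluations)
    return sum(v / (1 + rate) ** t for t, v in enumerate(evaluations))


flat = [100] * 7                      # hypothetical evaluations over 7 blocks
print(total_evaluation(flat))         # no discounting
print(total_evaluation(flat, 0.05))   # hypothetical 5% rate per block
```

With `rate=None` every block weighs the same, which corresponds to the option of not discounting.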
The complete model can be described by a tree structure including decision nodes
(squares) and uncertainty nodes (circles), as illustrated in Figure 8.1. At t = 0
(square node at the beginning of block 1), a first decision is made (a branch is
chosen) without any information on the scenario, leading to a circle node. During
block 1, one may observe the actual values of the uncertain parameters (nuclear
availability, electricity selling price, fuel price, electricity demand and
environmental legislation), determining one branch leaving the considered circle node and
leading to one of the decision nodes at time t = 1. A new decision is then made,
taking the previous information into account, and so on until the last decision
(square) node and the last scenario (circle) node that determine the whole action
and the whole observed scenario. In the resulting tree (Figure 8.1), the decision
nodes (squares) correspond to active parts of the analysis where the decision-maker
has to establish his strategy, while the uncertainty nodes (circles) correspond to
passive parts of the analysis where the decision-maker is subject to the changes
in the parameters.
Consider Figure 8.2 describing two successive time periods. At time t = 0, two
decisions A and B are eligible; during the first period, two events S and T are
possible, each with probability 1/2. At the beginning of the second period, two
decisions C and D are eligible if the first decision was A and three decisions E,
F, G are eligible if the first decision was B. During the second period, two events
U and V are possible after S (with respective probabilities 1/4 and 3/4) and two
events Y and Z are possible after T (with respective probabilities 3/4 and 1/4).
Figure 8.2 presents the tree and the evaluation of each action (set of decisions)
for each complete scenario. Note that this didactic example contains only one
8.4. A DIDACTIC EXAMPLE 187
evaluation for each action (a problem with one criterion). We do not dwell on the
multiple criteria aspect of the problem here (this was treated in Chapter 6) and
focus on the treatment of uncertainty.
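As an illustration (a standard technique, not necessarily the analyst's procedure), the tree of Figure 8.2 can be folded back with the expected value criterion: average at chance nodes, maximise at decision nodes. The evaluations are those that appear in Tables 8.4, 8.6, 8.8 and 8.10; the branches F and G after S at N4 are omitted because their values are not recoverable here, which does not affect the choice since the text states that E dominates them. Under these assumptions the sketch selects D at N2 (21/4 = 5.25), C at N3 (4.5), E at N4 (5), G at N5 (5) and B at N1.

```python
from fractions import Fraction as Fr


def lottery(*pairs):
    """Chance node built from (probability, evaluation) pairs given as strings."""
    return ("chance", [(Fr(p), Fr(v)) for p, v in pairs])


def decision(name, **options):
    """Decision node: each keyword is a decision leading to a subtree."""
    return ("decision", name, options)


def fold_back(node, plan):
    """Expected-value roll-back: average over chance nodes, maximise at
    decision nodes, and record the chosen decision of each node in `plan`."""
    if isinstance(node, Fr):  # leaf: evaluation of a complete scenario
        return node
    if node[0] == "chance":
        return sum(p * fold_back(child, plan) for p, child in node[1])
    _, name, options = node
    values = {label: fold_back(child, plan) for label, child in options.items()}
    best = max(values, key=values.get)
    plan[name] = (best, values[best])
    return values[best]


# Second-period nodes of Figure 8.2 (U, V follow S; Y, Z follow T).
n2 = decision("N2", C=lottery(("1/4", "7"), ("3/4", "4.5")),
              D=lottery(("1/4", "4.5"), ("3/4", "5.5")))
n3 = decision("N3", C=lottery(("3/4", "4.5"), ("1/4", "4.5")),
              D=lottery(("3/4", "1"), ("1/4", "5")))
n4 = decision("N4", E=lottery(("1/4", "3.5"), ("3/4", "5.5")))  # F, G omitted
n5 = decision("N5", E=lottery(("3/4", "6"), ("1/4", "1")),
              F=lottery(("3/4", "2"), ("1/4", "2")),
              G=lottery(("3/4", "5"), ("1/4", "5")))
half = Fr(1, 2)
n1 = decision("N1", A=("chance", [(half, n2), (half, n3)]),
              B=("chance", [(half, n4), (half, n5)]))

plan = {}
fold_back(n1, plan)
for node, (choice, value) in plan.items():
    print(node, choice, value)
```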
[Figure 8.2: decision tree of the didactic example, with decision nodes N1 to N5, events S and T (probability 1/2 each) in the first period, events U and V (after S) or Y and Z (after T) in the second period, and the value of each action for each complete scenario.]
The expected utility model, which is the subject of the next section, resolves
this paradox and, more generally, makes it possible to take different attitudes
towards risk into account.
This model dates back at least to Bernoulli (1954), but the basic axioms, in
terms of preferences, were only studied in the twentieth century (see for instance von
Neumann and Morgenstern 1944).
In the case of the St. Petersburg game, if we denote by u(x) the utility of
winning x €, the expected utility of refusing the game is u(0), while the expected
utility of betting an amount of s € in the game is
$$\sum_{k=1}^{\infty} \frac{1}{2^k}\, u(2^k - s).$$
As an exercise, the reader can verify that for a utility function defined by
the expected utility of betting s € in the game is positive (hence greater than the
expected utility of refusing the game) as long as s is less than $21(1 - 1/2^{20})$ €,
and is negative for larger values. The expected utility can also be finite with an
unbounded utility function such as, for example, the logarithmic function.
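A minimal numerical check of the last remark, assuming a stake s = 0 so that the logarithm is defined at every prize: with u(x) = x each term of the series equals 1 and the partial sums grow without bound, whereas with u = ln the series converges to 2 ln 2 (because the sum of k/2^k over k ≥ 1 equals 2).

```python
import math


def partial_sums(utility, terms):
    """Partial sums of sum_{k>=1} 2^-k * utility(2^k), i.e. stake s = 0."""
    total, sums = 0.0, []
    for k in range(1, terms + 1):
        total += utility(2 ** k) / 2 ** k
        sums.append(total)
    return sums


linear = partial_sums(lambda x: x, 40)   # every term adds 1: diverges
log_sums = partial_sums(math.log, 40)    # converges towards 2 ln 2
print(linear[-1], log_sums[-1], 2 * math.log(2))
```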
In the example in Figure 8.2 and with a utility function defined by
u(1) = u(2) = 1,
u(3) = u(3.5) = 2,
u(4.5) = u(5) = u(5.5) = 3,
u(6) = u(7) = 4,
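[Figure: reduced tree for the expected value criterion; D is chosen at N2 (value 5.25), C at N3 (4.5), E at N4 (5) and G at N5 (5).]
[Figure: decisions A and B under the events S, T and U; A yields 10, 15 and 20, and B yields 15, 20 and 9, respectively.]
[Figure: reduced tree for the expected utility criterion; C is chosen at N2 and N3, and E at N4 and N5.]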
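With the utility function listed above, the expected utility of each decision at the second-period nodes of Figure 8.2 can be computed directly; a sketch, using the evaluations that also appear in Tables 8.4, 8.6, 8.8 and 8.10 (F and G after S at N4 are omitted, as their values are not given here). It selects C at N2 (13/4 against 3 for D), whereas a plain expected value favours D there (5.25 against 5.125); it also selects C at N3, E at N4 and E at N5 (13/4 against 3 for G).

```python
from fractions import Fraction as Fr

# Utility function of the example, given only on the evaluations that occur.
u = {Fr(1): 1, Fr(2): 1, Fr(3): 2, Fr("3.5"): 2, Fr("4.5"): 3,
     Fr(5): 3, Fr("5.5"): 3, Fr(6): 4, Fr(7): 4}

# node -> decision -> [(probability, evaluation)]
nodes = {
    "N2": {"C": [("1/4", "7"), ("3/4", "4.5")],
           "D": [("1/4", "4.5"), ("3/4", "5.5")]},
    "N3": {"C": [("3/4", "4.5"), ("1/4", "4.5")],
           "D": [("3/4", "1"), ("1/4", "5")]},
    "N4": {"E": [("1/4", "3.5"), ("3/4", "5.5")]},  # F and G omitted
    "N5": {"E": [("3/4", "6"), ("1/4", "1")],
           "F": [("3/4", "2"), ("1/4", "2")],
           "G": [("3/4", "5"), ("1/4", "5")]},
}


def expected_utility(lottery):
    return sum(Fr(p) * u[Fr(v)] for p, v in lottery)


choices = {}
for node, options in nodes.items():
    eu = {d: expected_utility(lot) for d, lot in options.items()}
    best = max(eu, key=eu.get)
    choices[node] = (best, eu[best])
    print(node, best, eu[best])
```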
Situation 1:
     W    R    G
A  100    0    0
B    0  100    0
Situation 2:
     W    R    G
C  100    0  100
D    0  100  100
Table 8.3
which is in contradiction with the inequality obtained above. So, the expected
utility model cannot explain the two previous preference situations simultaneously.
A possible attitude in this case is to consider that the decision-maker should
revise his judgment in order to be more rational, that is, in order to satisfy the
axioms of the model. Another interpretation is that the expected utility approach
sometimes implies unreasonable constraints on the preferences of the decision-maker
(in the previous example, the violated property is the so-called independence
axiom of von Neumann and Morgenstern). This last interpretation led scientists
to propose many variants of the expected utility model, as in Kahneman and
Tversky (1979), Machina (1982, 1987), Bell et al. (1988), Barbera et al. (1998).
Before explaining why the expected utility model (or one of its variants) was
not applied by the analyst in the electricity production planning problem, let us
mention why using probabilities may cause some trouble in modelling uncertainties
or risk. The following example illustrates the so-called Ellsberg paradox and is
extracted from Fishburn (1970, p.172). An urn contains one white ball (W) and
two other balls. You only know that the two other balls are either both red (R),
or both green (G), or one is red and one is green. Consider the two situations in
Table 8.3 where W, R, and G represent the three states according to whether one
ball drawn at random is white, red or green. The figures are what you will be paid
(in Euros) after you make your choice and a ball is drawn.
Intuition leads many people to prefer A to B and D to C, while the expected
utility approach leads to indifference between A and B as well as between C
and D.
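A small check of this claim, assuming a linear utility of money and a subjective probability P(R) for red (P(W) = 1/3 is known since one ball out of three is white). With symmetric beliefs the two comparisons are ties; more generally, EU(A) - EU(B) and EU(D) - EU(C) are opposite multiples of P(W) - P(R) for any utility function, so no choice of P(R) makes A better than B and D better than C at the same time. The grid of P(R) values is an arbitrary illustration.

```python
from fractions import Fraction as Fr

# Table 8.3: payment (in euros) in each state W, R, G.
payoffs = {
    "A": {"W": 100, "R": 0, "G": 0},
    "B": {"W": 0, "R": 100, "G": 0},
    "C": {"W": 100, "R": 0, "G": 100},
    "D": {"W": 0, "R": 100, "G": 100},
}


def expected_utility(act, prob, u=lambda x: x):
    return sum(prob[state] * u(x) for state, x in payoffs[act].items())


def beliefs(p_red):
    """P(W) = 1/3 is known; the remaining 2/3 is split between R and G."""
    return {"W": Fr(1, 3), "R": p_red, "G": Fr(2, 3) - p_red}


# Symmetric beliefs (P(R) = P(G) = 1/3): indifference in both situations.
sym = beliefs(Fr(1, 3))
print(expected_utility("A", sym) == expected_utility("B", sym))
print(expected_utility("C", sym) == expected_utility("D", sym))

# No value of P(R) in [0, 2/3] makes A better than B and D better than C.
both = [r for r in (Fr(k, 30) for k in range(21))
        if expected_utility("A", beliefs(r)) > expected_utility("B", beliefs(r))
        and expected_utility("D", beliefs(r)) > expected_utility("C", beliefs(r))]
print(both)
```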
This type of situation shows that the use of the probability concept may be
debatable for representing attitude towards risk or uncertainty; other tools (pos-
sibility theory, belief functions or fuzzy integrals) can also be envisaged.
Events Probab. C D
U 1/4 7 4.5
V 3/4 4.5 5.5
Table 8.4
where x is the difference in the evaluations of two decisions. Other functions can
be defined similarly to what is done in the Promethee method. This function
expresses the fact that a difference smaller than or equal to 1 is considered
not significant. As we see, an advantage of this approach is that it allows the
introduction of indifference thresholds.
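The analyst proposed the following index to measure the preference of C over
D, on the basis of the data contained in Table 8.4:
$\tfrac{1}{4} f(7 - 4.5) + \tfrac{3}{4} f(4.5 - 5.5) = \tfrac{1}{4} f(2.5) + \tfrac{3}{4} f(-1) = \tfrac{1}{4}.$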
C D
C 0 1/4
D 0 0
Table 8.5
Events Probab. C D
Y 3/4 4.5 1
Z 1/4 4.5 5
Table 8.6
These preference indices are summarised in Table 8.5. The score of each decision
is then the sum of the preferences of this decision over the others minus the
sum of the preferences of the others over it. In the case of Table 8.5, this trivially
gives 1/4 and -1/4 as respective scores for C and D. The maximum score determines
the chosen decision. So, the chosen decision at node N2 is C. Note that,
despite the analyst's doubts about the real nature of the probabilities, he used
them to calculate a sort of expected index of preference for each decision over each
other decision. This is certainly a weak point of the method, and other tools, which
will be described in a volume in preparation, could have been used here. Note also
that, in the multiple criteria case, a (possibly weighted) sum is computed over all
the criteria in order to obtain the global score of a decision.
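At node N3, we have to consider Table 8.6, leading to the preference indices
presented in Table 8.7. For example, the preference index of C over D is
$\tfrac{3}{4} f(4.5 - 1) + \tfrac{1}{4} f(4.5 - 5) = \tfrac{3}{4} f(3.5) + \tfrac{1}{4} f(-0.5) = \tfrac{3}{4}.$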
The scores of C and D are respectively 3/4 and -3/4, so that the chosen
decision at node N3 is also C.
At node N4, decision E dominates F and G and is thus chosen (where "dominates"
means "is better in each scenario").
At node N5, we must consider Table 8.8.
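The preference index of G over E (for example) is
$\tfrac{3}{4} f(5 - 6) + \tfrac{1}{4} f(5 - 1) = \tfrac{3}{4} f(-1) + \tfrac{1}{4} f(4) = \tfrac{1}{4}.$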
C D
C 0 3/4
D 0 0
Table 8.7
Events Probab. E F G
Y 3/4 6 2 5
Z 1/4 1 2 5
Table 8.8
E F G
E 0 3/4 0
F 0 0 0
G 1/4 1 0
Table 8.9
The other preference indices are presented in Table 8.9; they yield 1/2, -7/4
and 5/4 as respective scores for E, F and G, so that G is the chosen decision at
node N5.
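We can now consider Table 8.10, associated with N1. The values in this table are
those that correspond to the chosen decisions at the nodes N2 to N5 (they are
indicated in parentheses).
On the basis of this table, the preference of A over B is
$\tfrac{1}{8} f(7 - 3.5) + \tfrac{3}{8} f(4.5 - 5.5) + \tfrac{3}{8} f(4.5 - 5) + \tfrac{1}{8} f(4.5 - 5) = \tfrac{1}{8}.$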
Scenarios Probab. A B
S-U 1/8 7(C) 3.5(E)
S-V 3/8 4.5(C) 5.5(E)
T-Y 3/8 4.5(C) 5(G)
T-Z 1/8 4.5(C) 5(G)
Table 8.10
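The first step can be reproduced with a short sketch. The defining formula of f is not reproduced in this text, so the sketch assumes the step function f(x) = 1 if x > 1 and 0 otherwise; this assumption agrees with the statement that differences of at most 1 are not significant and with every index in Tables 8.5, 8.7 and 8.9. It gives the scores 1/4 and -1/4 at N2, 3/4 and -3/4 at N3, 1/2, -7/4 and 5/4 at N5, and, applied to Table 8.10, 1/8 for A and -1/8 for B.

```python
from fractions import Fraction as Fr


def f(x):
    # Assumed preference function: a difference of at most 1 is not
    # significant, larger differences count fully.
    return 1 if x > 1 else 0


def preference(table, a, b):
    """Probability-weighted index of preference of decision a over decision b."""
    return sum(p * f(row[a] - row[b]) for p, row in table)


def scores(table, decisions):
    """Preferences of each decision over the others minus theirs over it."""
    return {a: sum(preference(table, a, b) - preference(table, b, a)
                   for b in decisions if b != a)
            for a in decisions}


def rows(probs, **columns):
    """Build [(probability, {decision: evaluation})] from column lists."""
    return [(Fr(p), {d: Fr(v[i]) for d, v in columns.items()})
            for i, p in enumerate(probs)]


tables = {
    "N2": rows(["1/4", "3/4"], C=["7", "4.5"], D=["4.5", "5.5"]),          # 8.4
    "N3": rows(["3/4", "1/4"], C=["4.5", "4.5"], D=["1", "5"]),            # 8.6
    "N5": rows(["3/4", "1/4"], E=["6", "1"], F=["2", "2"], G=["5", "5"]),  # 8.8
    "N1": rows(["1/8", "3/8", "3/8", "1/8"], A=["7", "4.5", "4.5", "4.5"],
               B=["3.5", "5.5", "5", "5"]),                                # 8.10
}

results = {}
for node, table in tables.items():
    results[node] = scores(table, list(table[0][1]))
    best = max(results[node], key=results[node].get)
    print(node, {d: str(v) for d, v in results[node].items()}, "->", best)
```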
196 CHAPTER 8. DEALING WITH UNCERTAINTY
8.4, where 9 has been replaced by 10 in the evaluation of B for event U. If the
probabilities of S, T and U are equal to 1/3, the expected utility approach gives
the same value $\tfrac{1}{3}[u(10) + u(15) + u(20)]$ to A and B, which are thus considered
indifferent. However, if we compare A and B separately for each event, we see
that B is better than A for events S and T, with a probability equal to 2/3. The
approach described in this section will give a preference index of A over B equal
to $\tfrac{1}{3} f(10 - 15) + \tfrac{1}{3} f(15 - 20) + \tfrac{1}{3} f(20 - 10)$.
With the same function f as before, this will lead to the choice of B. Making
the (natural) assumption that $f(x) = 0$ when x is negative, we see that this
approach will lead to indifference between A and B only with a function f such
that $f(20 - 10) = f(15 - 10) + f(20 - 15)$.
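The contrast between the two approaches on this modified example can be checked numerically; a sketch, again assuming the step function f(x) = 1 if x > 1 and 0 otherwise. Because A and B are the same lottery over {10, 15, 20} when each event has probability 1/3, every utility function gives them the same expected utility, while the event-by-event index of B over A (2/3) exceeds that of A over B (1/3). The utility functions tried below are arbitrary examples.

```python
from fractions import Fraction as Fr


def f(x):
    # Assumed step function: differences of at most 1 are not significant.
    return 1 if x > 1 else 0


p = Fr(1, 3)                      # P(S) = P(T) = P(U) = 1/3
A = {"S": 10, "T": 15, "U": 20}
B = {"S": 15, "T": 20, "U": 10}   # 9 replaced by 10 for event U


def pref(x, y):
    """Probability-weighted preference of x over y, event by event."""
    return sum(p * f(x[e] - y[e]) for e in x)


def expected_utility(x, u):
    return sum(p * u(v) for v in x.values())


print(pref(A, B), pref(B, A))     # the index favours B
for u in (lambda v: v, lambda v: v * v, lambda v: -Fr(1, v)):
    # Same lottery over {10, 15, 20}: any utility function gives a tie.
    print(expected_utility(A, u) == expected_utility(B, u))
```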
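[Figure 8.6: tree of the second example. At N1, A leads to decision node N2 and B to a lottery; at N2, C leads to decision node N4 (decisions E and F) and D to a lottery. Each lottery involves the events S, T and U, and the final values are 10, 15, 20 for E; 15, 20, 0 for F; 20, 0, 5 for D; and 0, 5, 10 for B.]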
Events Probab. C D
S 1/3 15 20
T 1/3 20 0
U 1/3 0 5
Table 8.11
Events Probab. A B
S 1/3 20 0
T 1/3 0 5
U 1/3 5 10
Table 8.12
Table 8.13
Table 8.14
At each decision node, the local decisions are also compared to the best actions in
the same scenarios in each of the branches of the tree.
In Figure 8.2, at node N2, C and D are also compared to the best decision in
N4, i.e. to E (after event S).
This leads to Table 8.14.
Using the same preference function as before, the preference of C over D is
still 1/4 (see section 4.4), the preference of D over C is still 0, the preference of
C over E is $\tfrac{1}{4} f(3.5) + \tfrac{3}{4} f(-1) = \tfrac{1}{4}$, the preference of E over C is
$\tfrac{1}{4} f(-3.5) + \tfrac{3}{4} f(1) = 0$, the preference of D over E is $\tfrac{1}{4} f(1) + \tfrac{3}{4} f(0) = 0$
and the preference of E over D is $\tfrac{1}{4} f(-1) + \tfrac{3}{4} f(0) = 0$.
Table 8.15 summarises these values.
C D E
C 0 1/4 1/4
D 0 0 0
E 0 0 0
Table 8.15
The scores for C and D are respectively 1/2 and -1/4; C is therefore chosen
at node N2.
At node N3, we compare C and D with the best decision in N5, i.e. with G
(after event T), on the basis of Table 8.16.
Table 8.17 gives the preference indices.
The scores of C and D are respectively 3/4 and -3/2, so that C is also chosen
in N3.
The analysis of N4 (comparison of E, F, G and C (N2)) and of N5 (comparison
of E, F, G and C (N3)) leads to the same conclusions as in the first step, so that,
in this example, the second step does not change anything.
However, the interest of this second step is to choose, at each decision node,
a decision leading to a final result that is strong, not only locally, but also in
Events Probab. C D G
Y 3/4 4.5 1 5
Z 1/4 4.5 5 5
Table 8.16
C D G
C 0 3/4 0
D 0 0 0
G 0 3/4 0
Table 8.17
Events Prob. E F D B
S 1/3 10 15 20 0
T 1/3 15 20 0 5
U 1/3 20 0 5 10
Table 8.18
comparison with the strongest results obtained during the first step in the other
branches of the tree (always in the same scenarios). This is illustrated by the
example in Figure 8.6 where the second step works as follows. At node N4, we
compare E and F with D and B (the best actions in the other branches as they
are unique), through Table 8.18
Table 8.19 presents the preference indices.
The scores of E and F respectively become 1 and 1/3, so that the best decision
at N4 is now E.
At N2, we have to compare C (followed by E) with D and B (best action in
the other branch): the scores of C and D are respectively 4/3 and -2/3, so that
the best decision in N2 is now C.
At N1, we have to compare A (followed by C and E) with B, and we choose
A (which dominates B). So we see that this second step tends to avoid choosing
dominated actions, although this property is not guaranteed in all cases.
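The second step differs from the first only in that each local decision is also scored against the best actions of the other branches, in the same scenarios. A sketch under the same assumed step function f(x) = 1 if x > 1 and 0 otherwise; the evaluations of E at N2 of Figure 8.2 come from Table 8.10, and at N2 of Figure 8.6 the decision C takes the evaluations of E, which follows it. It reproduces the scores 1/2 and -1/4, 3/4 and -3/2, 1 and 1/3, and 4/3 and -2/3 reported in the text.

```python
from fractions import Fraction as Fr


def f(x):
    # Assumed step preference function (differences <= 1 not significant).
    return 1 if x > 1 else 0


def pref(table, a, b):
    return sum(p * f(row[a] - row[b]) for p, row in table)


def local_scores(table, local, others):
    """Second-step score of each local decision: preferences over, minus
    preferences of, every other action in the table (local or from other
    branches)."""
    actions = local + others
    return {a: sum(pref(table, a, b) - pref(table, b, a)
                   for b in actions if b != a)
            for a in local}


def table(probs, **cols):
    return [(Fr(p), {d: Fr(v[i]) for d, v in cols.items()})
            for i, p in enumerate(probs)]


# Figure 8.2: N2 (events U, V) with E from N4; N3 (Y, Z) with G from N5.
t814 = table(["1/4", "3/4"], C=["7", "4.5"], D=["4.5", "5.5"], E=["3.5", "5.5"])
t816 = table(["3/4", "1/4"], C=["4.5", "4.5"], D=["1", "5"], G=["5", "5"])
# Figure 8.6 (events S, T, U): Table 8.18, and N2 where C is followed by E.
third = ["1/3"] * 3
t818 = table(third, E=["10", "15", "20"], F=["15", "20", "0"],
             D=["20", "0", "5"], B=["0", "5", "10"])
t_n2 = table(third, C=["10", "15", "20"], D=["20", "0", "5"], B=["0", "5", "10"])

results = {
    "Fig 8.2 N2": local_scores(t814, ["C", "D"], ["E"]),
    "Fig 8.2 N3": local_scores(t816, ["C", "D"], ["G"]),
    "Fig 8.6 N4": local_scores(t818, ["E", "F"], ["D", "B"]),
    "Fig 8.6 N2": local_scores(t_n2, ["C", "D"], ["B"]),
}
for node, s in results.items():
    print(node, {a: str(v) for a, v in s.items()})
```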
8.5 Conclusions
This approach (first and second steps) was successfully implemented and applied
by the company (after many difficulties due to the combinatorial aspects of the
problem) and some visual tools were developed in order to facilitate the decision-
E F D B
E 0 1/3 2/3 1
F 2/3 0 1/3 2/3
D 1/3 2/3 0 1/3
B 0 1/3 2/3 0
Table 8.19
8.5. CONCLUSIONS 201
However, this approach also presents some mysterious aspects that should be
investigated more thoroughly:
• it computes a sort of expected index for the preference of each action over each
other action, although the role of the so-called probabilities in the modelling
of uncertainty is not that clear;
• it is a rather bizarre mixture of local (first step) and global (second step)
comparisons of the actions, but it does not guarantee that the chosen action
is non-dominated.
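[Figure 8.7: a chance node leads with probability 0.2 to decision node N1 and with probability 0.8 to a gain of 0; at N1, A gives 50 with probability 0.5 and 0 with probability 0.5, while B gives 10 with probability 1.]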
namic choice problem is characterised by the fact that at least one uncertainty
node is followed by a decision node (this is typically the case of the application de-
scribed in this chapter). In such a context, an interesting property is the so-called
dynamic consistency: a decision-maker is said to be dynamically inconsistent
if his actual choice when arriving at a decision node differs from his previously
planned choice for that node.
Let us illustrate this concept by a short example. Assume that a decision-maker
prefers a game where he wins 50 € with probability 0.1 (and nothing with
probability 0.9) to a game where he wins 10 € with probability 0.2 (and nothing
with probability 0.8). At the same time, he prefers to receive 10 € with certainty
to a game where he wins 50 € with probability 0.5 (and nothing with probability
0.5). Note that these preferences violate the independence axiom of von Neumann
and Morgenstern. Now consider the tree of Figure 8.7.
According to the previous information, the actual choice of the decision-maker,
at node N1, will be B. However, if he has to plan the choice between A and B
before knowing the first choice of nature, he can easily calculate that if he chooses
A, he wins 50 € with probability 0.1 (and nothing with probability 0.9), while if he
chooses B, he wins 10 € with probability 0.2 (and nothing with probability 0.8),
so that the best choice for him (before knowing the first choice of nature) is A.
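The planning computation amounts to reducing the two-stage tree to a one-stage lottery; a sketch with exact fractions. It also checks why an expected utility maximiser cannot be dynamically inconsistent here: for any utility values (the two sets tried are arbitrary), the planned difference EU(A) - EU(B) equals 1/5 of the difference at N1, so both comparisons always point the same way.

```python
from fractions import Fraction as Fr


def two_stage(p_reach, lottery):
    """Reduce the tree of Figure 8.7: N1 is reached with probability p_reach
    (otherwise the gain is 0), then the chosen lottery is played."""
    reduced = {0: 1 - p_reach}
    for prize, p in lottery.items():
        reduced[prize] = reduced.get(prize, 0) + p_reach * p
    return reduced


A = {50: Fr(1, 2), 0: Fr(1, 2)}   # at N1: 50 euros with probability 0.5
B = {10: Fr(1)}                   # at N1: 10 euros for sure
plan_A, plan_B = two_stage(Fr(1, 5), A), two_stage(Fr(1, 5), B)
print(plan_A)  # gain 50 with probability 1/10, otherwise 0
print(plan_B)  # gain 10 with probability 1/5, otherwise 0


def eu(lottery, u):
    return sum(p * u[x] for x, p in lottery.items())


# Under expected utility the planned comparison is the local one scaled
# by 1/5, whatever the utility values are.
for u in ({0: 0, 10: 1, 50: 3}, {0: 2, 10: 5, 50: 7}):
    planned = eu(plan_A, u) - eu(plan_B, u)
    actual = eu(A, u) - eu(B, u)
    assert planned == Fr(1, 5) * actual
```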
So, the actual choice at N1 differs from the planned choice for that node,
illustrating the so-called dynamic inconsistency. It can be shown that any departure
from the expected utility model can lead to dynamic inconsistency. However,
Machina (1989) showed that this argument relies on a hidden assumption con-
cerning behaviour in dynamic choice situations (the so-called consequentialism)
and argued that this assumption is inappropriate when the decision-maker is a
non-expected utility maximiser.
This example shows that no approach can be considered as ideal in the context
of decision under uncertainty. As for the other situations studied in this book,
each model, each procedure, can present some pitfalls that have to be known by
the analyst. Knowing the underlying assumptions of the decision-aid model which
will be used is probably the only way for the analyst to guarantee as scientific
an approach to the decision problem as possible. It is a fact that, due to lack of
time and other priorities, many decision tools are developed in real applications
without taking enough precautions (this is also the case in the example presented
in this chapter, due to the short deadlines and to the necessity of overcoming the
combinatorial aspects of the problem). This is why we consider it important to provide
analysts with some guidelines for modelling a decision problem: this will be
the subject of a volume in preparation.
9
SUPPORTING DECISIONS: A REAL-WORLD CASE STUDY
Introduction
In this chapter¹ we report on a real-world decision aiding process which took place
in a large Italian firm, in late 1996 and early 1997, concerning the evaluation
of offers following a call for tenders for a very important software acquisition.
We will try to present in detail the decision process for which the decision
support was requested, the actors involved, the decision aiding process, including
the problem structuring and formulation, the evaluation model created and the
multiple criteria method adopted. The reader should be aware that very few
real-world cases of decision support are reported in the literature, although many
more occur in reality (for noteworthy exceptions see Belton, Ackermann and
Shepherd 1997, Bana e Costa, Ensslin, Correa and Vansnick 1999, Vincke 1992,
Roy and Bouyssou 1993).
We introduce such a real case description for two reasons.
1. The first reason is our wish to give an account of what providing decision
support in a real context means and to show the importance of elements
such as the participating actors, the problem formulation, the construction of the
criteria, etc., which are often neglected in many conventional decision aiding methodologies
and in operational research. From this point of view the reader may find questions
already introduced in previous chapters of the book, but here they are discussed
from a decision aiding process perspective.
2. The second reason is our wish to introduce the reader to some concepts and
problems that will be extensively discussed in a forthcoming volume by the
authors. Our objective is to stimulate the reader to reflect on how decision support
tools and concepts are used in real-life situations and how theoretical research
may contribute to aiding real decision makers in real decision situations.
More precisely, the chapter is organised as follows. Section 1 introduces and defines
some preliminary concepts that will be used in the rest of the chapter, such as
decision process, actors, decision aiding process, problem formulation, evaluation
model, etc. Section 2 presents the decision process for which the decision support
was requested, the actors involved and their concerns (stakes), the resources
¹ A large part of this chapter uses material already published in Paschetta and Tsoukias (1999).
206 CHAPTER 9. SUPPORTING DECISIONS
involved and the timing. Section 3 describes the decision aiding process, mainly
through the different products of such a process that are specifically analysed
(the problem formulation, the evaluation model and the final recommendation)
and discusses the experience conducted. The client's comments on the experience
are also included in this section. Section 4 summarises the lessons learned in such
an experience. All technical details are included in Appendix A (an ELECTRE-
TRI type procedure is used), while the complete list of the evaluation attributes
is provided in Appendix B.
9.1 Preliminaries
We will make extensive use of some terms (like actor, decision process etc.) in this
chapter that, although present in literature (see Simon 1957, Mintzberg, Rais-
inghani and Theoret 1976, Jacquet-Lagreze, Moscarola, Roy and Hirsch 1978,
Checkland 1981, Heurgon 1982, Masser 1983, Humphreys, Svenson and Vari 1993,
Moscarola 1984, Nutt 1984, Rosenhead 1989, Ostanello 1990, Ostanello 1997, Os-
tanello and Tsoukias 1993), can have different interpretations. In order to help
the reader understand how such terms are used in this presentation we introduce
some informal definitions.
Decision Aiding Process: part of the decision process and more precisely the
interactions occurring at least between the client and the analyst.
• the market offered a very large variety of software which could be used as a
GIS for the company's purposes;
• the company required a very particular version of GIS that did not exist as
a ready-made product on the market, but had to be created by customising
and combining different modules of existing software, with the addition of
ad hoc software written for the purposes of the company;
• the question asked by the ISD was very general, but also very committing,
because it included an evaluation prior to an acquisition and not just a simple
description of the different products;
• the GISD felt able to describe and evaluate different GIS products based on
a set of attributes (several hundred in the end), but was not able to provide
a synthetic evaluation, the purpose of which was just as obscure (the use
of a weighted sum was immediately set aside because it was perceived as
meaningless).
At this point of the process the GISD found out that a unit concerned with
the use of the MCDA (Multiple Criteria Decision Analysis) methodology in
software evaluation (MCDA/SE) was operating within the RDA, and presented this
problem to it as a case study, opening a specific commitment. The MCDA/SE unit
responsible then decided to activate its links with an academic institution in order
to get more insight and advice on the problem, which soon appeared to exceed the
knowledge level of the unit at that time. At this point we can make the following
remarks.
• The decision process for which the decision aid was provided concerned the
acquisition of a GIS for X (the company). The actors involved at this level
are the company's IS manager, the acquisition (AQ) manager, the RDA, different
suppliers of GIS software and some of the company's external consultants
concerned with software engineering.
• A first decision aiding process was established where the client was the IS
manager and the analyst was the GIS department of the RDA.
• A second decision aiding process was established where the client was the
GIS department of the RDA and the analyst was the MCDA/SE unit. A
third actor involved in this process was the supervisor of the analyst, in
the sense of someone supporting the analyst in different tasks, providing him
with expert methodological knowledge and framing his activity.
We will focus our attention on this second decision aiding process where four
actors are involved: the IS manager, the GISD (or team of analysts) as the client
(bear in mind their particular position of clients and analysts at the same time),
the MCDA/SE unit as the analyst and the supervisor.
The analyst's first advice to the GISD was to negotiate a more specific
commitment with their client, so that their task could be more precise and better
defined. After such a negotiation the GISD's activity was defined as
technical assistance to the IS manager in a bid concerning the acquisition of a
GIS for the company, and its specific task was to provide a technical evaluation
of the offers that were expected to be submitted. For this purpose the GISD drafted
a decision aiding process outline in which the principal activities to be performed were
specified, as well as the timing, and submitted this draft to its client (see Figure
9.1). At this point it is important to note the following.
1. The call for tenders concerned the acquisition of hundreds of software li-
censes, plus the hardware platforms on which such software was expected to
run, the whole budget being several million €. From a financial point of view
it represented a large stake for the company and a high level of responsibility
for the decision-makers.
2. From a procedural point of view the administration of a bid of this type is
delegated to a committee which in this case included the IS manager, the
AQ manager, a delegate of the CEO and a lawyer from the legal staff. From
such a perspective the task of the GISD (and of the decision aiding process)
was to provide the IS manager with a global technical evaluation of the
offers that could be used in the negotiations with the AQ manager (inside
of the committee) and the suppliers (outside of the committee).
3. As already noted before, the bid concerned software that was not ready made,
but a collection of existing modules of GIS software which was expected to
be used in order to create ad-hoc software for the specific necessities of the
company. Two difficulties arose from this:
[Figure 9.1: outline of the decision aiding process, from the start of the bid to the final choice; its stages include tender preparation, the definition of requirements, the completion of the decision model, a selection for the prototypes, lab preparation, a second set of answers from suppliers, a second selection, prototype analysis, and sorting and final ranking.]
Once the call for tenders had been prepared (including the software requirements
sections, the tenderers' requirements section, the timing and the evaluation
procedure), a set of offers was presented to the company and the technical evaluation activity
was settled. It is interesting to notice that the GISD staff charged with this
evaluation was supported by external consultants, software engineering experts
in the company's sector who practically acted as the IS manager's delegates in the
group. It is this extended group that signed the final recommendation presented
to the IS manager and that we will hereafter call "team of analysts" (for the IS
manager) or "client" (for the MCDA/SE unit and for us).
A second step in the decision aiding process was the generation of a problem
formulation and of an evaluation model. Although we formally consider the two
as two distinct products of the process, in reality, and in this case specifically, they
were generated simultaneously. We will discuss the problem formulation
and the evaluation model in detail in the next section, but we can anticipate that
the final formulation consisted of an absolute evaluation of the offers under a set
of points of view that could be divided into two parts: the quality evaluation
and the performance evaluation. Although the set of alternatives was relatively
small (only six alternatives were considered), the set of attributes was extremely
complex (as often happens in software evaluation). Actually there were seven basic
evaluation dimensions, expanded into a hierarchy with 134 leaves, resulting in 183
evaluation nodes (see Appendix B).
A third and final step in the decision aiding process was the elaboration of the
final recommendation after all the necessary information for the evaluation had
been obtained and the evaluation performed. We will discuss such constructions
in detail in the next sections, but we can anticipate that such an elaboration
highlighted some questions (substantive and methodological) that had not been
considered before.
Some months after the end of the process and the delivery of the final report
we asked our client (the team of analysts) to discuss their experience with us and
to answer some questions concerning the methodology used, how they perceived it,
what they learned and what their appreciation was. The discussion was conducted
in a very informal way, but the client provided us with some written remarks that
were also reported during a conference presentation (see Fiammengo, Buosi, Iob,
Maffioli, Panarotto and Turino 1997). Such remarks are introduced in the following
section.
1. It was extremely important for the client (the team of analysts) to understand
his role in the process, what his own client expected and what he was
able to provide. In fact, at the beginning of the process, the problem situation
was absolutely unclear. Moreover, the client came to realise that the
expectations of the other actors involved in the process were extremely relevant,
both for strategic reasons (having to do with organisational problems of the
company) and for operational reasons (recommending something reliable in a clear
and sound way for all the actors involved in the bid).
Reporting the client's remarks: "....MCDA (Multi Criteria Decision Analysis)
was very useful in organising the overall process and structure of the bid:
what were the important steps to do, how to define the call for tenders,....",
"....MCDA was used as a background for the whole decision process. With
such a perspective it turned out to be very useful because every activity had a
justification....", "....as a formal process MCDA guaranteed greater control
and transparency to the process....", "A complex process, such as a bid, could
be greatly eased by the use of any process centred methodology."
It is this last sentence which clearly highlights the client's need for
support throughout the whole process and in all its aspects, a support able to
take into account what was happening in the decision process. We
actually agree with their comment that any process-modelled methodology
could be useful, and we consider that their positive perception of MCDA is
based on the fact that it was the first process-oriented decision support approach
they came to know.
consequences of their acts. The use of a formal approach enables the reduc-
tion of ambiguity (without completely eliminating it) and thus appears to
be an important support to the decision process.
It is clear that defining a precise problem formulation became a key issue for
the client because it clarified his role in the decision process (the bid management),
his relation with the IS manager (his client) and gave him a precise activity to
perform.
We define (Morisio and Tsoukias 1997) a problem formulation as the collection
of: a set of actions, a set of points of view and a problem statement. The only point
of the problem formulation that caused a discussion in the analysts' team
was the problem statement. The set of alternatives was considered to be the set of
offers submitted after the call for tenders. An initial idea of evaluating the tenderers,
as well as the offers, was dropped because, for this particular technology, no
consolidated producers exist. The set of points of view was defined using the
technical knowledge of the team of analysts and can be divided into two basic sets:
one concerning quality, including specific technical features required for the software
plus some dimensions based on ISO/IEC 9126 (1991), and the second concerning the
performance of the offered software, to be tested on prototypes. Such points of
view formed a huge hierarchy (see further on for details). No cost estimates were
required by the client and so they were not considered in this set.
After some discussion the problem statement adopted was that of an absolute
evaluation of the offers both on a disaggregated level and on a global one.
Actually, the team of analysts interpreted the client's demand as a question of
whether the offers could be considered intrinsically "good", "bad", etc., and not
as a request to compare bids with one another. There were two reasons for this choice.
1. A simple ranking of the offers could conceal the fact that all of them could
be of very poor quality or satisfy the software requirements only to a very low
level. In other words, the best bid could still be bad, and
this was incompatible with the importance and cost of the acquisition.
2. The team of analysts felt uncomfortable with the idea of comparing the
merits (or demerits) of an offer with the merits (or demerits) of another offer.
A first informal discussion of the problem of compensation convinced them
to get around this question by comparing the offers to profiles about which
they had sufficient knowledge.
the formulation itself could no longer be valid at the end of the process. This was
partly due to the very rapid evolution of GIS technology, which could completely
change the state of the art within six months. Another observation made by part of
the team of analysts was that towards the end of the process, due to the knowledge
acquired in this period (mainly due to the process itself), they could revise some
of their judgements. Indeed, the length of the evaluation was identified as a
critical negative issue in the client's remarks.
The final report did not consider any revision of the formulation and the eval-
uations since in the context of a call for tenders, it could be considered unfair to
modify the evaluations just before the final recommendation.
We consider that this is a critical issue for decision support and decision aiding
processes. Information is valid only for a limited period of time and consequently
the same is true for all evaluations based on such information. Moreover the
client himself may revise the problem formulation or update his perception of the
information and modify his judgements. This is rarely considered in decision aiding
methodologies. While the problem may be irrelevant for relatively short decision
aiding processes, in long processes it cannot be neglected and requires specific
consideration.
1. For all leaf nodes an ordinal scale was established. The available technical
knowledge consisted of the different possible states in which an offer could find
216 CHAPTER 9. SUPPORTING DECISIONS
itself. For instance, consider the leaf nodes 1.1.1 (type of presentation on
the user interface in the land-base management), 1.1.2 (graphic engine of
the user interface in the land-base management), 1.1.3 (customisation of the
user interface in the land-base management). The possible states on these
characteristics were:
1.1.1: standard graphics (SG), non standard graphics (NSG);
1.1.2: station M (M; graphic engine already adopted in other software used
in the company), other acceptable graphic engine (OA), other non accept-
able graphic engine (ON);
1.1.3: availability of a graphic tool (T), availability of an advanced graphic
language (E), availability of a standard programming language (S), no customisation
available (N). In this case the states could be combined (for instance a software
could provide both an advanced graphic language and a standard programming
language: value E,S). The three ordinal scales associated with the three nodes
were (≻ representing the scale order):
1.1.1: SG ≻ NSG;
1.1.2: M ≻ OA ≻ ON;
1.1.3: T,E,S ≻ T,E ≻ T,S ≻ T ≻ E,S ≻ E ≻ S ≻ N.
2. For all parent nodes, a brief descriptive text of what the node was expected
to evaluate was provided. All parent nodes were equipped with the same
number of classes: unacceptable (U), acceptable (A), good (G), very good
(VG), excellent (E). Then, two possibilities for defining the relationship
between the values on the subnodes and the values on the parent nodes were
established.
have been established using a reasoning on coalitions (for details see Chapter
6). In other words the team of analysts established the characteristics of the
subnodes for which an offer could be considered very good (therefore should
outrank the very good profile) and consequently computed the values of the
parameters of relative importance and of the concordance threshold. The
veto condition was established as the presence of the value unacceptable
at a subnode. The presence of a veto also produced an unacceptable
value at the level of the parent node. In other words, the team of analysts
considered any unacceptable value to be a severe technical limitation of the
offer. The reader may notice that this is a very strong interpretation of a veto
condition among those used in outranking-based sorting procedures,
but it was the one with which the team of analysts felt comfortable at the
time of construction of the evaluation model. The team of analysts also
established very high concordance thresholds (never less than 80%, very
often around 90%) that result in very severe evaluations. Such a choice
reflected the conviction, of at least a part of the team of analysts, that very
strong reasons were required to qualify an offer as very good. Since the whole
model was calibrated starting from the very good value, this conviction had
wider effects than the team of analysts could have imagined. Take, for example,
node 1 (land-base management), which has eight subnodes:
1.1: User interface;
1.2: Functionality;
1.3: Development environment;
1.4: Administration tools;
1.5: Work flow connection;
1.6: Interoperability;
1.7: Integration between land-base products and the Spatial Data manager;
1.8: Integration among land-base products.
The relative importance parameters were established as follows: w(1.1) =
4, w(1.2) = 8, w(1.3) = 5, w(1.4) = 4, w(1.5) = 1, w(1.6) = 4, w(1.7) =
8, w(1.8) = 2 (a total of 36), and the concordance threshold was fixed at 29/36
(about 0.8). These choices imply that no coalition excluding node 1.2 or node 1.7
was acceptable (without either one, at most 28 of the 36 points remain), and that
the smallest acceptable coalitions necessarily included nodes 1.2, 1.7, 1.3 and
any two of the nodes 1.1, 1.4 and 1.6 (8 + 8 + 5 + 4 + 4 = 29).
The analyst and the supervisor explained this aspect to the client, who, on
this basis, revised the importance parameters several times.
3. As already mentioned, the set of dimensions was built around two basic
points of view: quality and performance. The first generated
six evaluation dimensions, which will be called the quality attributes or
quality criteria or quality part of the hierarchy hereafter, corresponding
to six (among seven) of the root nodes of the model. The seventh root
node (node 7, subnodes 7.1, 7.2, 7.3, 7.4) concerned the evaluation of the
performances of the prototypes submitted to tests by the team of analysts.
Performance is basically measured as the time needed to execute
a set of specific tasks under certain conditions and with some external fixed
parameters. For instance, consider node 7.3 (performance under load). This
dimension is expected to evaluate the performance of the prototype as the
quantity of data to be processed increases. The value v(x) (x being an offer)
combines an observed measure W_x(t) and an interpolated one T_x(t) (t
representing the data load; the interpolation is not necessarily linear). In this
case the combination is obtained through the following formula:
$$v(x) = \int W_x(t)\, T_x(t)\, dt$$
In this case there are no external profiles with which to compare the perfor-
mances, because the prototypes are created ad hoc, the technology is quite
new and there are no standards of what a very good performance could be.
An ordinal scale was created as follows: the best performances first;
performances presenting a difference of more than 5% and less than 20% second;
those presenting a difference of more than 20% and less than 25% third; those
presenting a difference of more than 25% and less than 50% fourth; and those
presenting a difference of more than 50% fifth. The same model was applied to
all subnodes of node 7. A sorting procedure could then be established to obtain
the final evaluation.
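The performance scale of item 3 can be sketched in Python as follows. This is only an illustration under assumptions the text leaves open. The integral is approximated by the trapezoidal rule on sampled data loads. Lower values of v(x) are taken as better. A "difference" is read as the relative gap to the best offer. Gaps of at most 5% go to the first class, and each upper bound is inclusive. The function names and the sample figures are hypothetical.

```python
from bisect import bisect_left

# Upper bounds of the relative gap to the best value for classes 1 to 4;
# anything above 50% falls in class 5. Assumption: gaps of at most 5% go
# to the first class and every bound is inclusive.
UPPER_BOUNDS = [0.05, 0.20, 0.25, 0.50]


def trapezoid(ts, ys):
    """Trapezoidal approximation of the integral of ys sampled at loads ts."""
    return sum((t1 - t0) * (y0 + y1) / 2
               for t0, t1, y0, y1 in zip(ts, ts[1:], ys, ys[1:]))


def v(ts, w_obs, t_interp):
    """v(x) = integral of W_x(t) * T_x(t) dt, from samples taken at the loads ts."""
    return trapezoid(ts, [w * t for w, t in zip(w_obs, t_interp)])


def performance_classes(values):
    """Ordinal class 1 (best) to 5 per offer, assuming lower values are better."""
    best = min(values.values())  # assumed strictly positive
    return {offer: bisect_left(UPPER_BOUNDS, (val - best) / best) + 1
            for offer, val in values.items()}


# Hypothetical values of v(x) for four offers (lower = better).
scores = {"O1": 100.0, "O2": 112.0, "O3": 130.0, "O4": 175.0}
print(performance_classes(scores))  # {'O1': 1, 'O2': 2, 'O3': 4, 'O4': 5}
```

`bisect_left` places a gap equal to a bound in the lower class, which matches the inclusive-bound assumption. A different reading of the boundary cases would only change `UPPER_BOUNDS` or the bisect variant.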
This process was repeated for all the intermediate nodes up to the seven root
nodes representing the seven basic evaluation dimensions. It took four to five
months for all the nodes to be equipped with their evaluation model and the
process generated several discussions inside the team of analysts, mainly of a
technical nature (concerning the specific contents of the values for each node).
The most discussed concepts of the model were the concordance threshold and the
veto condition, since part of the team considered the required levels extremely
severe. However, since such an approach corresponded to a cautious attitude, it
prevailed in the team and was finally accepted. The length of the process is
explained not only by the number of nodes to define, but also by the fact that
the team of analysts had to define a new measurement scale and a precise
measurement aggregation procedure for each node. Although this process can
often be described as subjective measurement, it was the only way to obtain
meaningful values for the offers. The set of criteria to be used, if a preference
aggregation comparing the alternatives amongst themselves was requested, was
defined as the seven root nodes equipped with a simple preference model: the
weak order induced by the ordinal scale associated to each of these nodes.
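The coalition reasoning used for node 1 can be checked mechanically. The Python sketch below enumerates the subsets of subnodes whose total weight reaches the threshold 29/36. It assumes w(1.6) = 4, the value for which the eight weights sum to 36 and node 1.6 plays the same role as nodes 1.1 and 1.4. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Relative importance of the eight subnodes of node 1 (land-base management).
# Assumption: w(1.6) = 4, the value for which the weights total 36.
WEIGHTS = {"1.1": 4, "1.2": 8, "1.3": 5, "1.4": 4,
           "1.5": 1, "1.6": 4, "1.7": 8, "1.8": 2}
THRESHOLD = Fraction(29, 36)


def is_winning(coalition):
    """True if the coalition's share of the total weight reaches the threshold."""
    share = Fraction(sum(WEIGHTS[n] for n in coalition), sum(WEIGHTS.values()))
    return share >= THRESHOLD


def smallest_winning_coalitions():
    """All winning coalitions with the minimum number of subnodes."""
    nodes = sorted(WEIGHTS)
    for size in range(1, len(nodes) + 1):
        winners = [set(c) for c in combinations(nodes, size) if is_winning(c)]
        if winners:
            return winners
    return []


for coalition in smallest_winning_coalitions():
    print(sorted(coalition))
```

Changing one weight or the threshold and rerunning the enumeration shows at once how the set of acceptable coalitions moves. This is the kind of consequence the client examined when revising the importance parameters.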
No exogenous uncertainty was considered in the evaluation model. The in-
formation provided by the tenderers concerning their offers was considered to be
reliable, and the use of ordinal scales made it possible to avoid the problems of
imprecision or of measurement errors. This reasoning, however, is less valid for node 7
and its subnodes, but the team of analysts felt sufficiently confident with the tests
and did not analyse the problem further. Some endogenous uncertainty appeared
as soon as the model was put into practice (the offers being available). We shall
9.3. DECISION SUPPORT 219
discuss this problem in more detail in the next section (concerning the elaboration
of the final recommendation), but we can anticipate that the problem was created
by the double evaluation provided by the chosen ELECTRE-TRI type aggregation,
consisting of an optimistic and a pessimistic evaluation, which need not
coincide.
The evaluation model was set out in a formal document, which was submitted
(and explained) to the final client and received his consent. It is worthwhile to
note that the final client was not able to participate in the elaboration of the
model (technical details, establishment of the parameters etc.). Part of the team
of analysts (some of the external consultants) were acting as his delegates. The
establishment of the evaluation model and its acceptance by the client opened the
way for its application on the set of offers received and for the elaboration of the
final recommendation.
The client greatly appreciated being involved in establishing the evaluation
model, which came to be regarded as their own product (from
their ex-post remarks: ....this (the involvement) turned out to be important....for
the acceptability of the evaluation results). The fact that each node of the hier-
archy was discussed, analysed and finally defined by the team of analysts allowed
them to understand the consequences for the global level, to be able to explain the
contents of the model to their client and justify the final result on the grounds of
their own knowledge and experience, not of the procedure adopted.
In other words we can claim that the model was validated during its construc-
tion. Such an approach helped both the acceptability of the model and the final
result, eased the discussion when the question of the final aggregation was settled
and definitely legitimated the model in the eyes of the client.
1. The key parameters used in the method are the profiles (to which the al-
ternatives are compared in order to be classified in a specific class), the
importance of each criterion for each parent criterion classification and the
concepts of concordance thresholds and veto conditions.
For each intermediate node such parameters were extensively discussed be-
fore reaching a precise numerical representation. As already mentioned in
section 3.2 the relative importance of each criterion and the concordance
threshold were established using a reasoning based on the identification of
the winning coalitions enabling the outranking relation to hold. The veto
2. The whole method (and the model) was implemented on a spreadsheet. This
was of great importance because spreadsheets are a basic tool for communi-
cation and work in all companies and enable an immediate understanding of
the results. Moreover, they made immediate what-if analyses possible whenever
specific problems, concerning precise information and/or evaluations, arose
during the discussions inside the team of analysts. The experimental validation
of the model was greatly eased by the use of the spreadsheet.
Furthermore, it helped the acceptability and legitimation of the model through
the idea that if it can be implemented on a spreadsheet it is sufficiently
simple and easy to be used by our company. In fact, some of the client's
criticisms of the approach adopted in this case were that ....MCDA is
not yet a universally known method...., ....seems less intuitive than other
well known techniques such as the weighted sum..., ....it is time consuming
to apply a new methodology...., all of these problems limiting the acceptability
of the methodology towards the client's client (the IS manager) and the
company more generally. Being able to implement the method and the model
on a spreadsheet was, for them, proof that, although new, complex and
apparently less intuitive, the method was simple and easy and therefore
legitimately used in the decision process.
A specific problem raised in the first step was the generation of uncertainty
by the aggregation procedure. The ELECTRE-TRI type procedure adopted produces
an interval evaluation consisting of a lower value (the pessimistic evaluation)
and an upper value (the optimistic evaluation). When an alternative has a profile
on the subnodes that is very different from the profiles of the classes on the
parent node, then, due to the incomparabilities that occur when comparing the
alternative to the profiles, the two values may not coincide (see more details in
Appendix A). If the user of the model is not able to choose one of the two
evaluations, this can be a problem in a hierarchical aggregation, since at the
next aggregation step the subnodes may have evaluations expressed as intervals.
This is a typical case of endogenous uncertainty, created by the method itself
and not by the available information. The client was keen to consider the
pessimistic and optimistic evaluations as bounds of the real value, but there was
no uncertainty distribution on the interval. To handle this, the following
procedure was adopted. Two distinct aggregations were made, one using the lower
values and the other using the upper values. Each of these, in turn, may produce
a lower value and an upper value. At the next aggregation step, the lower of the
two lower values and the higher of the two upper values are used. This is a
cautious attitude and has the drawback of widening the intervals as the
aggregation goes up the hierarchy. However, this effect did not occur here, and
the final result for the six dimensions is represented in table 9.1 (from here
on we will represent the criteria by Ci and the alternatives by Oi).

      O1    O2     O3     O4     O5     O6
C1    A-A   G-G    A-VG   A-G    G-VG   A-A
C2    A-A   G-VG   A-VG   A-VG   G-G    A-G
C3    A-A   G-G    A-VG   G-G    A-A    A-A
C4    A-G   G-VG   A-VG   G-VG   A-VG   A-G
C5    U-U   G-VG   G-G    A-G    G-VG   U-U
C6    A-A   VG-VG  E-E    VG-VG  G-G    VG-VG
Table 9.1: the values of the alternatives on the six quality criteria (U: unacceptable,
A: acceptable, G: good, VG: very good, E: excellent)
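The cautious rule for chaining interval evaluations up the hierarchy can be sketched in Python as follows. The aggregator here is a hypothetical ordinal stand-in (the lower and upper median of the class codes), not the ELECTRE-TRI type procedure used in the case. Any function returning a (pessimistic, optimistic) pair could be passed instead. Class codes 0 to 4 for U to E are an assumption of the sketch.

```python
# Ordinal class codes for the parent-node scale (0 = U, ..., 4 = E).
CLASSES = ["U", "A", "G", "VG", "E"]


def median_interval(values):
    """Hypothetical stand-in aggregator: lower and upper median of ordinal codes."""
    s = sorted(values)
    return s[(len(s) - 1) // 2], s[len(s) // 2]


def propagate(intervals, aggregate=median_interval):
    """Cautious propagation of (pessimistic, optimistic) intervals one level up.

    Aggregate once with all lower bounds and once with all upper bounds, then
    keep the lower of the two lower results and the higher of the two upper ones.
    """
    lo_run = aggregate([lo for lo, _ in intervals])
    hi_run = aggregate([hi for _, hi in intervals])
    return min(lo_run[0], hi_run[0]), max(lo_run[1], hi_run[1])


subnodes = [(1, 1), (2, 3), (1, 3), (1, 2)]  # A-A, G-VG, A-VG, A-G
lo, hi = propagate(subnodes)
print(f"{CLASSES[lo]}-{CLASSES[hi]}")  # A-VG
```

The returned interval always contains both intermediate intervals. This is why the procedure can, in principle, produce ever wider intervals higher up the hierarchy.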
The results on node 7 concerning the performance of the prototypes are presented
in table 9.2. Remember that this result is an ordinal scale obtained by
aggregating the four scales defined as explained in the previous section. Therefore,
it could be considered more as a ranking than as an absolute evaluation. For this
reason the team of analysts decided to use this attribute only to rank the
different offers after sorting them with the six quality attributes. To this
end, the team of analysts tested three different aggregation scenarios
corresponding to three different hypotheses about the importance of the
performance attribute.

      O1    O2    O3    O4    O5    O6
C7    A-A   G-G   G-G   A-A   E-E   A-A
Table 9.2: the values of the alternatives on the performance criterion (U: unac-
ceptable, A: acceptable, G: good, VG: very good, E: excellent)
Figure 9.2: 2a: the final ranking using the six quality criteria (O2, then O3, O4
and O5, then O6, then O1). 2b: the final ranking as intersection of the six quality
criteria and the performance criterion
O1 O2 O3 O4 O5 O6
O1 1 0 0 0 0 0
O2 1 1 1 1 1 1
O3 1 0 1 0 0 1
O4 1 0 0 1 0 1
O5 1 0 0 0 1 1
O6 1 0 0 0 0 1
Table 9.3: the outranking relation aggregating the six quality criteria
O1 O2 O3 O4 O5 O6
O1 1 0 0 0 0 0
O2 1 1 1 1 0 1
O3 1 0 1 0 0 1
O4 1 0 0 1 0 1
O5 1 0 0 0 1 1
O6 1 0 0 0 0 1
view, the absolute evaluations on each of the six quality attributes were
transformed into rankings as in the first scenario, and the seventh attribute was
added as a seventh criterion. The seven weak orders are the following:
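(≻ denotes strict preference; alternatives separated by commas are tied)
- O5 ≻ O2 ≻ O3 ≻ O4 ≻ O1, O6;
- O2 ≻ O5 ≻ O3 ≻ O4 ≻ O6 ≻ O1;
- O2 ≻ O4 ≻ O3 ≻ O5, O1, O6;
- O2, O4 ≻ O3, O5 ≻ O1, O6;
- O2, O5 ≻ O3, O4 ≻ O1, O6;
- O3 ≻ O2 ≻ O6, O4 ≻ O5 ≻ O1;
- O5 ≻ O2, O3 ≻ O4, O6, O1.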
The importance parameters are w(1.) = 2, w(2.) = 2, w(3.) = 4, w(4.) =
1, w(5.) = 4, w(6.) = 2, w(7.) = 4 (a total of 19), and the concordance threshold
is 16/19 (more than 0.8). The final result is reported in table 9.4.
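The outranking relation of this scenario can be recomputed with a concordance-only test. Under this test, x outranks y when the weights of the criteria on which x is at least as good as y reach 16/19. The Python sketch below makes two assumptions. It assumes no veto was triggered, since the text does not say. It reads each weak order above with successive alternatives strictly preferred and commas marking ties. Under these assumptions it returns the same 0/1 matrix as the final result (table 9.4). The function names are illustrative.

```python
from fractions import Fraction

ALTS = ["O1", "O2", "O3", "O4", "O5", "O6"]

# The seven weak orders of the third scenario, best level first;
# alternatives in the same tuple are tied.
WEAK_ORDERS = [
    [("O5",), ("O2",), ("O3",), ("O4",), ("O1", "O6")],
    [("O2",), ("O5",), ("O3",), ("O4",), ("O6",), ("O1",)],
    [("O2",), ("O4",), ("O3",), ("O5", "O1", "O6")],
    [("O2", "O4"), ("O3", "O5"), ("O1", "O6")],
    [("O2", "O5"), ("O3", "O4"), ("O1", "O6")],
    [("O3",), ("O2",), ("O6", "O4"), ("O5",), ("O1",)],
    [("O5",), ("O2", "O3"), ("O4", "O6", "O1")],
]
WEIGHTS = [2, 2, 4, 1, 4, 2, 4]  # criteria 1..7; criterion 7 is performance
THRESHOLD = Fraction(16, 19)


def levels(order):
    """Map each alternative to its level in a weak order (0 = best)."""
    return {alt: level for level, group in enumerate(order) for alt in group}


RANKS = [levels(order) for order in WEAK_ORDERS]


def concordance(x, y):
    """Weighted share of the criteria on which x is at least as good as y."""
    support = sum(w for w, r in zip(WEIGHTS, RANKS) if r[x] <= r[y])
    return Fraction(support, sum(WEIGHTS))


def outranking_matrix():
    """0/1 matrix: row x, column y is 1 when x outranks y (no veto assumed)."""
    return [[int(concordance(x, y) >= THRESHOLD) for y in ALTS] for x in ALTS]


for name, row in zip(ALTS, outranking_matrix()):
    print(name, *row)

# Losing only the performance criterion leaves at most 15/19 < 16/19.
print(Fraction(sum(WEIGHTS) - WEIGHTS[6], sum(WEIGHTS)) < THRESHOLD)  # True
```

The last line is the arithmetic behind the observation, made below, that a worse value on the performance criterion prevents outranking.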
Finally, after some discussion with the client, the third scenario was
adopted and used as the final result. The two basic reasons were:
- while it was meaningful to interpret the ordinal measures for the six quality at-
tributes as weak orders representing the client's preferences, it was not meaningful
to translate the weak order obtained for the performance attribute into an ordinal
measurement of the offers;
- the first and second scenarios implicitly adopted two extreme positions concern-
ing the importance of the performance attribute, which corresponded to two different
philosophies present in the team of analysts but not to the client's perception of
the problem. The importance parameters and the concordance threshold adopted
in the final version made it possible to define a compromise between these two extreme
positions expressed during the decision aiding process.
In fact the performance criterion is associated with an importance parameter
of 4, which, combined with the concordance threshold of 16/19, implies that an
alternative cannot outrank another if its value on the performance criterion is
worse: without that criterion, the remaining weights sum to at most 19 - 4 = 15,
which is less than 16 (this satisfied the part of the team of analysts that
considered the performance criterion a critical evaluation of the offers). Giving
the performance criterion an ordinary importance parameter avoided the extreme
situation in which all other evaluations could become irrelevant. The final ranking
obtained respects this idea, and the outranking table could be understood by all
the members of the team of analysts. As already reported, the client considered
the approach useful because every activity was justified. A major concern for people
involved in complex decision processes is to be able to justify their behaviour,
recommendations and decisions towards a director, a superior in the hierarchy of
the company, an inspector, a committee, etc. Such a justification applies both to
how a specific result was obtained and to how the whole evaluation was conducted.
In this case, for instance, the choice of the final aggregation was justified by
a specific attitude towards the two basic evaluation points of view: the quality
information and the performance of the prototypes. It was extremely important
for the client to be able to summarise the correspondence between an aggregation
procedure and an operational attitude because it enabled them to better argue
against the possible objections of their client.
A final question that arose during the elaboration of the final recommendation
was whether it would be possible to provide a numerical representation of the
values obtained by the offers and of the final ranking. It soon became clear that
the question originated from the final client's wish to negotiate with the AQ
manager on a monetary basis, since the latter was expected to introduce the cost
dimension into the final decision.
For this purpose an appendix was included in the final recommendation where
the following was emphasised:
- it is possible to give a numerical representation both to the ordinal measurement
obtained using the six quality attributes and to the final ranking obtained using
the seven criteria, but it would be meaningless to use such a numerical representation
in order to establish implicit or explicit trade-offs with a cost criterion;
- it is possible to compare the result with a cost criterion following two possible
approaches:
1.) either induce an ordinal scale from the cost criterion and then, using an
ordinal aggregation procedure, construct a final choice (the negotiation should then
concentrate on defining the importance parameters, the thresholds, etc.);
2.) or establish a value function of the client using one of the usual protocols
available in the literature (see also Chapter 6) to obtain the trade-offs between the
quality evaluations, the performance evaluations and the cost criterion (the
negotiations should then concentrate on a value function);
- the team of analysts was also available to conduct this part of the decision aiding
process if the client desired it.
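A small Python example shows why a numerical representation of an ordinal result cannot support trade-offs with cost. Two codings that respect the same order A < G < VG pick different offers once quality is divided by cost. The offers, their costs and the quality-per-cost rule are hypothetical and serve only to illustrate the point.

```python
from fractions import Fraction

# Two numerical codings of the same ordinal scale A < G < VG; both preserve
# the order, so both are admissible representations of the ordinal result.
CODING_1 = {"A": 1, "G": 2, "VG": 3}
CODING_2 = {"A": 1, "G": 2, "VG": 10}

# Hypothetical offers: (quality class, cost in arbitrary monetary units).
OFFERS = {"X": ("VG", 100), "Y": ("G", 60)}


def best_value_for_money(coding):
    """Offer with the highest coded quality per unit of cost (hypothetical rule)."""
    return max(OFFERS, key=lambda o: Fraction(coding[OFFERS[o][0]], OFFERS[o][1]))


print(best_value_for_money(CODING_1), best_value_for_money(CODING_2))  # Y X
```

Any strictly increasing recoding of the classes is an equally valid representation of the ordinal result. A conclusion that changes between such recodings is therefore not meaningful.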
The final client was very satisfied with the final recommendation and was also
able to understand the reply about the numerical representation. He nevertheless
decided to conduct the negotiations with the AQ manager personally and so the
team of analysts terminated its task with the delivery of the final recommendation.
A final consideration is that there was certainly room (but no time) to
experiment with more variants and methods for the aggregation procedure and the
construction of the final recommendation. Valued relations, valued similarity
relations, interval comparisons using extended preference structures, dynamic
assignment of alternatives to classes and other innovative techniques were
considered too new by the client, who already regarded the use of an approach
different from the usual grid and weighted sum as a revolution (compared with the
company's standards). In their view, being able to aggregate the ordinal
information available in a correct and meaningful way was more than satisfactory
as they report in their ex-post remarks: ....pointed out that it was not necessary
to always use ratio scales and weighted sums, as we thought before, but that it was
possible to use judgements and aggregate them.....
9.4 Conclusions
Concluding this chapter we may try to summarise the lessons learned in this real
experience of decision support.
The most important lesson perhaps concerns the process dimension of decision
support. What the client needed was continuous assistance and support during
the decision process (the management of the call for tenders) enabling them to
understand their role, the expected results, and the way to provide a useful
contribution. Had the support been limited to answering the client's demand on how
to define a global evaluation (based on the weighted sum of their marks on the
products), we might have provided them with an excellent multi-attribute value model
that would have been of no interest for their problem. This is not an argument
against multi-attribute value based methods, which can be extremely useful in other
decision aiding processes, but an emphasis on process-based decision aiding.
A careful analysis of the problem situation, a consensual problem formulation, a
correct definition of the evaluation model and an understandable and legitimated
final recommendation are the products that we have to provide in a decision aiding
process.
A second lesson learned concerns the ownership of the final recommendation.
By this we want to indicate the fact that the client will be much more confident in
the result and much more ready to apply it if he feels that he owns the result in the
sense that it is a product of his own convictions, values, computations, experience,
simulations and whatever else. Such ownership can be achieved if the client not
only participates in elaborating the parameters of the evaluation model, but
actually builds the model with the help of the analyst (which has been the case in
our experience). Although the specific case may be considered exceptional (due to
the specific dimension of the evaluation model and the double role of the client,
who was at the same time an analyst for another client), we claim that it is always
possible to include the client in the construction of the evaluation model in a way
that allows him to feel responsible for, and to own, the final recommendation. Such
ownership greatly eases the legitimisation of the recommendation, since it is no
longer just the advice of experts who do not understand anything. It might be
interesting to notice that a customised implementation of the model on the tools
with which the client is familiar (in our case the company spreadsheet) greatly
improves the acceptance and legitimisation of the evaluation model.
A third lesson concerns the key issue of meaningfulness. The construction of
the evaluation model must obey two dimensions of meaningfulness. The first is
a theoretical and conceptual one and refers to the necessity to manipulate the
information in a sound and correct way. The second is a practical one and refers
to the necessity to manipulate the information in a way understandable by the
client and corresponding to his intuitions and concerns. These two dimensions
may conflict. However, the evaluation model has to satisfy both
requirements, thus implying a process of adaptation guided by reciprocal learning
for the client and the analyst. The existence of clear and sound theoretical re-
sults for the use of specific preference modelling tools, preference and/or measure
aggregation procedures and other modelling tools definitely helps such a process.
A fourth lesson concerns the importance of the distinction between measures
and preferences. The former refer to observations made on the set of alternatives,
either through objective or through subjective measures. The latter refer to the
client's values, are always subjective and depend on the problem situation. Moving
from one to the other might be possible, but it is not obvious and has to be
carefully studied. Knowing that one piece of software has n function points, while another
has m function points does not imply any particular preference between them. We
hope that the case study offered an introduction to this problem.
A fifth lesson concerns the definition of the aggregation procedure in the evalu-
ation model. The previous chapters of this book provide enough evidence that uni-
versal methods for aggregating preferences and/or measures do not exist. There-
fore, the aggregation procedures included in an evaluation model are choices that
have to be carefully studied and justified.
A sixth lesson is about uncertainty. Even when the available information is
considered reliable, uncertainty may appear (as in our case). Moreover, uncer-
tainty can appear in a very qualitative way and not necessarily in the form of an
uncertainty distribution. It is necessary to have a large variety of uncertainty rep-
resentation tools in order to include the relevant one in the evaluation model. Last,
but not least, we emphasise the significant number of open theoretical problems
the case study highlights (interval evaluation, ordinal measurement, hesitation
modelling, hierarchical measurement, ordinal value theory etc.).
Appendix A
The basic concepts adopted in the procedure used (based on ELECTRE TRI) are
the following.
A set $A$ of alternatives $a_i$, $i = 1, \dots, m$.
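where

$$\forall x \in A,\ y \in P:\quad C(x,y) \iff \sum_{j \in G^{\pm}} w_j \geq c \ \text{ and } \ \sum_{j \in G^{+}} w_j \geq \sum_{j \in G^{-}} w_j$$

$$\forall y \in A,\ x \in P:\quad C(x,y) \iff \Big(\sum_{j \in G^{\pm}} w_j \geq c \ \text{ and } \ \sum_{j \in G^{+}} w_j \geq \sum_{j \in G^{-}} w_j\Big) \ \text{ or } \ \sum_{j \in G^{+}} w_j > \sum_{j \in G^{-}} w_j$$

where
- $G^{+} = \{g_j \in G : P_j(x,y)\}$
- $G^{-} = \{g_j \in G : P_j(y,x)\}$
- $G^{=} = \{g_j \in G : I_j(x,y)\}$
- $G^{\pm} = G^{+} \cup G^{=}$
- $c$: the concordance threshold, $c \in [0.5, 1]$
- $d$: the discordance threshold, $d \in [0, 1]$
- $v_j(x,y)$: veto, expressed on criterion $g_j$, of $y$ on $x$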
2. When the relation S is established, assign each element $a_i$ on the basis of
the following rules.
The pessimistic procedure compares $a_i$ with the profiles from the highest down
and assigns it to the class immediately above the first profile that $a_i$ outranks.
The optimistic procedure compares $a_i$ with the profiles from the lowest up and
assigns it to the class immediately below the first profile that is strictly
preferred to $a_i$. If the optimistic and pessimistic assignments coincide,
then no uncertainty exists for the assignment. Otherwise, an uncertainty
exists and should be considered by the user.
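A minimal Python sketch of the two assignment rules follows, under simplifying assumptions. Criteria and profiles are coded as integers on a common ordinal scale (0 = U, ..., 4 = E). `profiles[h - 1]` is the lower limit of class h. Outranking uses the usual weighted concordance test with an optional veto threshold per criterion, rather than the exact conditions given above. All names and figures are hypothetical.

```python
from fractions import Fraction

CLASSES = ["U", "A", "G", "VG", "E"]


def outranks(a, b, weights, lam, veto=None):
    """a S b: enough weighted support for 'a at least as good as b' and no veto."""
    support = sum(w for w, aj, bj in zip(weights, a, b) if aj >= bj)
    if Fraction(support, sum(weights)) < lam:
        return False
    # Veto: b beats a by at least v_j on some criterion j.
    if veto is not None and any(bj - aj >= vj for aj, bj, vj in zip(a, b, veto)):
        return False
    return True


def pessimistic(x, profiles, weights, lam, veto=None):
    """Scan profiles from the highest down; take the class above the first one x outranks."""
    for h in range(len(profiles) - 1, -1, -1):
        if outranks(x, profiles[h], weights, lam, veto):
            return h + 1
    return 0


def optimistic(x, profiles, weights, lam, veto=None):
    """Scan profiles from the lowest up; take the class below the first one strictly preferred to x."""
    for h, b in enumerate(profiles):
        if outranks(b, x, weights, lam, veto) and not outranks(x, b, weights, lam, veto):
            return h
    return len(profiles)


profiles = [[h] * 4 for h in range(1, 5)]  # profiles[h - 1]: lower limit of class h
weights, lam = [1, 1, 1, 1], Fraction(3, 4)
x = [3, 3, 1, 1]  # VG, VG, A, A
print(CLASSES[pessimistic(x, profiles, weights, lam)],
      CLASSES[optimistic(x, profiles, weights, lam)])  # A G
```

The alternative (VG, VG, A, A) is incomparable with the profile separating A and G, because neither reaches 3/4 of the weight against the other. The pessimistic scan therefore drops below that profile to A, while the optimistic scan passes it and stops at G. This is the kind of gap that appears as an interval in table 9.1.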
In order to better understand how the procedure works consider the following
example.
Three alternatives:
$a_1 = \langle D, B, B, B \rangle$, $a_2 = \langle B, C, A, A \rangle$, $a_3 = \langle A, B, B, C \rangle$.
Appendix B
The complete list of the attributes used in the evaluation model
1 LAND-BASE MANAGEMENT
2 GEOMARKETING
3.2.1 Availability
3.2.2 Adequacy
3.2.2.1 Planes analysis functions
3.2.2.2 Topological connectivity functions
3.2.2.3 Graphical rendering functions
3.2.2.4 Network schema creation
3.3 Development environment
3.3.1 Libraries personalisation
3.3.2 Development support tools
3.3.3 Debugging support
3.3.4 Code documentation
3.3.4.1 Documentation support tools
3.3.4.2 Code browsing
3.3.5 Documentation Quality
3.3.5.1 Completeness
3.3.5.2 Documentation support type
3.3.5.3 Information retrieval ease
3.3.5.4 Contextual help
3.4 Administration tools
3.4.1 User administration functions
3.4.2 Software configuration management
3.4.3 Performance data collection
3.5 Work flow connection
3.6 Interoperability
3.7 Integration between this process products and the Spatial Data Manager
3.7.1 Vectorial data products integration
3.7.2 Descriptive data products integration
3.7.3 Raster data products integration
3.7.4 Digital Terrain Model products integration
3.8 Integration among this process products
3.8.1 Interfaces integration
3.8.2 Data sharing
4.2.2 Adequacy
4.2.2.1 Planes analysis functions
4.2.2.2 Topological connectivity functions
4.2.2.3 Graphical rendering functions
4.2.2.4 Network schema creation
4.3 Development environment
4.3.1 Libraries personalisation
4.3.2 Development support tools
4.3.3 Debugging support
4.3.4 Code documentation
4.3.4.1 Documentation support tools
4.3.4.2 Code browsing
4.3.5 Documentation Quality
4.3.5.1 Completeness
4.3.5.2 Documentation support type
4.3.5.3 Information retrieval ease
4.3.5.4 Contextual help
4.4 Administration tools
4.4.1 Software configuration management
4.4.2 Performance data collection
4.5 Interoperability
4.6 Integration between this process products and the Spatial Data Manager
4.6.1 Vectorial data products integration
4.6.2 Descriptive data products integration
4.6.3 Raster data products integration
4.7 Integration among this process products
4.7.1 Interfaces integration
4.7.2 Data sharing
6 SOFTWARE QUALITY
6.1 Robustness
6.2 Maturity
6.3 Easiness of installation and maintenance
7 PERFORMANCES
238 CHAPTER 10. CONCLUSION
have been significantly different depending on the grading policy and/or correction
habits of some teachers, the fact that his exams were corrected late at night or on
the way his various grades were aggregated.
Things are going well since the well-being index in our country has risen by
more than 10% over the last three years
Statisticians have developed an incredible number of indicators or indices aim-
ing at capturing many aspects of reality (including the quality of the air we breathe,
the richness of a country, its state of development, etc.) by using numbers. Not
only are our newspapers full of these kinds of figures but they are also routinely
used to make important political or economic decisions. In chapter 4, we saw that
such measures should not be confused with the familiar measurement oper-
ations in Physics. The resulting numbers do not appear to be measured on some
well-defined type of scale. Their properties are sometimes intriguing and they
surely should be manipulated with care. Therefore, claiming that the well-being
index has increased by 10% gives, at best, a very crude indication.
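To make the point about percentages concrete, here is a minimal Python sketch, assuming an HDI-style index obtained by min-max normalisation, (x - min)/(max - min), of a raw indicator. The raw values and both pairs of bounds ("goalposts") are hypothetical and chosen only for illustration: the same raw improvement becomes quite different percentage increases of the index depending on which lower bound is adopted.

```python
def normalise(x: float, lo: float, hi: float) -> float:
    """Min-max normalisation of a raw value onto [0, 1] between two goalposts."""
    return (x - lo) / (hi - lo)


def pct_change(before: float, after: float) -> float:
    """Relative change, in percent."""
    return 100.0 * (after - before) / before


# Hypothetical raw indicator (e.g. life expectancy in years) three years apart.
before_raw, after_raw = 60.0, 63.0
print(f"raw indicator: {pct_change(before_raw, after_raw):.1f}% increase")

# Two hypothetical, equally arguable choices of goalposts.
for lo, hi in [(25.0, 85.0), (40.0, 85.0)]:
    b = normalise(before_raw, lo, hi)
    a = normalise(after_raw, lo, hi)
    print(f"goalposts [{lo}, {hi}]: {pct_change(b, a):.1f}% increase")
```

The raw indicator rises by 5.0%, while the normalised index rises by 8.6% or by 15.0% depending on the goalposts, although nothing in the underlying situation has changed.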
Calculations show that it is not profitable to equip this hospital with a mater-
nity department
The quality of the roads on which we drive, the fares charged for public
transportation, the way our electricity is produced, the safety regulations applied
to factories near our homes, the quality of our social security system, etc., depend
on particular ways of assessing and summarising the costs and the benefits of
alternative projects. Cost-benefit analysis evaluates such projects using money as
a yardstick. Outside simple cases, this raises many difficulties: how to convert the
various consequences of a complex project into monetary units, how to cope with
equity considerations in the distribution of costs and benefits, how to take the
distribution in time of these consequences into account? In chapter 5 we saw that
cost-benefit analysis can hardly claim to always solve all these difficulties in a
satisfactory manner. Therefore, the apparently objective calculations invoked to
refuse the creation of a maternity department in our hospital are highly dependent
on numerous debatable hypotheses (e.g. the pricing of a number of statistical
delivery incidents due to a longer transportation time for some mothers). It is not
unlikely that other reasonable hypotheses would have led to the opposite decision.
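The following Python sketch illustrates how such a verdict can hinge on debatable hypotheses. It computes a net present value for a hypothetical maternity department; every figure (investment, running cost, number of delivery incidents avoided per year, monetary value assigned to each avoided incident, 20-year horizon, discount rates) is an assumption invented for illustration, not data from chapter 5.

```python
def npv(cash_flows: list[float], rate: float) -> float:
    """Net present value; cash_flows[t] occurs at the end of year t (t = 0 is now)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def maternity_flows(value_per_incident: float, years: int = 20,
                    investment: float = 10.0, running_cost: float = 1.0,
                    incidents_avoided: float = 0.5) -> list[float]:
    """Hypothetical yearly net benefits (in millions) of opening the department.

    value_per_incident is the monetary value assigned to one avoided
    delivery incident; all defaults are illustrative assumptions."""
    yearly = incidents_avoided * value_per_incident - running_cost
    return [-investment] + [yearly] * years


for value in (3.0, 4.0):
    for rate in (0.03, 0.08):
        result = npv(maternity_flows(value), rate)
        verdict = "profitable" if result > 0 else "not profitable"
        print(f"value {value}, rate {rate:.0%}: NPV {result:+.2f} ({verdict})")
```

With these made-up numbers, the department is never profitable when an avoided incident is valued at 3, whereas at a value of 4 the verdict depends on whether a 3% or an 8% discount rate is used.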
Based on numerous tests it appears that the best buy is car Z
How to take several, generally conflicting, criteria into account when making
a decision? This area, known as Multiple Criteria Decision Making (MCDM)
is the subject of chapter 6. We showed that, in most cases, the analyst has the
choice between several aggregation strategies that could lead to different results.
Furthermore, apparently familiar concepts, like the importance of criteria, are
shown to have little (if any) clear meaning outside a well-defined aggregation
strategy. Each of these strategies requires the assessment of more or less rich
and precise inter-criteria information. Since such assessments shape preference
information as much as they collect it, the comparison of these strategies raises
many problems. Therefore, because each potential buyer has his own preferences
and interests and there are many different and yet reasonable ways to aggregate
them, the very notion of a best buy is highly debatable.
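A minimal Python sketch of this dependence on the aggregation strategy, on hypothetical data: three cars scored from 0 to 10 on three criteria, with equal weights. A weighted sum rewards car Z for its excellent score on a single criterion, while a pairwise weighted-majority (Condorcet-style) comparison favours car Y, which beats each competitor on two criteria out of three.

```python
# Hypothetical scores (0-10, higher is better) on price, comfort and safety.
cars = {
    "X": [3, 3, 4],
    "Y": [4, 4, 3],
    "Z": [1, 1, 10],
}
weights = [1 / 3, 1 / 3, 1 / 3]  # assumed equal weights


def weighted_sum_winner(scores, w):
    """Alternative with the highest weighted sum of scores (fully compensatory)."""
    return max(scores, key=lambda c: sum(wi * si for wi, si in zip(w, scores[c])))


def condorcet_winner(scores, w):
    """Alternative beating every other one by a weighted majority of criteria, or None."""
    def beats(a, b):
        pro = sum(wi for wi, sa, sb in zip(w, scores[a], scores[b]) if sa > sb)
        con = sum(wi for wi, sa, sb in zip(w, scores[a], scores[b]) if sb > sa)
        return pro > con

    for a in scores:
        if all(beats(a, b) for b in scores if b != a):
            return a
    return None


print("weighted sum:", weighted_sum_winner(cars, weights))
print("majority:", condorcet_winner(cars, weights))
```

Both procedures use exactly the same evaluations and the same weights; only the way of aggregating them differs, and so does the "best buy".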
10.2. WHAT HAVE WE LEARNED? 239
Relax, our new camera will choose the optimal focus for you
Our washing machines, our cameras, our TV sets often take decisions on their
own, e.g. concerning the amount of water or energy to use, the right focus, the
supposedly optimal tuning of channels, the clarity of an image. The decision
modules underlying such automatic decisions were studied in chapter 7. We saw
that they are based on concepts and techniques that are very similar to the ones
examined in chapter 6 and, thus, raise similar problems and questions. Contrary
to the situation in chapter 6, however, they are used in real time without human
intervention after the implementation stage. This raises new difficulties and issues.
Therefore, relying on the automatic decisions taken by the new camera might not
always be your best option.
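The sensitivity of such automatic decision modules to modelling choices can be sketched in a few lines of Python. The rule base below is hypothetical (two rules setting an exposure time from brightness and motion, both scaled to [0, 1]) and uses zero-order Sugeno-style inference, i.e. a weighted mean of rule outputs; it is not the controller of any actual camera. Merely replacing the minimum by the product as the t-norm interpreting "AND" changes the chosen exposure.

```python
from typing import Callable

TNorm = Callable[[float, float], float]


def t_min(a: float, b: float) -> float:
    return min(a, b)


def t_product(a: float, b: float) -> float:
    return a * b


def low(x: float) -> float:
    """Membership degree in 'low' for an input scaled to [0, 1]."""
    return max(0.0, min(1.0, 1.0 - x))


def high(x: float) -> float:
    """Membership degree in 'high' for an input scaled to [0, 1]."""
    return max(0.0, min(1.0, x))


# Hypothetical rules: (brightness term, motion term, exposure time in ms).
RULES = [(low, low, 60.0), (high, high, 5.0)]


def exposure(brightness: float, motion: float, tnorm: TNorm) -> float:
    """Zero-order Sugeno inference: firing-strength-weighted mean of rule outputs."""
    strengths = [tnorm(b_term(brightness), m_term(motion)) for b_term, m_term, _ in RULES]
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for these inputs")
    return sum(s * out for s, (_, _, out) in zip(strengths, RULES)) / total


print(f"min t-norm:     {exposure(0.3, 0.6, t_min):.1f} ms")
print(f"product t-norm: {exposure(0.3, 0.6, t_product):.1f} ms")
```

For a brightness of 0.3 and a motion of 0.6, the minimum yields about 36.4 ms and the product about 38.5 ms; once implemented, such a choice is applied without any human reconsidering it.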
Given what you told me about your preferences and beliefs, you should not
invest in this project in view of its expected utility
Standard decision analysis techniques (see e.g. Raiffa 1970) are often seen as
synonymous with decision support methods in risky and/or uncertain situations.
Using a real example in electricity production planning, in chapter 8, we showed
why the implementation of these standard techniques may not be as straightforward
as is often believed. Besides possible computational problems, the assessment
and revision of (subjective) probability distributions in highly ambiguous
environments and in situations involving a long period of time is an enormous task.
Alternative tools, such as possibility measures, belief functions, fuzzy sets and
other kinds of non-additive uncertainty measures, may appear to be good contenders,
although their theoretical basis may be seen as less firm than the one underlying
standard Bayesian analysis. Furthermore, important considerations, like the
dynamic consistency of choices and the aggregation of consequences over time,
were shown to be largely open questions. Therefore, there might be more than one
way to assess preferences and beliefs and to combine them in order to make a
recommendation.
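As a minimal illustration in Python, with hypothetical numbers: the same beliefs (a 60% chance of gaining 100 and a 40% chance of losing 80, starting from a wealth of 100) lead to opposite recommendations under a linear and under a logarithmic utility function. Both utility functions are assumptions chosen for illustration; eliciting the decision maker's actual one is precisely the delicate part.

```python
import math


def expected_utility(lottery, u):
    """lottery: list of (probability, final wealth) pairs; u: utility function."""
    return sum(p * u(w) for p, w in lottery)


wealth = 100.0  # hypothetical initial wealth
invest = [(0.6, wealth + 100.0), (0.4, wealth - 80.0)]
stay = [(1.0, wealth)]

utilities = {
    "risk-neutral (linear)": lambda w: w,
    "risk-averse (log)": math.log,
}

for name, u in utilities.items():
    better = expected_utility(invest, u) > expected_utility(stay, u)
    print(f"{name}: {'invest' if better else 'do not invest'}")
```

The linear utility recommends investing, the logarithmic one does not, although the probabilities are identical in both cases.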
Whether we like it or not, it seems difficult nowadays to escape from formal
decision and evaluation methods. We may ignore them. The authors of this book
believe that it may be interesting and profitable to give them a closer look. The
real case-study presented in chapter 9 has shown that their proper use can have a
significant impact on real complex decision or evaluation processes.
Collecting data
All models imply collecting and assessing data of various types and
qualities and manipulating these data in order to derive conclusions that
will hopefully be useful in a decision or evaluation process. This more or
less inevitably implies building evaluation models trying to capture
aspects of reality that are difficult to define with great precision (see
chapters 3, 4, 6 and 9).
The numbers resulting from such evaluation models often appear as
constructs that are the result of multiple options. The choice between
these various possible options is only partly guided by scientific considerations.
These numbers should not be confused with numbers
resulting from classical measurement operations in Physics. They are
measured on scales that are difficult to characterise properly. Further-
more, they are often plagued with imprecision, ambiguity and/or un-
certainty. Therefore, more often than not, these numbers seem, at best,
to give an order of magnitude of what is intended to be captured (see
chapters 3, 4, 6, 8).
The properties of the numbers manipulated in such models should be
examined with care; using numbers may only be a matter of con-
venience and does not imply that any operation can be meaningfully
performed on them (see chapters 3, 4, 6 and 7).
The use of evaluation models greatly contributes to shaping and trans-
forming the reality that we would like to measure. Implementing a
decision/evaluation model only rarely implies capturing aspects of re-
ality that can be considered as independent of the model (see chapters
6 and 9).
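The remark that using numbers may only be a matter of convenience can be illustrated with a short Python sketch on hypothetical grades coded 1 to 5 on an ordinal scale. Ranking students by their mean grade is not invariant under a strictly increasing recoding of the scale, even though such a recoding carries exactly the same ordinal information.

```python
from statistics import mean

# Hypothetical grades of two students on an ordinal scale coded 1..5.
grades = {"A": [2, 5], "B": [4, 4]}

# A strictly increasing recoding: it preserves the order of the grades.
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}


def best_by_mean(g):
    """Student with the highest arithmetic mean grade."""
    return max(g, key=lambda s: mean(g[s]))


original = best_by_mean(grades)
recoded = best_by_mean({s: [recode[x] for x in xs] for s, xs in grades.items()})
print(original, recoded)  # B A
```

The conclusion "B is better than A" is therefore an artefact of the particular numerical coding rather than a property of the grades themselves.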
Aggregating evaluations
We saw that the methods reviewed in chapters 2 to 8 are far from being without
problems. Indeed, these chapters can be seen as a collection of the defects of these
methods. Some readers may think that, faced with such evidence, this type of
method should be abandoned and that intuition or expertise is not likely to
do much worse, at lower cost and with less effort. In our opinion, this would be a
totally unwarranted conclusion. It is the firm belief and conviction of the authors
that the use of formal decision and evaluation tools is both inevitable and useful.
Three main arguments can be proposed to support this claim.
First, it should not be forgotten that formal tools lend themselves more easily
to criticism and close examination than other kinds of tools. However, whenever
intuition or expertise has been subjected to close scrutiny, it has almost always
been shown that such judgements are based on heuristics that are
likely to neglect important aspects of the situation and/or are affected by many
biases (see the syntheses of Kahneman, Slovic and Tversky 1981, Bazerman 1990,
Russo and Schoemaker 1989, Hogarth 1987, Poulton 1994, Thaler 1991).
Second, formal methods have a number of advantages that often prove crucial
in complex organisational and/or social processes:
Although these advantages may carry little weight compared to the obvious
drawbacks of formal methods in terms of effort involved, money and time consumed
in some situations (e.g. a very simple decision/evaluation process involving a
single actor), they appear fundamental to us in most social or organisational
processes (see chapter 9).
Third, casual observation suggests that there is an increasing demand for such
tools in various domains (ranging from executive information systems, decision
support systems and expert systems to standardised evaluation tests and impact
studies). It is our belief that the introduction of such tools may have quite a
beneficial impact in many areas in which they are not commonly used. Although
many companies use tools such as graphology and/or astrology in order to select
between applicants for a given position, we are more than inclined to say that the
use of more formal methods could significantly improve such selection processes
(not to mention their fairness and equity). Similarly, the introduction of
more formal evaluation tools in the evaluation of public policies, laws and regu-
lations (e.g. policy against crime and drugs, policy towards the carrying of guns,
fiscal policy, the establishment of environmental standards, etc.), an area in which
they are strikingly absent in many countries, would surely contribute to a more
transparent and effective government.
We would thus answer a clear and definite yes to the question of whether
formal decision and evaluation tools are useful.
10.3. WHAT CAN BE EXPECTED? 243
the engineering route that amounts to saying that a method is good because
it works, i.e. has been applied several times in real-world problems and
has been well accepted by the actors in the process. Although we would
definitely not favour a method that would be unable to pass such a test, we
doubt that the engineering argument is sufficient to define what would dis-
tinguish good formal decision or evaluation methods. First, it is important
to remember that the quality of the support provided by a formal tool is
very difficult to separate from considerations linked to the implementation of
the method. As should be apparent from chapter 9, the formal tools used
by an analyst are implemented in decision or evaluation processes that may
be highly complex (involving many different actors, lasting a long time and
being governed by complex rules and/or regulations). The resulting
decision/evaluation aid process is therefore conditioned by many factors outside
the realm of a formal method: the quality of the structuring of the problem,
of communication with stakeholders, the availability of user-friendly
software, the timing and costs of the study, etc. are elements of utmost
importance in the quality of a decision/evaluation aid process. Supporting
a decision or an evaluation process should not be confused with solving
a well-defined formal problem. Although it may make sense to speak of a good
method for solving such a problem, supporting real decision and evaluation
processes is a different exercise. Second, in practice, it is often difficult to
know whether the proposed model worked or not. Even if the final decision is
at variance with the recommendations derived from the model, the very presence
of analysts, the questions they raised and the type of reasoning they promoted
could have had a significant impact on the decision process. Should we then say
that the method has worked or not?
A close variant of the engineering route could be called the naive route.
Analysts implementing formal decision and evaluation tools are in a position sim-
ilar to that of an engineer. Contrary to most engineers, however, these decision
engineers often lack clear criteria for assessing the success or failure of
their models.
At this point, it should be apparent that research on formal decision and
evaluation methods should not be guided by the hope of discovering models that would
be ideal under certain types of circumstances. Can something be done then? In
view of the many difficulties encountered with the models envisaged in this book
and the many fields in which no formal decision and evaluation tools are used, we
do think that this area will be rich and fertile for future research.
Freed from the idea that we will discover the method, we can, more modestly
and more realistically, expect to move towards:
flexible preference models able to cope with data of poor or unknown quality,
conflicting or lacking information;
assessment protocols and technologies able to cope with complex and unsta-
ble preferences, uncertain trade-offs, hesitation and learning;
tools for comparing aggregation models in order to know what they have
in common and whether one is likely to be more appropriate in view of the
quality of the data;
aggregation models,
If we managed to convince you that formal decision and evaluation models are an
important topic and that the hope of discovering ideal methods is somewhat
chimerical, it is not unlikely that you will find the next book valuable.
Bibliography
[1] Abbas, M., Pirlot, M. and Vincke, Ph. (1996). Preference structures and
co-comparability graphs, Journal of Multi-Criteria Decision Analysis 5: 81–98.
[2] Abdellaoui, M. and Munier, B. (1994). The closing in method: An ex-
perimental tool to investigate individual choice patterns under risk, in
B. Munier and M.J. Machina (eds), Models and experiments in risk and
rationality, Kluwer, Dordrecht, pp. 141–155.
[3] Adler, H.A. (1987). Economic appraisal of transport projects: A manual with
case studies, Johns Hopkins University Press for the World Bank, Balti-
more.
[4] Airasian, P.W. (1991). Classroom assessment, McGraw-Hill, New York.
[5] Allais, M. and Hagen, O. (eds) (1979). Expected utility hypotheses and the
Allais paradox, D. Reidel, Dordrecht.
[6] Allais, M. (1953). Le comportement de l'homme rationnel devant le risque :
Critique des postulats et axiomes de l'ecole americaine, Econometrica
21: 503–46.
[7] Armstrong, W.E. (1939). The determinateness of the utility function, The
Economic Journal 49: 453–467.
[8] Arrow, K.J. and Raynaud, H. (1986). Social choice and multicriterion
decision-making, MIT Press, Cambridge.
[9] Arrow, K.J. (1963). Social choice and individual values, 2nd edn, Wiley, New
York.
[10] Atkinson, A.B. (1970). On the measurement of inequality, Journal of
Economic Theory 2: 244–263.
[11] Baldwin, J.F. (1979). A new approach to approximate reasoning using a fuzzy
logic, Fuzzy Sets and Systems 2: 309–325.
[12] Balinski, M.L. and Young, H.P. (1982). Fair representation, Yale University
Press, New Haven.
[13] Bana e Costa, C.A., Ensslin, L., Correa, E.C. and Vansnick, J.-C. (1999).
Decision support systems in action: Integrated application in a multi-
criteria decision aid process, European Journal of Operational Research
113: 315–335.
[14] Barbera, S., Hammond, P. and Seidl, C. (eds) (1998). Handbook of utility
theory, Vol. 1: Principles, Kluwer, Dordrecht.
248 BIBLIOGRAPHY
[15] Bartels, R.H., Beatty, J.C. and Barsky, B.H. (1987). An introduction
to splines for use in computer graphics and geometric modeling, Morgan
Kaufmann, Los Altos.
[16] Barzilai, J., Cook, W.D. and Golany, B. (1987). Consistent weights for
judgments matrices of the relative importance of alternatives, Operations
Research Letters 6: 131–134.
[17] Bazerman, M.H. (1990). Judgment in managerial decision making, Wiley,
New York.
[18] Bell, D., Raiffa, H. and Tversky, A. (eds) (1988). Decision making: Descrip-
tive, normative and prescriptive interactions, Cambridge University Press,
Cambridge.
[19] Belton, V., Ackermann, F. and Shepherd, I. (1997). Integrated support
from problem structuring through alternative evaluation using COPE and
VISA, Journal of Multi-Criteria Decision Analysis 6: 115–130.
[20] Belton, V. and Gear, A.E. (1983). On a shortcoming of Saaty's analytic
hierarchies, Omega 11: 228–230.
[21] Belton, V. (1986). A comparison of the analytic hierarchy process and a simple
multi-attribute value function, European Journal of Operational Research
26: 7–21.
[22] Bereau, M. and Dubuisson, B. (1991). A fuzzy extended k-nearest neighbor
rule, Fuzzy Sets and Systems 44: 17–32.
[23] Bernoulli, D. (1954). Specimen theoriae novae de mensura sortis,
Commentarii Academiae Scientiarum Imperialis Petropolitanae (5, 175–192,
1738), Econometrica 22: 23–36. Translated by L. Sommer.
[24] Bezdek, J., Chuah, S.K. and Leep, D. (1986). Generalised k-nearest neighbor
rules, Fuzzy Sets and Systems 18: 237–256.
[25] Blin, M.-J. and Tsoukias, A. (1998). Multicriteria methodology contribution
to the software quality evaluation, Technical report, Cahier du LAMSADE
No 155, Universite Paris-Dauphine, Paris.
[26] Boardman, A. (1996). Cost benefit analysis: Concepts and practices,
Prentice-Hall, New York.
[27] Boiteux, M. (1994). Transports : Pour un meilleur choix des investissements,
La Documentation Francaise, Paris.
[28] Bonboir, A. (1972). La docimologie, PUF, Paris.
[29] Borda, J.-Ch. (1781). Memoire sur les elections au scrutin, Comptes Rendus
de l'Academie des Sciences. Translated by Alfred de Grazia as Mathematical
derivation of an election system, Isis, Vol. 44, pp. 42–51.
[30] Bouchon, B. (1995). La logique floue et ses applications, Addison Wesley, New
York.
[47] Carbone, E. and Hey, J.D. (1995). A comparison of the estimates of expected
utility and non-expected utility preference functionals, Geneva Papers on
Risk and Insurance Theory 20: 111–133.
[48] Cardinet, J. (1986). Evaluation scolaire et mesure, De Boeck, Brussels.
[49] Chatel, E. (1994). Qu'est-ce qu'une note : recherche sur la pluralite des
modes d'education et d'evaluation, Les Dossiers d'Education et Formations
47: 183–203.
[50] Checkland, P. (1981). Systems thinking, systems practice, Wiley, New York.
[51] Condorcet, M.J.A.N.C., marquis de. (1785). Essai sur l'application de
l'analyse a la probabilite des decisions rendues a la pluralite des voix,
Imprimerie Royale, Paris.
[52] Cover, T.M. and Hart, P.E. (1967). Nearest neighbor pattern classification,
IEEE Transactions on Information Theory IT-13(1): 21–27.
[53] Cross, L.H. (1995). Grading students, Technical Report Series EDO-TM-95-5,
ERIC/AE Digest.
[54] Daellenbach, H.G. (1994). Systems and decision making. A management sci-
ence approach, Wiley, New York.
[55] Dasgupta, P.S., Marglin, S. and Sen, A.K. (1972). Guidelines for project eval-
uation, UNIDO, New York.
[56] Dasgupta, P.S. and Pearce, D.W. (1972). Cost-benefit analysis: Theory and
practice, Macmillan, Basingstoke.
[57] Davis, B.G. (1993). Tools for teaching, Jossey-Bass, San Francisco.
[58] de Jongh, A. (1992). Theorie du mesurage, agregation des criteres et
application au decathlon, Master's thesis, SMG, Universite Libre de Bruxelles,
Brussels.
[59] Dekel, E. (1986). An axiomatic characterization of preference under uncer-
tainty: Weakening the independence axiom, Journal of Economic Theory
40: 304–318.
[60] Desrosieres, A. (1995). Refleter ou instituer : L'invention des indicateurs
statistiques, Technical Report 129/J310, INSEE, Paris.
[61] de Ketele, J.-M. (1982). La docimologie, Cabay, Louvain-La-Neuve.
[62] de Landsheere, G. (1980). Evaluation continue et examens. Precis de doci-
mologie, Labor-Nathan, Paris.
[63] Dinwiddy, C. and Teal, F. (1996). Principles of cost-benefit analysis for de-
veloping countries, Cambridge University Press, Cambridge.
[64] Dorfman, R. (1996). Why benefit-cost analysis is widely disregarded and what
to do about it?, Interfaces 26: 1–6.
[65] Dreze, J. and Stern, N. (1987). The theory of cost-benefit analysis, in
A.J. Auerbach and M. Feldstein (eds), Handbook of public economics,
Elsevier, Amsterdam, pp. 909–989.
[66] Dubois, D., Fargier, H. and Prade, H. (1997). Decision-making under ordinal
preferences and uncertainty, in D. Geiger and P.P. Shenoy (eds), Proceed-
ings of the 13th conference on uncertainty in artificial intelligence, Morgan
Kaufmann, Los Altos, pp. 157–164.
[67] Dubois, D., Prade, H. and Sabbadin, R. (1998). Qualitative decision theory
with Sugeno integrals, Proceedings of the 14th conference on uncertainty
in artificial intelligence, Morgan Kaufmann, Los Altos, pp. 121–128.
[68] Dubois, D., Prade, H. and Ughetto, L. (1999). Fuzzy logic, control engi-
neering and artificial intelligence, in H.B. Verbruggen, H.J. Zimmermann
and R. Babuska (eds), Fuzzy algorithms for control, Kluwer, Dordrecht,
pp. 17–58.
[69] Dubois, D. and Prade, H. (1987). The mean value of a fuzzy number, Fuzzy
Sets and Systems 24: 279–300.
[70] Dubois, D. and Prade, H. (1988). Possibility theory, Plenum Press, New York.
[71] Dupuit, J. (1844). De la mesure de l'utilite des travaux publics, Annales des
Ponts et Chaussees (8).
[72] Dyer, J.S. (1990). Remarks on the analytic hierarchy process, Management
Science 36: 249–258.
[73] Ebel, R.L. and Frisbie, D.A. (1991). Essentials of educational measurement,
Prentice-Hall, New York.
[74] Ellsberg, D. (1961). Risk, ambiguity and the Savage axioms, Quarterly
Journal of Economics 75: 643–669.
[75] Fargier, H. and Perny, P. (1999). Qualitative decision models under
uncertainty without the commensurability hypothesis, in K.B. Laskey and
H. Prade (eds), Proceedings of the 15th conference on uncertainty in
artificial intelligence, Morgan Kaufmann, Los Altos, pp. 188–195.
[76] Farrell, D.M. (1997). Comparing electoral systems, Contemporary Political
Studies, Prentice-Hall, New York.
[77] Fiammengo, A., Buosi, D., Iob, I., Maffioli, P., Panarotto, G. and Turino, M.
(1997). Bid management of software acquisition for cartography applica-
tions. Presented at AIRO 97 Conference, Aosta.
[78] Fishburn, P.C. and Sarin, R.K. (1991). Dispersive equity and social risk,
Management Science 37: 751–769.
[79] Fishburn, P.C. and Sarin, R.K. (1994). Fairness and social risk I:
Unaggregated analyses, Management Science 40: 1174–1188.
[80] Fishburn, P.C. and Straffin, P.D. (1989). Equity considerations in public risks
evaluation, Operations Research 37: 229–239.
[81] Fishburn, P.C. (1970). Utility theory for decision-making, Wiley, New York.
[82] Fishburn, P.C. (1976). Noncompensatory preferences, Synthese 33: 393–403.
[83] Fishburn, P.C. (1977). Condorcet social choice functions, SIAM Journal on
Applied Mathematics 33: 469–489.
[138] Keller, J., Gray, M. and Givens, J. (1985). A fuzzy k-nearest neighbor
algorithm, IEEE Transactions on Systems, Man and Cybernetics 15: 580–585.
[139] Kelly, J.S. (1991). Social choice bibliography, Social Choice and Welfare
8: 97–169.
[140] Kerlinger, F.N. (1986). Foundations of behavioral research, 3rd edn, Holt,
Rinehart and Winston, New York.
[141] Kirkpatrick, C. and Weiss, J. (1996). Cost-benefit analysis and project
appraisal in developing countries, Elgar, Aldershot, Hants.
[142] Kohli, K.N. (1993). Economic analysis of investment projects: A practical
approach, Oxford University Press for the Asian Development Bank, Ox-
ford.
[143] Krantz, D.H., Luce, R.D., Suppes, P. and Tversky, A. (1971). Foundations of
measurement, Vol. 1: Additive and polynomial representations, Academic
Press, New York.
[144] Krutilla, J.V. and Eckstein, O. (1958). Multiple purpose river development,
Johns Hopkins University Press, Baltimore.
[145] Laska, J.A. and Juarez, T. (1992). Grading and marking in American
schools: Two centuries of debate, Charles C. Thomas, Springfield.
[146] Laslett, R. (1995). The assumptions of cost-benefit analysis, in K.G. Willis
and J.T. Corkindale (eds), Environmental valuation: New perspectives,
CAB International, Oxford, pp. 5–20.
[147] Lesourne, J. (1975). Cost-benefit analysis and economic theory, North-
Holland, Amsterdam.
[148] Lindheim, E., Morris, L.L. and Fitz-Gibbon, C.T. (1987). How to measure
performance and use tests, Sage Publications, Thousand Oaks.
[149] Little, I.M.D. and Mirrlees, J.A. (1968). Manual of industrial project analysis
in developing countries, OECD, Paris.
[150] Little, I.M.D. and Mirrlees, J.A. (1974). Project appraisal and planning for
developing countries, Basic Books, New York.
[151] Loomes, G. and Sugden, R. (1982). Regret theory: An alternative theory of
rational choice under uncertainty, Economic Journal 92: 805–824.
[152] Loomes, G. (1988). Different experimental procedures for obtaining valua-
tions of risky actions: Implications for utility theory, Theory and Decision
25: 1–23.
[153] Loomis, J., Peterson, G., Champ, P., Brown, T. and Lucero, B. (1998).
Paired comparisons estimates of willingness to accept and contingent val-
uation estimates of willingness to pay, Journal of Economic Behavior and
Organisation 35: 501–515.
[154] Luce, R.D., Krantz, D.H., Suppes, P. and Tversky, A. (1990). Foundations
of measurement, Vol. 3: Representation, axiomatisation and invariance,
Academic Press, New York.
[155] Luce, R.D. and Raiffa, H. (1957). Games and Decisions, Wiley, New York.
[156] Luce, R.D. (1956). Semiorders and a theory of utility discrimination,
Econometrica 24: 178–191.
[157] Lysne, A. (1984). Grading of students' attainment: Purposes and functions,
Scandinavian Journal of Educational Research 28: 149–165.
[158] Machina, M.J. (1982). Expected utility without the independence axiom,
Econometrica 50: 277–323.
[159] Machina, M.J. (1989). Dynamic consistency and non-expected utility models
of choice under uncertainty, Journal of Economic Literature 27: 1622–1688.
[160] Mamdani, E.H. and Gaines, B.R. (eds) (1981). Fuzzy reasoning and its
applications, Academic Press, New York.
[161] Marchant, Th. (1996). Valued relations aggregation with the Borda method,
Journal of Multi-Criteria Decision Analysis 5: 127–132.
[162] Masser, I. (1983). The representation of urban planning-processes: An
exploratory review, Environment and Planning B 10: 47–62.
[163] May, K.O. (1952). A set of independent necessary and sufficient conditions
for simple majority decisions, Econometrica 20: 680–684.
[164] McClennen, E.F. (1990). Rationality and dynamic choice: Foundational ex-
plorations, Cambridge University Press, Cambridge.
[165] McCord, M. and de Neufville, R. (1983). Fundamental deficiency of expected
utility analysis, in S. French, R. Hartley, L.C. Thomas and D.J. White
(eds), Multiobjective decision making, Academic Press, London, pp. 279–305.
[166] McCord, M. and de Neufville, R. (1982). Empirical demonstration that
expected utility decision analysis is not operational, in B. Stigum and
F. Wenstøp (eds), Foundations of utility and risk theory, D. Reidel,
Dordrecht, pp. 181–199.
[167] MacCrimmon, K.R. and Larsson, S. (1979). Utility theory: Axioms versus
paradoxes, in M. Allais and O. Hagen (eds), Expected utility hypotheses
and the Allais paradox, D. Reidel, Dordrecht, pp. 27–145.
[168] McLean, J.E. and Lockwood, R.E. (1996). Why and how should we assess
students? The competing measures of student performance, Sage Publica-
tions, Thousand Oaks.
[169] Merle, P. (1996). Levaluation des eleves. Enquete sur le jugement professo-
ral, PUF, Paris.
[170] Mintzberg, H., Raisinghani, D. and Theoret, A. (1976). The structure of
unstructured decision processes, Administrative Science Quarterly 21: 246–272.
[171] Mishan, E. (1982). Cost-benefit analysis, Allen and Unwin, London.
[172] Moom, T.M. (1997). How do you know they know what they know? A hand-
book of helps for grading and evaluating student progress, Grove Publish-
ing, Westminster.
[173] Morisio, M. and Tsoukias, A. (1997). IUSWARE: A formal methodology for
software evaluation and selection, IEE Proceedings on Software Engineering
144: 162–174.
[174] Moscarola, J. (1984). Organizational decision processes and ORASA
intervention, in R. Tomlinson and I. Kiss (eds), Rethinking the process of
operational research and systems analysis, Pergamon Press, Oxford,
pp. 169–186.
[175] Mousseau, V. (1993). Problemes lies a l'evaluation de l'importance en aide
multicritere a la decision : Reflexions theoriques et experimentations,
PhD thesis, LAMSADE, Universite Paris-Dauphine, Paris.
[176] Munier, B. (1989). New models of decisions under uncertainty, European
Journal of Operational Research 38: 307–317.
[177] Nas, T.F. (1996). Cost-benefit analysis: Theory and application, Sage Pub-
lications, Thousand Oaks.
[178] Nauck, D. and Kruse, R. (1999). Neuro-fuzzy methods in fuzzy rule
generation, in J.C. Bezdek, D. Dubois and H. Prade (eds), Fuzzy sets in
approximate reasoning and information systems, Vol. 3 of Handbook of Fuzzy
Sets, Kluwer, Dordrecht, chapter 5, pp. 305–333.
[179] Nau, R.F. and McCardle, K.F. (1991). Arbitrage, rationality and
equilibrium, Theory and Decision 31: 199–240.
[180] Nau, R.F. (1995). Coherent decision analysis with inseparable probabilities
and utilities, Journal of Risk and Uncertainty 10: 71–91.
[181] Nguyen, H.T. and Sugeno, M. (1998). Modelling and control, Kluwer, Dor-
drecht.
[182] Nims, J.F. (1990). Poems in translation: Sappho to Valery, The University
of Arkansas Press, Arkansas.
[183] Noizet, G. and Caverini, J.-P. (1978). La psychologie de levaluation scolaire,
PUF, Paris.
[184] Nurmi, H. (1987). Comparing voting systems, D. Reidel, Dordrecht.
[185] Nutt, P.C. (1984). Types of organizational decision processes, Administrative
Science Quarterly 19: 414–450.
[186] Nyborg, K. (1998). Some Norwegian politicians' use of cost-benefit analysis,
Public Choice 95: 381–401.
[187] Ostanello, A. and Tsoukias, A. (1993). An explicative model of public in-
terorganizational interactions, European Journal of Operational Research
70: 67–82.
[188] Ostanello, A. (1990). Action evaluation and action structuring: Different
decision aid situations reviewed through two actual cases, in C.A. Bana
[221] Satterthwaite, M.A. (1975). Strategy proofness and Arrow's conditions:
Existence and correspondence theorems for voting procedures and social
welfare functions, Journal of Economic Theory 10: 187–217.
[222] Savage, L. (1954). The foundations of statistics, 1972, 2nd revised edn, Wiley,
New York.
[223] Schmeidler, D. (1989). Subjective probability and expected utility without
additivity, Econometrica 57: 571–587.
[224] Schneider, Th., Schieber, C., Eeckhoudt, L. and Gollier, C. (1997).
Economics of radiation protection: Equity considerations, Theory and
Decision 43: 241–51.
[225] Schofield, J. (1989). Cost-benefit analysis in urban and regional planning,
Unwin and Hyman, London.
[226] Scotchmer, S. (1985). Hedonic prices and cost-benefit analysis, Journal of
Economic Theory 37: 55–75.
[227] Sen, A.K. (1986). Social choice theory, in K.J. Arrow and M.D. Intriliga-
tor (eds), Handbook of mathematical economics, Vol. 3, North-Holland,
Amsterdam, pp. 1073–1181.
[228] Sen, A.K. (1997). Maximization and the act of choice, Econometrica 65: 745–779.
[229] Simon, H.A. (1957). A behavioural model of rational choice, in Models of
man, Wiley, New York, pp. 241–260.
[230] Sinn, H.W. (1983). Economic decisions under uncertainty, North-Holland,
Amsterdam.
[231] Slowinski, R. (ed.) (1998). Fuzzy sets in decision analysis, operations research
and statistics, Kluwer, Dordrecht.
[232] Sopher, B. and Gigliotti, G. (1993). A test of generalized expected utility
theory, Theory and Decision 35: 75–106.
[233] Speck, B.W. (1998). Grading student writing: An annotated bibliography,
Greenwood Publishing Group, Westport.
[234] Stamelos, I. and Tsoukias, A. (1998). Software evaluation problem situa-
tions, Technical report, Cahier du LAMSADE No 156, Universite Paris-
Dauphine, Paris.
[235] Steuer, R.E. (1986). Multiple criteria optimisation: Theory, computation,
and application, Wiley, New York.
[236] Stratton, R.W., Myers, S.C. and King, R.H. (1994). Faculty behavior, grades
and student evaluations, Journal of Economic Education 25: 5–15.
[237] Sugden, R. and Williams, A. (1983). The principles of practical cost-benefit
analysis, Oxford University Press, Oxford.
[238] Sugeno, M. (1977). Fuzzy measures and fuzzy integrals: a survey, in
M.M. Gupta, G.N. Saridis and B.R. Gaines (eds), Fuzzy automata and
decision processes, North Holland, Amsterdam, pp. 89–102.
[273] Zerbe, R.O. and Dively, D.D. (1994). Benefit-cost analysis in theory and
practice, Harper Collins, New York.
Index
t-norm, 164
threshold, 50, 173
tournament, 125
trade-off, 101
transitivity, 18–20
transportation, 71, 79
unanimity, 13