
Insights from practice

Dumbing down performance measures


Paul Walsh

Paul Walsh is Senior Lecturer at the Australian Graduate School of Management, University of NSW, Sydney, Australia.

Summary
Purpose – The purpose of this paper is to examine why and how firms adopt less-than-perfect surrogate measures and, in extreme cases, dumb down measures to support strategic scorecards.
Design/methodology/approach – The paper presents a taxonomy to classify the different types of
surrogate measures and workarounds are suggested for evaluating performance when surrogate
measures are present.
Findings – The use of low investment surrogate measures is tempting when firms face the prospect of
measuring strategic objectives, especially around intangible assets. Current approaches tend to be ad
hoc and would benefit from being more systematic.
Practical implications – The approach enables firms to recognise the potential for dumb measures,
suggest workarounds and improve current practice for managing with less relevant measures.
Originality/value – The taxonomy and workarounds presented provide greater rigour, insight and
legitimacy to the use of surrogate measures in scorecards.
Keywords Performance measures, Quality
Paper type Research paper

Introduction
A call centre measures customer satisfaction only by the time to answer the phone; a
marketing manager measures the depth of relationship with channel partners by the
number of meetings held; a company measures innovation by the percentage of
revenue from products less than 12 months old; a government agency measures the
competency of its workforce by the ratio of consultant spend to salary spend; a
university measures its reputation by the ratio of centimetres of positive to negative
press it receives. What do all these examples have in common? They are all examples
of dumbing down performance measures, an emerging practice due to the popularity of
scorecards and dashboards.
This paper has several aims. First, it suggests there is a practical necessity for
less-than-perfect measures (dumb measures being an extreme) as firms move more
towards the measurement of intangible assets and adopt strategy-led measures. Second,
this paper classifies measures in terms of a four-level taxonomy adapted from Simons
(2000), so that firms can assess the extent to which their measures match strategic
objectives. Third, it discusses workarounds that firms can employ when faced with the reality
of managing performance with less-than-perfect measures.
This paper refers to dumbing down measures as the practice of repeatedly substituting
measures of achievement with less and less relevant surrogate measures until what remains
is an activity or initiative measure, not a measure of outcomes achieved. The upside of
surrogate measures is that for time-poor managers, low-maintenance measures, which can
be easily sourced from transaction systems, replace high-maintenance measures, which
require survey instruments. For example, a manager decides to measure employee
satisfaction by absenteeism levels rather than undertaking a resource-intensive employee
survey. Considerable time and resources are saved. The downside for the firm, however, is
that the limited interpretation and analysis associated with dumb measures both restricts
opportunities for strategic conversations based on sensible data and sends a signal that
corporate performance management is more about getting some numbers to report rather
than helping the business change and improve. For example, slicing and dicing the
absenteeism profile, while it may pinpoint staff that need counselling, tells the firm little about
the true drivers of employee satisfaction that a survey analysis would reveal.
Dumb measures retard the development of a high performance culture by discouraging the
use of measures that are highly relevant, rich in content and encourage inquiry and business
improvement. Instead staff attempt to find what can be easily measured. Undesirable
behaviours may be exacerbated when less-than-perfect measures are linked with
performance-related pay. For example, managers can be rewarded for keeping
absenteeism levels low but employee satisfaction goes unnoticed. When dumb measures
are present, instead of measuring achievements against goals, staff focus on completing
activities and controlling inputs and financial resources. Hope and Fraser (2003) discuss the
narrow behaviour that managers can exhibit if their performance is assessed on making
budget alone.
Dumb measures are also associated with the green effect, where there is a reluctance to
show traffic lights other than green on dashboards. To create the green effect, staff report
measures of busyness that are fully under their control rather than measures of achievement.
Alternatively, they create low expectations when setting targets because in both cases there
is a fear of being judged as a poor performer.

Strategy-led performance measurement systems often fail to deliver on their promises
because of operational difficulties (Krause, 2003). In particular, there are a number of
practical difficulties when measuring intangible assets including the depth of customer
relationships and the strategic readiness of human, organisational and information capital
(Kaplan and Norton, 2004). Credibility can be an issue. For example, customer satisfaction
ratings on the quality of a product or service may decrease if a recent advertising campaign
has raised customer expectations. Also, surveys are always restricted to what is asked and
who returns them and these issues can be hotly debated when ratings are not as good as
they should be. Neely et al. (2002) describe ten tests that identify specific strengths and
weaknesses of an individual measure.

Kaplan and Norton (2004, p. xii) suggest that balanced scorecards are effective in the
presence of less-than-perfect measures: ". . . when agreement existed about the objective to
be achieved, even if the initial measurements for the objective turned out to be less than
perfect, the executives could easily modify the measurements for subsequent periods,
without having to redo their discussion about strategy. The objectives would likely remain the
same even as the measurements of the objectives evolved with experience and new data
sources". Despite this claim, many firms using strategy-led scorecards describe the
performance measurement aspect as only adequate (Frigo and Krumweide, 1999). The key
issue is one of relevance of the measures (Andon et al., 2005).
While firms may seek more relevant and thus more perfect measures, there are at least six
major constraints that reduce their capacity to do so:
1. Although information-rich, surveys can be high-maintenance, high-cost instruments. The
data take a lot of effort to gather, an unbiased sampling scheme needs to be devised and
questions carefully crafted. Firms also need to be aware of survey fatigue (Tate, 2000).
2. For intangible assets, a related issue is that the data may be too infrequent for
management decision making. Consider an annual customer or employee survey.
Managers cannot wait for a year to make decisions about the level of customer or
employee satisfaction.
3. Measurement of the three categories of intangible assets (human, structural and
relational capital) is often complex and multi-dimensional. Measures often require
significant investment to develop a reliable instrument (Sanchez et al., 2000).
4. The firm does not want to record the data because it indicates an undesirable event has
occurred; for example fraud, crime, accident, fine, litigious complaint. The challenge is to
find early warning indicators.
5. Most people do not want to be assessed on measures that are outside their control.
Apportioning the right level of accountability for a measure can prove an elusive issue. For
example, suppose Sales have difficulty selling products that have been poorly designed
and manufactured. How much should Research and Development (R&D) and
Manufacturing be held accountable because Sales miss their targets? This is the issue
of dependency.
6. The firm does not have integrated transaction systems to automate data collection. Since
data must be gathered manually, reporting soon becomes a burden and the firm is forced
to simplify the measures on its scorecard (Van Der Zee and De Jong, 1999).

Properties of measures
The first step in assessing the adequacy of surrogate measures and recognising the
potential for dumb measures is to classify the types of measures scorecards can contain.
Simons (2000) classified the properties of measures in terms of objectivity, completeness
and responsiveness.
An objective measure is one that can be independently verified. Examples include financial
measures such as income, expenses and asset valuation. Objectivity is provided by
expressions such as "true and fair view" in audited accounts and standards such as generally
accepted accounting principles (GAAP). Objective non-financial measures include
volumes, cycle time and waste.
A subjective measure is one that cannot be independently verified because there is a
reliance on personal judgement; for example, the rating of a research report on a scale of 1
to 10, where 10 is excellent and 1 is very poor. Subjective measures are common in
performance appraisals where superiors rate subordinates. Rating bias can be reduced by
using multiple raters.
While staff can have confidence in the validity of objective measures, subjective measures
require high levels of trust if they are to motivate people. Subordinates must believe that the
rating scale and judgement are fair and reasonable. There is subjectivity in measures of
intangible assets and the more subjective a measure is, the less-than-perfect it is.
A complete measure captures all the relevant attributes that define performance. For
example, the cost of poor quality is properly defined in terms of prevention, inspection and
failure costs. Absence of any one of the components will make measurement of the cost of
poor quality incomplete. The more incomplete a measure is, the less-than-perfect it is.
A measure is responsive to the extent that a manager can take action to influence it.
Responsiveness is related to controllability. Share price is considered a complete measure
because all managerial actions eventually end up being reflected in the share price. Share
price, however, can be affected by interest rates and general investment sentiment, which
are outside the control of managers. Share price therefore is not fully responsive to
management actions. The less responsive a measure is, the less-than-perfect it is, and the
more difficult it is to attribute improvements in performance.
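
By way of illustration only, the three properties can be treated as dimensions on which any candidate measure is profiled before it is admitted to a scorecard. The Python sketch below is a hypothetical device, not drawn from Simons (2000), who treats the properties qualitatively; the numeric scores and names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class MeasureProfile:
    """Profiles a candidate measure on Simons' three properties.

    Scores on a 0.0-1.0 scale are an illustrative assumption; the
    source treats objectivity, completeness and responsiveness
    qualitatively rather than numerically.
    """
    name: str
    objectivity: float      # can the measure be independently verified?
    completeness: float     # does it capture all attributes of performance?
    responsiveness: float   # can managers influence it by their own actions?

    def weakest_property(self) -> str:
        """Return the property on which the measure is furthest from perfect."""
        scores = {
            "objectivity": self.objectivity,
            "completeness": self.completeness,
            "responsiveness": self.responsiveness,
        }
        return min(scores, key=scores.get)

# Share price: objective and complete, but weakly responsive because
# interest rates and investor sentiment lie outside managers' control.
share_price = MeasureProfile("share price", objectivity=0.9,
                             completeness=0.9, responsiveness=0.3)
print(share_price.weakest_property())  # -> responsiveness
```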

Taxonomy for classifying measures


Building on Simons' (2000) definitions, we define a hierarchy of four levels of measures, shown
in Figure 1. The hierarchy starts with identifying the strategic objective. As we move down the
hierarchy each measure becomes progressively more of a leading indicator of performance.
The implications are that for any strategic objective, it is possible to construct a family of
leading and lagging indicators to measure progress and achievement. The hierarchy is
deployed in balanced scorecard applications at BE Campbell, the Australian pork producer
and distributor, the New South Wales Parliament and the Central Bank of Indonesia.

Figure 1 The measures hierarchy of four levels

An example of where the strategic objective is customer satisfaction is shown in Figure 2.


Exact measures are complete in that they capture all relevant attributes of performance as
defined by the intent and wording of the strategic objective. They are an exact match with the
intended outcome. For example, a properly researched and constructed customer
satisfaction index is an exact metric for measuring the objective of satisfying customers. The
difficulty with exact measures arises when they address intangible assets and the data are
subjective and open to interpretation by sceptics.

Figure 2 Example "improve service quality"

Proxy measures are "next best" or surrogate measures, and are incomplete in that they
possess only a limited set of the attributes needed to define performance. Proxy measures
are used in scorecards because the data are easier to collect, less open to interpretation,
timelier and more objective than exact measures. For example, a customer satisfaction
survey measured once a year can be supported by proxy measures such as customer
complaints, contract cancellations or customer churn. The ideal is to create a family of proxy
measures covering each attribute that defines performance. For example, pioneering work
on service quality by Parasuraman et al. (1991) led to the adoption of the RATER scale for
measuring reliability, assurance, tangibles, empathy and responsiveness. A minimum of five
proxies would be needed to cover the five attributes of performance.
The downside of proxy measures is that in the absence of exact measures, people can
emphasise the wrong things. For example, all attention is paid to customer complaints and
little to the broad range of value drivers that influence customer satisfaction because the firm
has never conducted a customer satisfaction survey. Proxy measures, without exact
measures to complement them, begin the dumbing down process. If a family of proxy
measures is used and each addresses a different attribute of performance, there is less
need for an exact measure and dumbing down is not an issue.
Process measures are measures of activity, outputs and resources consumed. At best they
measure busyness not achievement. Process measures, in particular cost, time and
quantity, are relatively easy to measure because they are sourced from transaction and
finance systems. The presence of process measures, without complementary proxy and
exact measures is an example of dumbing down strategic-level scorecards. The reason is
almost certainly that process measures, being internal measures, are more responsive to
management actions than exact and proxy measures where external factors may be
present. If a blame culture exists, then people will prefer to be judged on measures that are
more under their control. Process measures are useful for line managers for operational
control but have little relevance for strategic control.
Initiatives are designed to deliver improvements in a firm’s people, systems and processes;
for example a new computer system, a training course, a reengineering project, or a risk
management system. Initiative measures, when organised in terms of project milestones,
are reported in terms of on time, on budget and on specification. Except when the firm is a
project-based organisation, initiative measures, without complementary process, proxy and
exact measures, are best removed from strategic-level scorecards and reported separately
on an initiative or project status scorecard.
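
To make the hierarchy concrete, the sketch below tags each measure for an objective with its level and flags an objective that has been dumbed down, that is, one measured only by process or initiative measures. The level names follow the taxonomy above; the example measures and the function are hypothetical.

```python
from enum import IntEnum

class Level(IntEnum):
    """The four-level measures hierarchy, ordered from the most to the
    least direct match with the strategic objective (see Figure 1)."""
    EXACT = 1       # complete match with the objective's intent
    PROXY = 2       # "next best" surrogate covering some attributes
    PROCESS = 3     # activity, outputs or resources consumed
    INITIATIVE = 4  # project milestone: on time, on budget, on spec

# Hypothetical measures for the objective "improve service quality".
measures = {
    "customer satisfaction index": Level.EXACT,
    "customer complaints per month": Level.PROXY,
    "calls answered within 30 seconds": Level.PROCESS,
    "CRM upgrade milestones met": Level.INITIATIVE,
}

def dumbed_down(measures: dict) -> bool:
    """An objective measured only at process or initiative level tracks
    busyness rather than achievement, and has been dumbed down."""
    return all(level >= Level.PROCESS for level in measures.values())

print(dumbed_down(measures))  # False: exact and proxy measures are present
```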

Examples of less-than-perfect measures

Example 1
Niven (2002, p. 209) provides examples of functional scorecards where process measures
are used to gauge outcomes (see Table I).

Example 2
A government-owned water corporation in Australia measures organisational alignment with
one exact and two process measures (see Table II).

Table I
Unit | Objective | Measure | Measure type
Customer service and marketing | Increase customer loyalty: move beyond "satisfied" to "loyal" customers | Number of redesigned customer processes and services | Process
Information technology | Provide effective desktop support | Number of desktop service requests completed | Process

Table II
Objective | Measure | Frequency | Measure type
Ensure that direction is understood | Employee perception index | Annual | Exact
Ensure that direction is understood | Team briefs | Monthly | Process
Ensure that direction is understood | Performance agreements in place | Annual | Process

Workarounds
If we accept that the principal aim of strategic scorecards is to engage senior managers in
strategic conversation using the best available data, then less-than-perfect measures,
particularly for intangible assets, play an important role in supporting change. Three
workarounds are proposed to address the limitations of less-than-perfect measures.

Workaround 1: identify the type of measure on the scorecard


Scorecards as a minimum contain columns of perspectives, objectives, measures and
targets. An extra column can be added to identify the type of measure: exact, proxy, process
and initiative. Any shortcoming that exists between the objective and the measure(s) can be
highlighted with this simple adjustment. In this way the focus can move from a narrow
discussion on less-than-perfect data to a discussion around additional evidence on
achievement of the objective. The first workaround is a timely reminder that managers
should always focus on the objective when taking corrective action and evaluating staff
performance, especially when less-than-perfect measures are present (Austin, 1996).
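
A minimal sketch of the adjusted scorecard follows; the rows, targets and type labels are hypothetical. Once the extra column is present, objectives supported only by process or initiative measures can be flagged automatically, so that discussion moves to what additional evidence of achievement exists.

```python
from collections import defaultdict

# Hypothetical scorecard rows with the extra "measure type" column:
# (perspective, objective, measure, target, measure_type)
scorecard = [
    ("Customer", "Improve service quality", "Customer satisfaction index", ">= 80", "exact"),
    ("Customer", "Improve service quality", "Complaints per month", "< 50", "proxy"),
    ("People", "Increase employee satisfaction", "Team briefs held", "12/yr", "process"),
    ("People", "Increase employee satisfaction", "HR system rollout", "on time", "initiative"),
]

types_by_objective = defaultdict(set)
for _, objective, _, _, measure_type in scorecard:
    types_by_objective[objective].add(measure_type)

# Flag objectives whose only measures are process or initiative measures.
for objective, types in types_by_objective.items():
    if not types & {"exact", "proxy"}:
        print(f"Seek further evidence of achievement for: {objective}")
```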

Workaround 2: report by strategic theme


Kaplan and Norton (2001) coin the term strategic theme to denote a focused strategic
priority that offers greater granularity of the firm’s strategy. A strategic theme is a causal
pathway of linked strategic objectives on a strategy map that describes a direction a firm
wishes to strengthen; for example, improve customer experience through increasing
customer service training. Strategic themes are best as statements of means and ends,
covering the intended results (for example, revenue growth) and the corresponding internal
value drivers (for example, a diversified product range and a more skilful salesforce). Kaplan
and Norton (2001) proposed four generic strategic themes: build the franchise, increase
customer value, achieve operational excellence and be a good corporate citizen. Westpac,
one of Australia’s largest banks, currently uses this approach throughout its Corporate
Services Division.
Firms have an opportunity to significantly alter the format of their management reports when
they arrange their strategy in terms of strategic themes. A single aggregate scorecard with
20-25 measures can be replaced with smaller scorecards containing six to eight measures,
one for each strategic theme. In this way, clutter is reduced and discussion centres on a
small set of highly correlated measures for each theme. While it is desirable that all measures
are exact within each theme, there is scope for accommodating some less-than-perfect
measures. The reason is that the aim of a theme-based scorecard is to test and validate the
progress of a key strategic priority by checking for a mutually reinforcing pattern across a
group of measures, rather than focus on the weakness of any one particular measure. This
approach is currently being trialled at the University of Sydney, where scorecards around
four strategic themes (research and innovation, learning and teaching, financial
sustainability and community outreach and engagement) are being implemented.
This second workaround is also consistent with the view of Neely (1998). He argues that
measures perform three roles: comply, check and challenge. The challenge role encourages
managers to test and validate the assumptions that underpin firm strategy. Building the
management report around a small set of strategic themes facilitates more focused testing
and validation. The check and comply roles can then be satisfied by beginning the
management report with a scorecard of six to eight measures built around stakeholder
perspectives. A stakeholder scorecard provides an enterprise-level health check and builds
on the need to measure stakeholder relationships (Neely et al., 2002). This approach is
currently in use at the Department for Correctional Services (South Australia).
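
To illustrate this reporting arrangement, the sketch below splits a flat list of measures into one small scorecard per theme. The theme names reuse Kaplan and Norton's four generic themes; the measures and their assignments are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (measure, theme) assignments drawn from a single
# 20-25 measure aggregate scorecard.
measure_themes = [
    ("Revenue from new segments", "Build the franchise"),
    ("Customer retention rate", "Increase customer value"),
    ("Cross-sell ratio", "Increase customer value"),
    ("Unit cost per transaction", "Achieve operational excellence"),
    ("Community volunteering hours", "Be a good corporate citizen"),
]

# One small scorecard per strategic theme, so discussion can test for a
# mutually reinforcing pattern across each theme's measures rather than
# dwell on the weakness of any one measure.
theme_scorecards = defaultdict(list)
for measure, theme in measure_themes:
    theme_scorecards[theme].append(measure)

for theme, measures in theme_scorecards.items():
    print(theme, "->", measures)
```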

Workaround 3: rule-based decision tree


In the presence of less-than-perfect measures, it is important for managers to recognise the
gap between the chosen measure(s) and the objective and exercise discretion. Simons
(2000) provides a framework of decision rules, which has been adapted for the context of
managing with less-than-perfect measures (see Figure 3). The rules are not "go/no-go"
decisions but should be interpreted as providing guidance for managing performance with
measures that have varying degrees of objectivity, completeness and responsiveness. The
rules provide recommendations for two measurement situations. The first addresses
circumstances where a less-than-perfect measure will suffice provided there is a climate of
trust and integrity. The second addresses how best to complement existing measures when
they are lacking in completeness and responsiveness.

Figure 3 Measures decision tree

Several examples illustrate the use of the decision tree to effectively evaluate performance
when faced with less-than-perfect measures.
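
The flavour of such rules can be paraphrased in code. The sketch below is a hypothetical rendering, not a transcription of Figure 3: the questions, their order and the recommendations are assumptions consistent with the two situations described above.

```python
def decision_rule(subjective: bool, high_trust: bool,
                  complete: bool, controllable: bool) -> str:
    """Hypothetical rule-based guidance for managing performance with a
    less-than-perfect measure (adapted in spirit from Simons, 2000)."""
    if subjective and not high_trust:
        return ("do not use for staff evaluation: a subjective measure "
                "motivates only in a climate of trust and integrity")
    if not complete:
        return ("complement with a family of proxy measures, one for "
                "each missing attribute of performance")
    if not controllable:
        return ("restrict the measure to monitoring the environment, or "
                "complement it with a more controllable proxy")
    return "the measure can support both strategic control and appraisal"

# Example 4 below: exchange rates are objective but uncontrollable.
print(decision_rule(subjective=False, high_trust=True,
                    complete=True, controllable=False))
```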

Example 1
Suppose a scorecard contains a measure of the quality of research reports on a rating scale
of 0-100, from "below expectations" to "exceeds expectations"; the higher the score, the better
the report. No other criteria are present. Can such a subjective measure be used by a
manager to evaluate staff performance with integrity? The decision tree suggests such a
simple rating scale would be useful provided there is a high degree of trust between the
manager and report authors. This is not to say that objective criteria and multiple raters should
not be encouraged; it just means that in the absence of a greater investment in detailed
criteria, a simple rating scale can still be effective provided the organisational climate is one
of trust and openness.

Example 2
A scorecard contains an objective to increase innovation in the new product development
process. If the only measure was the number of new products developed, this would be a
process measure that was fully controllable. The decision tree alerts the manager to the
danger that the measure may drive the wrong behaviour when a culture of fear and blame
exists as staff seek to churn out more and more new products without revenue growth. If, as
General Electric and Hewlett-Packard do, the process measure is replaced by a less
controllable proxy measure (the percentage of revenue from products less than 12 months
old), dysfunctional behaviour can still occur from the sales team if they seek to cannibalise
the sales of mature products. In both cases, a climate of trust must be present if gaming is to
be avoided.

Example 3
At Westpac, the financial crimes investigation unit considered measuring the number of
arrests on its scorecard. The decision tree indicated that, since this exact measure is only
partially controllable, there would be issues of attribution: the efforts of outside law
enforcement authorities have a substantial impact on arrest rates. The decision tree
therefore recommended that a proxy measure complement the arrest rate, which in
Westpac's case was the percentage of evidence briefs that conformed to the Evidence Act.

Example 4
The sales of imported manufactured goods are impacted by currency exchange rates,
which are largely outside the control of a single firm. Should exchange rates appear on
scorecards? The decision tree recommends that if they do appear, their purpose should be
restricted to monitoring the competitive environment, not staff appraisal.

Conclusion
Although measurement is regarded as an inexact science, this has not deterred the
increasing adoption of scorecards in two contexts: first, where the challenge is to clarify the
firm's strategic goals before selecting relevant measures, as in the case of the Balanced
Scorecard and Performance Prism; and second, where the goals are assumed and managers
introduce scorecards to kick-start feedback and learning processes. In both contexts, the
relevance of measures is an issue and the grim reality is that on occasions measures will be
less-than-perfect. For many firms without significant measurement resources, including
small to medium enterprises, compromises will need to be made, especially in the case of
measuring intangible assets. For these firms, the use of surrogate low-maintenance
measures is appealing. This paper has provided a pragmatic approach to the sensible use
of surrogate measures while recognising and cautioning against the practice of dumbing
down measures.
Much research is being carried out to develop best practice measures of intangible assets
(see for example, the special issue of Measuring Business Excellence, Vol. 8 No. 1, 2004).
Yet the issue for many firms will be consideration of the investment in resources needed to
adopt the best practice measures versus the benefits obtained. This paper suggests that a
complementary research agenda would be one that investigates the level of compromise that is
best practice not for exact measures, but for surrogate measures. The study would include
firms that acknowledge the importance of nurturing intangible assets but are not
prepared to make the necessary investment in measurement. The research would lead to
debate on the merits of ‘‘satisficing versus optimising’’ measures.

References

Andon, P., Baxter, J. and Mahama, H. (2005), "The balanced scorecard: slogans, seduction, and state of play", Australian Accounting Review, Vol. 15 No. 1, pp. 29-38.

Austin, R. (1996), Measuring and Managing Performance in Organizations, Dorset House, New York, NY.

Frigo, M. and Krumweide, K. (1999), "Balanced scorecard: a rising trend in strategic performance measurement", Journal of Strategic Performance Measurement, Vol. 3 No. 1, pp. 42-4.

Hope, J. and Fraser, R. (2003), Beyond Budgeting: How Managers Can Break Free from the Annual Performance Trap, Harvard Business School Press, Boston, MA.

Kaplan, R. and Norton, D. (2001), The Strategy-focused Organisation, Harvard Business School Press, Boston, MA.

Kaplan, R. and Norton, D. (2004), Strategy Maps: Converting Intangible Assets into Tangible Outcomes, Harvard Business School Press, Boston, MA.

Krause, O. (2003), "Beyond BSC: a process based approach to performance management", Measuring Business Excellence, Vol. 6 No. 3, pp. 4-14.

Neely, A. (1998), Measuring Business Performance, The Economist Books, London.

Neely, A., Adams, C. and Kennerley, M. (2002), The Performance Prism: The Scorecard for Measuring and Managing Business Success, Prentice-Hall, London.

Niven, P. (2002), Balanced Scorecard: Step-by-Step, John Wiley & Sons, New York, NY.

Parasuraman, A., Berry, L. and Zeithaml, V. (1991), "Understanding customer expectations of service", Sloan Management Review, Vol. 32 No. 3, Spring, pp. 39-48.

Sanchez, P., Chaminade, C. and Olea, M. (2000), "Management of intangibles – an attempt to build a theory", Journal of Intellectual Capital, Vol. 1 No. 4, pp. 312-27.

Simons, R. (2000), Performance Measurement & Control Systems for Implementing Strategy: Text and Cases, Prentice-Hall, Englewood Cliffs, NJ.

Tate, D. (2000), "Issues involved in implementing a balanced business scorecard in an IT service organization", Total Quality Management, Vol. 11 Nos 4/5/6, pp. 674-9.

Van Der Zee, J. and De Jong, B. (1999), "Alignment is not enough: integrating business and information technology management with the balanced business scorecard", Journal of Management Information Systems, Vol. 16 No. 2, pp. 137-56.
