
An Experimental Test of Behavior under Team Production

Donald Vandegrift
The College of New Jersey

Abdullah Yavas
Pennsylvania State University

August 2005

Abstract: This study reports a series of experiments that examine behavior under team production
and a piece rate. In the experiments, participants complete a forecasting task and are rewarded based
on the accuracy of their forecasts. In the piece-rate condition, participants are paid based on their
own performance while the team production condition rewards participants based on the average
performance of the team. Overall, there is no statistically significant difference in performance
between the conditions. However, this result masks important differences in the behavior of men
and women across the conditions. Men in the team production condition compete even though the
payment scheme provides no monetary incentive to compete and they show significantly higher
performance than the men in the piece rate. For women, the results are reversed. Women in the team
production condition show significantly lower performance than the women in the piece rate.
Because men compete, they change their behavior in the team production condition based on
measures of relative performance. The women do not. Forecast errors for the women are explained
only by the measure of basic skill and time spent on the task.

JEL Codes: J33, M12, M52


Keywords: Team Production, Shirking, Experiment, Gender

Acknowledgments: The authors gratefully acknowledge support from the National Science
Foundation (SES-0111789). Joao Neves provided helpful comments. M. Abdullah Sahin and
Nuriddin Ikromov provided valuable assistance. Correspondence can be directed to Donald
Vandegrift, School of Business, The College of New Jersey, 2000 Pennington Rd., Ewing, NJ
08628-0718. e-mail: vandedon@tcnj.edu; fax: (609) 637-5129.
I. Introduction

Under team production, groups rather than individuals are responsible for a set of tasks and

compensation is based on the performance of the group. Theoretical models of team production

suggest that compensating workers based on team performance causes team members to free ride on

the efforts of others (i.e., shirk). If total returns are divided evenly among a team with n members,

team members incur the full cost of their effort while they receive only 1/n of the marginal gains

from the effort. Consequently, effort levels are sub-optimal. While theoretical models suggest that

team production may lower worker productivity, firms often opt to reward workers based on team

performance anyway. For instance, a recent survey finds that 26% of firms used team rewards

(McClurg, 2001).

The empirical literature on team production also suggests that theoretical models of team

production may overstate the costs of shirking. Empirical studies of behavior under team production

often fail to find shirking behavior (van Dijk et al., 2001; Hamilton et al., 2003). Nevertheless, only

a small number of papers test behavior under team production incentives. To further advance

understanding of behavior under team production, we conduct a series of experiments. We compare

behavior under team production and a piece rate payment scheme and suggest an explanation for the

mixed results on shirking in team production.

In the experiments, participants complete a real-effort forecasting task and are rewarded

based on the accuracy of their forecasts. The experimental task is designed to allow measurement of

both individual contributions and team performance.1 Participants were randomly assigned to one of

two conditions. In the piece-rate condition, participants are paid based on their own performance

while the team production condition rewards participants based on the average performance of the

team. In each condition, participants produce forecasts for twenty rounds. In each round,

participants receive feedback on their forecast error and earnings. In the team production condition,

participants also receive information on the forecast error of the team.

The results show no statistically significant difference in performance between the

conditions. However, this result masks important differences in the behavior of men and women

across the conditions. The men in the team production condition show significantly higher

performance than the men in the piece rate. For women, the results are reversed. The women in the

team production condition show significantly lower performance than the women in the piece rate.

In addition, men change their behavior in the team production condition based on measures

of relative performance. The women do not. Forecast errors for the women are explained by the

measure of basic skill and time spent on the task. In essence, the men in the team production

condition compete even though the payment scheme provides no monetary incentive to compete.

These results therefore suggest a connection between behavior in team production and recent studies

of gender differences in competitive behavior that find incentives conditioned on relative

performance (i.e., a tournament) raise the performance of men relative to women (Gneezy et al.,

2003; Gneezy and Rustichini, 2004; Vandegrift et al., 2005).

Taken together, the results suggest that environmental cues or a reference frame that allows

for meaningful comparisons with others may be as important in fostering competition as explicit

1
Metering individual contributions is often difficult or impossible in real-world settings (Alchian and
Demsetz, 1972; Blair and Stout, 1999). In fact, a firm may adopt team production techniques because some inputs have a
higher value in team production than in their next best use and it is difficult to attribute any portion of output to a single
team member. However, a firm might also institute team production incentives simply because the firm wishes to foster
cooperation among workers.

monetary incentives to compete. We may also reconcile differences in results across empirical

studies of team production by appealing to differences in environmental cues or reference frame.

Experiments that employ a real-effort task find no evidence of shirking (van Dijk et al., 2001 and

the present study) while experiments that use a procedure designed to mimic effort choices find

evidence of shirking (Nalbantian and Schotter, 1997; Meidinger et al., 2003). In fact, team

production experiments that mimic effort choices produce results that more closely resemble

behavior in public goods tasks than behavior in real-effort experiments of behavior under team

production.

In field data, environmental cues and reference frames are harder to detect. However, it is

worth noting that Hamilton et al. (2003) fail to find evidence of shirking in a garment manufacturing

facility where teams of six to seven workers are arrayed in a U-shaped work space of about 12 by 24

feet. Thus, the workers had timely and salient information on the productivity of the group.

II. Background

Shirking and Team Production

In the classic theoretical treatment of team production, Alchian and Demsetz (1972)

consider the case of two men jointly lifting heavy cargo into trucks. If we can observe only the total

weight loaded each day, it is impossible to determine each individual's contribution. Because it is

impossible to identify individual contributions, team members have an incentive to shirk. If there

are n team members, then each team member bears only 1/n of the costs of their shirking. However,

each team member receives the full benefits of their shirking. Thus, each member sets their

marginal benefits equal to 1/n of marginal costs.

The subsequent theoretical literature uses the shirking result as a starting point and focuses

on contractual solutions to the problem of shirking (Holmstrom, 1982; Rasmusen, 1987; Itoh, 1991;

McAfee and McMillan, 1991; and Legros and Matthews, 1993).2 However, the evidence on the

importance of shirking in the empirical literature is mixed. Some studies find clear evidence of

shirking (Gaynor and Gertler, 1995; Nalbantian and Schotter, 1997; Meidinger et al., 2003) while

others do not (van Dijk et al., 2001; Hamilton et al., 2003).

To analyze behavior under team production, Nalbantian and Schotter (1997) conduct a

controlled experiment. Their procedure mimics effort choices by assuming that each individual has

identical effort costs and that the effort “costs” are generated by a specific function.3 The

experiment fixes group size at six and tests a number of different compensation schemes under team

production. Participants made decisions about effort levels in each of 25 rounds.

Using this design, Nalbantian and Schotter find sub-optimal effort levels when team

members are awarded equal shares of team output (i.e., the standard team production problem). That

is, when revenues are shared, shirking occurs. While the mean effort levels were above the

predicted (shirking) equilibrium, there was a downward trend that converged on the predicted value

(i.e., the Nash equilibrium prediction). Thus, the results were consistent with behavior in public

goods experiments. Participants supplied effort (or contributions to the public good) above the Nash

equilibrium in early rounds but effort fell over time.

2
For instance, Holmstrom (1982) suggests a forcing contract mechanism to resolve the shirking problem. The
forcing contract specifies a performance target for the firm or a group within the firm. The target may be based on
revenue or some other outcome. If the target is met or exceeded, all the workers in the group (or firm) share in the
revenue generated. If the target is not met, each worker is paid a relatively low penalty wage.
3
Nalbantian and Schotter (1997) allowed subjects to select an integer e between zero and 100 (inclusive). Each
number had a corresponding cost. The corresponding cost was generated by the function C(e) = e^2/100. After
choosing a number, the experimenters circulated a box of “random numbers” (bingo balls labeled with integers
from –a to +a). The sum of the random number and the decision number produced a total number. Subjects with
higher total numbers received higher fixed payments. Thus, the task facing subjects was to learn the optimal number
to purchase given the cost structure.

Meidinger et al. (2003) use a similar task to study interactions between a principal and a

team composed of two agents. The experiment manipulates the productivity of the agents. In one

condition, “effort” choices across the two agents have the same productivity effect, while in the

second condition, the productivity levels vary. The task has two decision stages. In the first stage,

the principal offers the agents a residual return. If both agents accept the offer, the agents choose an

effort level and the gains are distributed among the participants. Meidinger et al. find that under

both conditions agents supply sub-optimal effort levels (i.e., free riding) and that free riding is much

greater when the agents vary in their productivity.

In contrast to Nalbantian and Schotter (1997) and Meidinger et al. (2003), van Dijk et al.

(2001) use a real-effort task and find that participants do not shirk. The task required participants to

search in a two-dimensional space, S = {(H, V): H, V ∈ [-a, a], with a an integer}, to find the highest

possible value of a single-peaked function. Search for the peak started at the (0, 0) coordinate and

participants were permitted to raise or lower H and V in discrete steps of one over a fixed time

period. During each time period, the subjects could work on two separate searches (A and B) and

switch between the searches costlessly. Search A is work for the employer while search B is

intended to capture activities valuable only to the worker that may be undertaken on company time.

Consequently, Search A rewards differed across conditions and Search B activities were

always rewarded based on a piece rate. In the team condition, participants were randomly matched

with one other participant and they were paid in search A based on the average performance of the

group. (They received the piece rate for search B.) In the piece rate condition, participants were paid

in both searches (A and B) on the basis of a piece rate. Comparing the piece rate and team

conditions van Dijk et al. (2001) find no statistically significant difference in either the effort or the

performance levels. Moreover, performance in the team condition did not fall over time.

Recent analyses of firm and individual-level data also find mixed evidence on the

importance of shirking.4 Gaynor and Gertler (1995) examine the behavior of physicians in

partnership arrangements. Using the number of office visits as the measure of physician effort,

they find that increased revenue sharing among partners reduces the number of office visits. In

contrast, Hamilton et al. (2003) examine the case of a single garment plant that shifted from an

individual piece rate to a group piece rate (i.e., team production). Teams were composed of six to

seven workers and the team’s net receipts were divided equally among team members.

Productivity rose 18% after the introduction of teams. In addition, higher ability workers joined

teams at a higher rate and this accounted for about one fifth of the productivity increase.

Hamilton et al. (2003) contend that there are two basic ways to explain the attenuation (or

elimination) of the free rider problem. First, the problem may be reduced through effective

monitoring and punishing of free riders. Such punishments may be administered through explicit

threats to discontinue cooperation or through peer pressure. Threats to discontinue cooperation

require that discounted losses from lost cooperation exceed the one-shot benefits of shirking.

Peer pressure reduces the free rider problem because departures from team norms reduce

individual utility. Second, synergies related to team production imply that team productivity

is more than the simple sum of the performance of individual team members. The opportunity to

collaborate draws on new skills. These skills may improve coordination as well as allow team

members to discover methods to assign, organize, and redesign tasks.

In addition, Hamilton et al. (2003) find that teams with more heterogeneity in worker

ability show better performance. They suggest that greater heterogeneity may cause better

4
A separate empirical literature analyzes worker productivity under profit sharing plans (Hansen, 1997; Weitzman
and Kruse, 1990). However, the baseline for determining improvements is a reward structure in which rewards do
not depend on productivity.

performance for two reasons. First, more skillful workers may be able to teach the less skillful

how to execute tasks more efficiently. High-ability workers raise the productivity of low-ability

workers. Second, bargaining over the common work pace will produce a different result when

there is wider variation in intra-team worker ability. Bargaining over work pace occurs because

high-ability workers may threaten to opt out. Such threats are credible because high-ability

workers have the best outside options. To retain the high-ability worker, the rest of the team may

accept a faster work pace.

Relation Between Team Production and Public Goods Experiments

Nalbantian and Schotter (1997) note that the structure of team production and public goods

experiments is similar.5 In each case, costs are borne individually while group output is shared

equally. The typical public goods experiment gives each participant a sum of money. The

participant has the option of contributing some portion of the sum to a common pool. The total

contributions to the pool are multiplied by a factor greater than one and returned to the subjects in

equal shares.
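
To make this incentive structure concrete, the following minimal Python sketch computes payoffs in a linear public goods game of the kind just described. The function name, the endowment of 10, and the multiplier of 2 are illustrative assumptions and are not taken from any of the cited experiments.

```python
def public_goods_payoffs(endowment, contributions, multiplier):
    """Return each participant's payoff in a linear public goods game.

    All parameter values are hypothetical; the text only requires multiplier > 1.
    """
    n = len(contributions)
    # Contributions are pooled, scaled up, and split into equal shares.
    share = multiplier * sum(contributions) / n
    # Each participant keeps what was not contributed plus an equal share.
    return [endowment - c + share for c in contributions]


# Four participants: two contribute everything, two contribute nothing.
payoffs = public_goods_payoffs(endowment=10, contributions=[10, 10, 0, 0], multiplier=2)
print(payoffs)
```

In this example each unit contributed returns only multiplier/n = 0.5 to the contributor, so the two non-contributors end up with more than the two contributors. This is the same 1/n dilution that arises under equal revenue sharing in team production.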

Experiments that require the completion of a real-effort task differ from public goods

experiments (and Nalbantian and Schotter, 1997) in two key respects.6 First, real-effort experiments

allow differences in ability to arise endogenously. While public goods experiments generally show

that asymmetries in payoffs (not ability) reduce cooperative behavior,7 an individual's pride in

5
The large literature on public goods experiments is summarized in Ledyard (1995).
6
Nalbantian and Schotter (1997) note two differences between public goods experiments and their team
production experiment. First, in contrast to public goods experiments, group output under team production contains a
random component. Various exogenous factors (e.g. changes in market demand) imply a probabilistic relation between
effort and output. Second, the compensation schemes offered under team production have no analogue in public goods
theory. Another key difference is that team production typically requires participants to contribute effort while public
goods situations require monetary contributions.
7
See Bagnoli and McKee (1991); Fisher et al. (1995).

his/her talent or skill may be a significant deterrent to shirking under team production. Second, real-

effort experiments more closely resemble a typical workplace interaction. In a typical workplace

interaction, individuals may be uncertain about whether poor performance by team members is the

result of low ability or shirking.

Gender and Behavior under Differing Labor Compensation Schemes

Although the public goods literature has devoted some attention to differences between men

and women8, there is relatively little on gender differences in behavior under various labor

contracts. The central result is that men respond more strongly to competitive incentives than

women (Gneezy et al., 2003; Gneezy and Rustichini, 2004; Vandegrift et al., 2005). Gneezy et al.

(2003) report an experiment in which participants solve computerized maze problems. When

payment is based on the absolute number of computerized mazes solved (i.e., a piece rate), they find

no difference in performance between men and women. However, when men and women are paid

based on tournament incentives, the performance of men increases while the performance of women

remains the same as in the piece rate.

Gneezy and Rustichini (2004) find a similar result in a field experiment with elementary

school students. In the experiment, students ran a 40-yard dash both alone and in pairs. In the first

round, all students ran alone. In the second round, some students ran against competitors while others

ran alone. Overall, boys matched against competitors showed a significant improvement in the

second round but the girls did not. When girls competed against girls in the second round, their

times were slower. When boys competed against boys in the second round, their times were faster.

8
Ledyard (1995) notes that in public goods experiments the evidence on gender differences in contribution rates is
mixed.

While the girls showed a small improvement in the mixed gender races, the improvement was far

larger for boys.

Using Gneezy et al. (2003) as a starting point, Vandegrift et al. (2005) examine choices and

behavior when agents are able to choose between a payment scheme that rewards based on absolute

performance (i.e., piece rate) and a scheme that rewards based on relative performance (i.e., a

tournament). The structure of the rewards in the tournament option varied across conditions, while the

piece rate payoffs remained the same. In one condition (winner-take-all), only the most accurate

forecaster who chose the tournament for each round received a payment. In the other condition

(graduated tournament condition), the same payment was divided among the first, second, and third

finishers who chose the tournament. Men in the winner-take-all condition showed significantly

greater forecasting accuracy than men in the graduated tournament condition. Women showed no

statistically significant difference in forecasting accuracy between winner-take-all and graduated

tournament conditions.

III. Experimental Design

To test behavior under team production, we design an experiment that allows participants to

contribute real effort towards team output. In one condition, we compensate team members based

on team performance. If total returns are divided evenly among the team members, R indicates

returns, and e_i indicates costly effort, we may express the individual team member's maximization

problem as:

(1) $\max_{e_i} G_i = \frac{1}{n} \sum_{j=1}^{n} R_j(e_j) - C(e_i)$

This implies that as n rises, the returns to effort fall while the costs remain unchanged.

Consequently, the team members will choose lower effort levels and output will fall. In the other

condition, participants completed the same task but were paid based only on their own performance.
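
The comparative static behind equation (1) can be made explicit with a short derivation. As a sketch, assume that R and C are differentiable, that R is concave and C is convex, and that each member takes teammates' effort as given; these assumptions are added here for illustration and are not stated in the text. The superscripts T (team) and P (piece rate) and the functional forms in the last line are likewise hypothetical.

```latex
\begin{align*}
G_i &= \frac{1}{n}\sum_{j=1}^{n} R_j(e_j) - C(e_i) \\
\frac{\partial G_i}{\partial e_i} &= \frac{1}{n} R_i'(e_i) - C'(e_i) = 0
  \quad\Longrightarrow\quad C'(e_i^{T}) = \frac{1}{n} R_i'(e_i^{T}) \\
\text{piece rate:}\quad & C'(e_i^{P}) = R_i'(e_i^{P}) \\
\text{e.g. } R_i(e) = r e,\ C(e) = \tfrac{1}{2}e^{2}: \quad & e_i^{T} = \frac{r}{n}, \qquad e_i^{P} = r
\end{align*}
```

Under the illustrative linear-quadratic forms, predicted effort in a team of n = 7 would be one seventh of predicted effort under the piece rate.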

We conducted the experiments using students at The Pennsylvania State University as

participants. A total of 84 students participated. Each of the two experimental conditions had 42

participants divided among three separate sessions. All sessions were conducted at the LEMA lab at

The Pennsylvania State University. Participants completed a computer-based forecasting task

known as a multiple-cue-probability-learning (MCPL) task.9

For each of 20 periods, participants were asked to forecast the price of a fictitious “stock”

using two exogenous “cues”. Each period, the values of the cues changed, but the relationship

between the cues and the price of the stock remained the same throughout the experiment and across

both experimental conditions. Because the relationship was unknown to all participants, they had to

discover it from the exogenous cues. Ten examples of the cue-price relationship were provided to

each participant. Participants examined the examples prior to making their forecasts. Following

review of the ten examples, participants produced three practice forecasts based on three new sets of

cue values.

Following these practice rounds, the experiment began and participants received the first of

20 sets of cues to make their forecast. Accurate forecasts under such conditions require participants

to detect the covariation between the cues and the stock price (Goldstein and Hogarth, 1997).

Unknown to all participants, the price of the stock was determined by the relationship:

(2) Price = 85 + 0.3 * Cue 1 + 0.7 * Cue 2 + e

9
See Balzer et al. (1992) and Goldstein and Hogarth (1997) for reviews of research using MCPL tasks by
psychologists. See Schmalensee (1976), Bolle (1988), Brown (1995, 1998), Vandegrift and Brown (2003), and
Vandegrift and Brown (2005) for examples of the use of MCPL tasks by economists.

where e is a uniformly distributed discrete random variable on the interval (-3, 3). The cue values

ranged from 101 to 393 and the subsequent prices ranged from 230 to 424.
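
As an illustration of the task's data-generating process, the sketch below draws one price from equation (2) and computes an absolute forecast error. It assumes that e takes integer values from -3 to 3 inclusive (the text does not say whether the endpoints are included); the function names and the example cue values are hypothetical.

```python
import random


def stock_price(cue1, cue2, rng):
    """Price from equation (2); noise assumed to be an integer in [-3, 3]."""
    noise = rng.randint(-3, 3)  # assumption: endpoints included
    return 85 + 0.3 * cue1 + 0.7 * cue2 + noise


def forecast_error(forecast, price):
    """Absolute forecast error; lower values mean better performance."""
    return abs(forecast - price)


rng = random.Random(0)  # seeded only so the sketch is reproducible
price = stock_price(cue1=200, cue2=300, rng=rng)
# A participant who has learned the weights exactly forecasts the noiseless price.
best_forecast = 85 + 0.3 * 200 + 0.7 * 300
print(price, forecast_error(best_forecast, price))
```

Under this assumption, even a participant who infers the intercept and both weights exactly cannot avoid errors of up to 3 points, because the noise term is unobserved.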

Experimental Conditions

In one condition, participants were paid based on a piece rate. The piece rate paid

participants based on their absolute forecasting error. Participants with more accurate forecasts

received higher payments. The payment to the individual participants in the piece rate condition was

determined by:

(3) piece rate = $1.70 – (.03 * forecast error participant i).

In the second condition, participants receive one seventh of the total group output where individual

contributions are determined by the piece rate in equation (3) above.


(4) team production rate = $\frac{1}{7} \sum_{i=1}^{7} \left(\$1.70 - 0.03 \times \text{forecast error}_i\right)$

The amounts were added across the rounds and paid to the participants at the end of the experiment.

Table 1 summarizes the experimental conditions.
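
The two payment rules translate directly into code. The following minimal sketch applies equations (3) and (4) as written; the function names and the example errors are hypothetical, and because the text does not state whether negative per-round earnings were truncated at zero, no floor is applied here.

```python
def piece_rate(forecast_error):
    """Per-round earnings in dollars under equation (3)."""
    return 1.70 - 0.03 * forecast_error


def team_production_rate(team_errors):
    """Per-round earnings under equation (4): the mean piece rate of the team."""
    return sum(piece_rate(e) for e in team_errors) / len(team_errors)


# Hypothetical forecast errors for one seven-member team in one round.
errors = [0, 5, 10, 15, 20, 25, 30]
print(round(piece_rate(errors[1]), 2))         # own-performance pay for the second member
print(round(team_production_rate(errors), 2))  # every team member receives this amount
```

A one-point reduction in a member's own forecast error raises that member's pay by $0.03 under the piece rate but by only $0.03/7 under team production, which is the 1/n dilution of incentives expressed in equation (1).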

Procedure

After the participants entered the lab, they were randomly assigned a seat in front of a

computer and were given a set of instructions describing the forecasting task. The instructions

described the nature of the forecasting task (i.e., forecast the price of a fictitious stock using

exogenous cues for 20 rounds), that the values of the cues changed each round but their relationship

to the stock price remained constant throughout the experiment, and that all participants would see

the identical cue values each round. The instructions also explained that an initial endowment of $5

had been placed in each participant’s “Earnings Account.” Earnings from the experiment were

added to the earnings account and the participants received a payment in cash at the conclusion.

After answering any remaining questions, the participants were told they would have five

minutes to examine ten examples of the cue-price relation. Each of the ten examples as well as the

twenty rounds that followed reflected the same underlying relationship (given in equation (2)

above). At the end of the 5-minute period, the participants completed 3 practice rounds. In the

practice rounds, participants received two cue values and submitted their forecast. Each round the

participants received feedback on their forecast error and the actual price of the stock. Participants

were not paid for the practice rounds. The payment scheme was explained following the practice

rounds and participants were shown the round one cue value(s) and given two minutes to enter their

forecasts into the computer. Once all participants had entered their forecasts, a computer program

calculated each participant’s forecast error and actual earnings.10

In each condition, participants received information in each round on: (1) the actual price of

the stock; (2) the participant’s forecast; (3) the participant’s forecast error; (4) the participant’s

earnings. In addition, participants in the team production condition also received information each

round on (5) the average forecast error for the group. The participants were encouraged to record

any relevant information on a sheet of paper and were able at any time to recall the information

from previous rounds.

After giving the participants one minute to examine their results, the cue values for the next

period were then shown to each participant. This process was repeated for 20 rounds. The

10
The program was written by M. Abdullah Sahin using z-Tree. Copies of the program as well as the data are
available upon request. All instructions are available at: http://vandegrift.intrasun.tcnj.edu

experiment ran for about 1 hour. Throughout the experiment all information was private, including

participant forecasts. At the end of each session, the participants completed a post-experiment

questionnaire and were paid their total earnings (initial endowment plus the sum of earnings from

the 20 rounds).

Payoffs could range from $5 to $39 in either the piece rate or the team production

conditions (including the $5 initial endowment). Actual payoffs varied from $7.71 to $34.41 in the

piece rate condition and $23.41 to $29.32 in the team production condition (including the $5 show-

up payment). The average payoff across all conditions was $26.27. In the piece rate condition the

average payoff was $26.67 while in the team production condition, the average payoff was $25.88.

Of the 84 participants, 59% were men. The proportion of men was slightly lower in the team

production condition than in the piece rate condition (55% v. 64%).

IV. Results

Individual Behavior

Table 2 reports means and standard deviations at the observation level for forecast errors for

rounds 1-20, 1-10, and 11-20. Higher forecast errors indicate lower performance. For each time

period, the means and standard deviations are reported by gender and condition. Men had average

forecast errors about two points lower (about 8%) than women across both conditions. Participants

in the piece rate condition had forecast errors that were only one point lower (about 4%) than the

team production condition. Looking at the performance of men and women across the two

conditions, the differences are striking. In the piece rate condition, the women had much lower

forecast errors than the men – about 4.5 points or about 18%. In the team production condition, the

situation was reversed. The men had much lower forecast errors than the women – about 8.2 points

or about 28%.

Figure 1 shows the forecast errors across all twenty rounds for the piece rate and the team

production conditions. Interestingly, there is little difference in forecast errors across the two

conditions and participants in the team production condition do not decrease effort over time. This

stands in marked contrast to behavior in public goods experiments (Ledyard, 1995). Figure 2

compares men and women in the piece-rate condition across all 20 rounds. In nearly every round,

women outperform the men. Figure 3 compares men and women in the team production condition

across all 20 rounds. In nearly every round, men outperform the women.

To investigate more systematically the link between gender, team production incentives, and

forecasting error, we run random-effects generalized least squares regressions with forecast errors

for each round as the dependent variable. We use a unique participant-specific id to control for

individual fixed effects.11 The regressions also control for the payoff structure, gender, and

participant skill. We control for skill in two ways: the average per-round forecast error by

participant for the three practice rounds (Practice Average) and the forecast error for each

participant in round t-1 (Lagged Error).
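
The two skill controls can be constructed from the raw per-round errors as sketched below. This is an illustration of the data preparation only, not the authors' code: the input layout and function name are hypothetical, and because Lagged Error requires round t-1, the sketch assumes that estimation rows begin at round 2.

```python
def build_panel_rows(practice_errors, round_errors):
    """Build one row per participant-round with the two skill controls.

    practice_errors: {participant_id: [errors in the practice rounds]}
    round_errors:    {participant_id: [errors in rounds 1..T]}
    Both layouts are hypothetical choices for this illustration.
    """
    rows = []
    for pid, errors in round_errors.items():
        practice = practice_errors[pid]
        practice_average = sum(practice) / len(practice)
        # Start at index 1 (round 2) so that a lagged value exists.
        for idx in range(1, len(errors)):
            rows.append({
                "id": pid,
                "round": idx + 1,
                "error": errors[idx],
                "lagged_error": errors[idx - 1],
                "practice_average": practice_average,
            })
    return rows


rows = build_panel_rows(
    practice_errors={"p1": [30, 24, 18]},
    round_errors={"p1": [20, 15, 12]},
)
for row in rows:
    print(row)
```

Rows of this form could then be passed to a random-effects estimator with the participant id as the grouping variable.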

The results are reported in Table 3. Column 1 reports the regression on the entire data set.

There are no statistically significant differences in forecast errors for men and women nor is there

any statistically significant difference in forecast errors between the piece rate and the team

production conditions. The controls for ability (Practice Average and Lagged Error) are both

positive and significant indicating that higher average errors in the practice rounds and higher

11
Computing the average forecast error for each participant across the 20 rounds and running simple OLS
regressions does not change the basic results.

forecast errors in round t-1 raise forecast errors in round t of the experiment. The round coefficient

is negative and significant indicating that errors fall over time.12 The insignificant result on the

Team coefficient contradicts the standard prediction of the theoretical literature on team

production. In general, participants do not reduce effort/performance in the team production

condition compared to the piece rate condition.

To further investigate the causes of the stronger than expected performance in the team

production condition, we run separate random-effects regressions for the piece rate and team

production conditions with gender as a covariate. In addition, we run random effects regressions

that separate the men from the women. These regression results appear in Table 3 as columns 2

through 5. The results show that the women reduce performance under team production. Forecast

errors for women in the team production condition are about 30% higher than they are in the piece

rate. Consequently, women behave in a manner consistent with the standard predictions of

economic theory. Men, on the other hand, increase their performance in the team production

condition. Forecast errors for men in the team production condition are about 14% lower than they

are in the piece rate.

We may see the same basic results by running separate random effects regressions for the

piece rate and team production conditions. In the piece rate, men have lower performance than the

women. Forecast errors for men in the piece rate condition are about 17% higher than they are for

the women in the piece rate condition. Forecast errors for men in the team production condition are

about 24% lower than the women in the team production condition. The average forecast error by

participant in the practice rounds is significant across all specifications but it is generally a stronger

12 To ensure that behavior stabilized over time, we recalculated each regression in Tables 3, 4, 6, and 7 using only the
last 10 rounds of forecasts. For every regression, the coefficient for round was small and statistically insignificant.

predictor for the forecast errors of the women. In contrast, forecast error in the previous round

predicts performance better for the men than the women. Comparing the regression equations for

the piece rate and the team production conditions, we see that the magnitude of the round effect is

essentially the same in both equations. On average, forecast errors are about a quarter of a point

lower in round t compared to round t-1. This suggests that effort/cooperation levels do not

deteriorate as they do in public goods games.

The results also suggest that the men are adjusting their effort based on their performance in

the prior round while the women’s performance is a function of their skill level and the number of

elapsed rounds. This is consistent with Gneezy et al. (2003), Gneezy and Rustichini (2004), and

Vandegrift et al. (2005). If men have a stronger desire to compete, information on their relative

position in the last round should predict effort levels and performance. Lagged error is highly

correlated with relative position in the last round. To test this hypothesis more directly, we create

two new variables to capture the information that participants in the team production condition

receive each round.

As noted above, participants in the team production condition receive information on

average forecast error for the group in round t-1 before making their forecast in round t.

Consequently, team production participants can infer their relative position in the team. To measure

this relative position we calculate: 1) the forecast error rank in team where 1=most accurate and

7=least accurate for participant i in round t-1 (Lagged Rank); and 2) a dummy variable that

equals 1 if forecast error for participant i is less than average team forecast error in round t-1

(Lagged Rankdum).
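The two relative-position measures can be sketched for one team in one round as follows. This is a minimal illustration rather than the procedure actually used: the function name, the example errors, and the rule for breaking ties (by member order) are assumptions, since the text does not say how ties were handled.

```python
from statistics import mean


def relative_position(team_errors):
    """Return (ranks, below_average_dummies) for one team in one round.

    ranks[i] is 1 for the most accurate member and len(team_errors) for
    the least accurate. below_average_dummies[i] is 1 when member i's
    error is below the team average, matching Lagged Rankdum.
    """
    team_average = mean(team_errors)
    # Member indices ordered from smallest error to largest.
    order = sorted(range(len(team_errors)), key=lambda i: team_errors[i])
    ranks = [0] * len(team_errors)
    for position, member in enumerate(order, start=1):
        ranks[member] = position  # ties fall back to member order (assumed)
    dummies = [1 if error < team_average else 0 for error in team_errors]
    return ranks, dummies


# Hypothetical absolute forecast errors for a seven-member team in round t-1.
errors_t_minus_1 = [12.0, 40.0, 5.0, 22.0, 31.0, 18.0, 60.0]
lagged_rank, lagged_rankdum = relative_position(errors_t_minus_1)
print(lagged_rank)
print(lagged_rankdum)
```

Both measures would then enter the round t regression as lagged covariates.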

The random-effects regression results for the team production condition are reported in

Table 4. Elapsed time on the task (round) and average forecast errors in the practice rounds

explain the forecast errors for the women. For men, elapsed time on the task is insignificant and

the effect of average forecast error in the practice rounds is much smaller. Instead, the men

respond to the indicators of relative position. Rank in the last round (Lagged Rank) is a strong

predictor of forecast errors for men while it is insignificant for women. A one-integer increase in

rank in round t-1 raises forecast errors for men in the team production condition by 1.5 points. In

addition, men that have forecast errors below the mean for their team in round t-1 (Lagged

Rankdum), have forecast errors that are on average 4.3 points lower. Interestingly, the men focus

on relative position even though relative position does not influence their rewards.

While the number of teams is small (n = 6), it is possible to draw some tentative

conclusions. The central result is that teams with a higher standard deviation in ability (holding

average ability in the team constant) have lower forecast errors. This replicates one of the main results

in Hamilton et al. (2003) under very different conditions. As above, we measure participant ability

by the average forecast error in the three practice rounds (forecasting trials prior to the experiment).

We average the individual observations across each team. To measure variation in ability for each

team, we compute the standard deviation of the average practice round forecast errors. Table 5

reports these basic measures across teams: average error in rounds 1-20, average error in rounds 11-20,

average forecast error in the practice rounds, the standard deviation for each team of the practice-round

averages of the individual team members, and the proportion of males to total team members.

Table 6 reports a regression on team average forecast error. Unfortunately, fixed effects

regressions cannot be calculated because the independent variables do not change across rounds.

Because t = 20 and n = 6, we violate one of the assumptions of the random-effects procedure

(i.e., n > t). Consequently, we average the team forecast errors across rounds 1 – 4, 5 – 8, 9 – 12, 13

– 16, and 17 – 20. By creating 5 time periods for each of the 6 teams, we maximize the number of

observations and still meet the requirements for random effects. Table 6 shows the results of the

regressions on team average forecast error.
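The collapsing step described above can be sketched as below: twenty per-round team averages become five period averages, so six teams yield the 30 observations reported in Table 6. The function name and the example series are hypothetical.

```python
def block_averages(round_errors, block_size=4):
    """Average consecutive blocks of rounds (rounds 1-4, 5-8, and so on)."""
    if len(round_errors) % block_size != 0:
        raise ValueError("number of rounds must be a multiple of block_size")
    return [
        sum(round_errors[start:start + block_size]) / block_size
        for start in range(0, len(round_errors), block_size)
    ]


# Hypothetical team-average forecast errors for rounds 1-20.
team_round_errors = [float(r) for r in range(1, 21)]
period_errors = block_averages(team_round_errors)
print(period_errors)

# Six teams with five periods each give the panel size in Table 6.
n_observations = 6 * len(period_errors)
print(n_observations)
```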

Not surprisingly, teams with higher average forecast errors in the practice rounds had higher

forecast errors over rounds 1-20. A one-point increase in average team forecast error in the trial

period implies a 0.68 increase in the average team forecast error over rounds 1-20. More

interestingly, an increase in the standard deviation in ability across team members implies lower

forecast errors (holding average ability of the team members constant). A one-point increase in

standard deviation of average forecast errors across team members (Team Practice Deviation) in the

trial period causes a 0.62 decrease in the team average forecast error over rounds 1-20. This

suggests that teams with more diversity in ability, holding average ability in the team constant,

will perform better. The effect of the male ratio is small and statistically insignificant.
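To make the magnitudes concrete, the Table 6 point estimates can be combined into a fitted value. The sketch below is illustrative only: it uses the coefficients as printed, ignores the random effect and the error term, and the function and variable names are hypothetical. The inputs are the Team 5 values from Table 5.

```python
# Point estimates copied from Table 6.
COEF = {
    "constant": 20.49,
    "team_practice": 0.680,
    "team_practice_deviation": -0.624,
    "team_male_ratio": -0.051,
    "quintile_round": -1.44,
}


def predicted_team_error(team_practice, team_practice_deviation,
                         team_male_ratio, quintile_round):
    """Fitted team average forecast error implied by the Table 6 estimates."""
    return (
        COEF["constant"]
        + COEF["team_practice"] * team_practice
        + COEF["team_practice_deviation"] * team_practice_deviation
        + COEF["team_male_ratio"] * team_male_ratio
        + COEF["quintile_round"] * quintile_round
    )


# Team 5 in Table 5, evaluated at the middle period (quintile round 3).
base = predicted_team_error(22.19, 19.76, 0.7142, 3)
# Same team with one more point of spread in practice-round ability.
more_spread = predicted_team_error(22.19, 20.76, 0.7142, 3)
# Same team with one more point of average practice-round error.
weaker = predicted_team_error(23.19, 19.76, 0.7142, 3)

print(round(base, 2))
print(round(more_spread - base, 3))
print(round(weaker - base, 3))
```

Holding the other inputs fixed, the two differences reproduce the 0.62 decrease and 0.68 increase discussed above.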

To get a picture of team dynamics, we test whether the standard deviation of forecast errors

across team members in the prior round and the average forecast error for the team in the prior

round impact the team average forecast error in the subsequent round. The results are displayed in

Table 7. Standard deviation of team forecast errors in the prior round does not predict average

forecast error for the team in the subsequent round. Apparently, weaker forecasters (as measured by

the team standard deviation of forecast errors in the practice rounds) work harder to improve in

rounds 1-20 and this eliminates any link between the team standard deviation and team average

forecast errors. Consequently, there is a negative relation between practice round standard deviation

in forecast errors across team members and forecast errors over rounds 1-20 and no relation once

the individual participants are grouped into teams and then paid based on their team performance. It

must be that the weaker forecasters increase performance rather than the stronger forecasters

decrease performance because increases in the standard deviation of practice-round forecast errors across team members (Team

Practice Deviation) imply lower forecast errors.

Team average forecast errors in round t-1 are negatively related to team average forecast

errors in the subsequent round. A one-point decrease in team average forecast error in round t-1

implies a 0.31 increase in average team forecast error in round t. Apparently, the team reacts to poor

performance by working harder. Strong performance in the prior round causes participants to reduce

effort. This likely explains why lagged error in the individual-level data does not explain forecast

errors.

Comparing columns 4 and 5 in Table 3, we see that lagged error has no effect in the team

production condition. In the piece-rate condition, lower forecast errors in the prior round imply

lower forecast errors in the subsequent round. This likely picks up the skill of the individual

forecaster. In the team production condition, participants also change effort levels in response to

team performance. Consequently, we are unable to identify a relation between forecast error in the

prior round and the subsequent round in the team production condition.

V. Conclusion

Team production incentives are commonly employed in business firms yet the behavior of

employees under such incentives is not well understood. To advance our understanding of behavior

under team production, we conduct a controlled experiment with two experimental conditions. In

the piece rate condition, participants were paid based on the absolute size of their forecasting error

in a simple forecasting task. In the team production condition, participants were assigned to groups

of seven members and paid based on the average performance of the group.

While the theoretical literature on team production assigns great weight to the problem of

shirking, the empirical literature often fails to detect it. In a recent analysis of behavior under team

production, Hamilton et al. (2003) fail to detect shirking. They suggest that the shirking problem

may be reduced through effective monitoring and punishing of free riders (e.g., explicit threats to

discontinue cooperation, peer pressure) and synergies related to team production. Such synergies

imply that team productivity is more than the simple sum of the performance of individual team

members.

Like Hamilton et al., we find no evidence of shirking when we compare performance in a

piece rate with team production. However, the design of our experiments suggests that factors other

than monitoring and synergies are at work. Because participants in our experiment could not

communicate and the task allowed for no complementarities across participants, it is not possible to

explain our results by appealing to synergies. It is also unlikely that monitoring explains our results.

Low performers could not be identified and the experiment provided no mechanism for making

threats or peer pressure. While it is possible that participants might withhold effort to induce

cooperation, there is no evidence that they did so. Indeed, our results show that teams with weak

performance in the current round increase performance in the subsequent round.

Instead, our evidence suggests that we fail to detect shirking in a comparison of performance

in the team production and piece rate conditions because men in the team production condition

compete. The men compete even though the team production payments provide no incentive to

compete. Comparing the performance of men across conditions, men in the team production

condition show significantly higher performance than the men in the piece rate. For women, the

results are reversed. The women in the team production condition show significantly lower

performance than the women in the piece rate. Because the men compete, they change their

behavior in the team production condition based on measures of relative performance. The women

do not. Forecast errors for the women are explained by the measure of basic skill and time spent on

the task.

Hamilton et al. (2003) also find that teams with more heterogeneity in worker ability show

better performance. They suggest that greater heterogeneity may cause better performance because

more skillful workers teach the less skillful how to execute tasks more efficiently and bargaining

over the common work pace will produce a different result when there is wider variation in intra-

team worker ability. We also find some evidence that, holding average skill level of the team

constant, teams with a larger variation in skill levels have lower forecast errors. However, the

design of our experiments suggests that factors other than teaching/learning and bargaining are at

work.

Because participants in our experiment had no outside option, there can be no threats to

exercise an outside option. To the extent that there is higher performance among more

heterogeneous teams in our experiment, teaching and threats to exercise an outside option are not

the cause. Because participants in our experiments completed the same task, synergies are not

possible. To the extent that there is higher performance under team production in our experiment,

synergies are not the cause. We propose instead that larger differences in performance among team

members provide clearer signals of relative performance and unambiguous signals provoke more

effort. In sum, our results suggest that environmental cues or a reference frame that allows for

meaningful comparisons with others may be the key determinant in whether shirking behavior

emerges.

References

Alchian, A. and Demsetz, H., 1972, “Production, Information Cost, and Economic Organization,”
American Economic Review, 62, 777-795.

Bagnoli, M., and McKee, M., 1991, “Voluntary Contribution Games: Efficient Private Provision of
Public Goods,” Economic Inquiry, 29, 351-366.

Balzer, W., Sulsky, L., Hammer, L. and Summer, K., 1992, “Task Information, Cognitive
Information, or Functional Validity Information: Which Components of Cognitive Feedback Affect
Performance?” Organizational Behavior and Human Decision Processes, 53, 35-54.

Blair, Margaret M. and Stout, Lynn A., 1999, “Team Production in Business Organizations: An
Introduction,” Journal of Corporation Law, 24, 743-750.

Bolle, F., 1988, “Learning to Make Good Predictions in Time Series,” in Bounded Rational
Behavior in Experimental Games and Markets, R. Tietz, W. Albers and R. Selten, eds. (Springer-Verlag,
Berlin).

Brown, P. M., 1995, “Learning From Experience, Reference Points and Decision Costs,” Journal of
Economic Behavior and Organization, 27, 381-399.

Brown, P. M., 1998, “Experimental Evidence on the Importance of Competing for Profits on
Forecasting Accuracy,” Journal of Economic Behavior and Organization, 33, 259-269.

Fisher, J., Isaac, R., Schatzberg, J., and Walker, J., 1995, “Heterogeneous Demand for Public
Goods: Behavior in the Voluntary Contributions Mechanism,” Public Choice, 85, 249-266.

Gaynor, M. and Gertler, P., 1995, “Moral Hazard and Risk Spreading in Partnerships,” Rand
Journal of Economics, 26, 591-613.

Gneezy, U., Niederle, M., and Rustichini, A., 2003, “Performance in Competitive Environments:
Gender Differences,” Quarterly Journal of Economics, 118, 1049-1074.

Gneezy, U. and Rustichini, A., 2004, “Gender and Competition at a Young Age,” American
Economic Review, 94, 377-381.

Goldstein, W. and Hogarth, R., 1997, “Judgement and Decision Research: Some Historical
Context,” in Research on Judgment and Decision Making: Currents, Connections and
Controversies, William Goldstein and Robin Hogarth, eds. (Cambridge University Press,
Cambridge).

Hamilton, B. H., Nickerson, J. A. and Owan, H., 2003, “Team Incentives and Worker
Heterogeneity: An Empirical Analysis of the Impact of Teams on Productivity and Participation,”
Journal of Political Economy, 111, 465-497.

Hansen, D. G., 1997, “Worker Performance and Group Incentives: A Case Study,” Industrial and
Labor Relations Review, 51, 37-49.

Holmstrom, Bengt, 1982, “Moral Hazard in Teams,” Bell Journal of Economics, 13, 324-340.

Itoh, Hideshi, 1991, “Incentives to Help in Multi-agent Situations,” Econometrica, 59, 611-636.

Ledyard, John O., 1995, “Public Goods: A Survey of Experimental Research,” in The Handbook of
Experimental Economics, John Kagel and Alvin Roth, eds. (Princeton University Press: Princeton,
NJ).

Legros, Patrick, and Matthews, Steven A., 1993, “Efficient and Nearly Efficient Partnerships,”
Review of Economic Studies, 60, 599-611.

McAfee, R. Preston, and McMillan, John, 1991, “Optimal Contracts for Teams,” International
Economic Review, 32, 561-577.

McClurg, Lucy N., 2001, “Team Rewards: How Far Have We Come?” Human Resource
Management, 40, 73-86.

Meidinger, C., Rullière, J., and Villeval, M., 2003, “Does Team-Based Compensation Give Rise to
Problems When Agents Vary in Their Ability?” Experimental Economics, 6, 253-272.

Nalbantian, Haig, and Schotter, Andrew, 1997, “Productivity Under Group Incentives: An
Experimental Study,” American Economic Review, 87, 314-341.

Rasmusen, Eric, 1987, “Moral Hazard in Risk Averse Teams,” Rand Journal of Economics, 18,
428-435.

Schmalensee, R., 1976, “An Experimental Study of Expectation Formation,” Econometrica, 44,
17-41.

van Dijk, F., Sonnemans, J., and van Winden, F., 2001, “Incentive Systems in a Real Effort
Experiment,” European Economic Review, 45, 187-214.

Vandegrift, D., and Brown, P. M., 2003, “Task Difficulty, Incentive Effects, and the Selection of
High-Variance Strategies: An Experimental Examination of Tournament Behavior,” Labour
Economics, 10, 481-497.

Vandegrift, D., and Brown, P. M., 2005, “Gender Differences in the Use of High-Variance
Strategies in Tournament Competition,” Journal of Socio-Economics (forthcoming).

Vandegrift, D., Yavas, A., and Brown, P. M., 2005, “Men, Women and Competition: An
Experimental Test of Behavior,” unpublished ms.

Weitzman, M. L. and Kruse, D. L., 1990, “Profit Sharing and Productivity,” in Paying for
Productivity: A Look at the Evidence, A. Blinder ed. (Brookings Institution, Washington, DC).

Table 1. Summary of experimental conditions

Equation Determining Price:

price = 85 + 0.3 * Cue 1 + 0.7 * Cue 2 + e

Condition 1 – piece rate


Condition 2 – team production

Payoffs in the piece rate

payoff = $1.70 – (.03 * forecast error).

Payoffs in team production

\[ \text{payoff} = \frac{1}{7} \sum_{i=1}^{7} \bigl( \$1.70 - (.03 \times \text{forecast error}_i) \bigr) \]
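The two payoff rules in Table 1 can be sketched as follows. The function names and the example errors are hypothetical, and no floor at zero is applied because the source does not state one. The comparison at the end shows the incentive difference that underlies the shirking prediction: a participant who improves their own error captures the full gain under the piece rate but only one-seventh of it under team production.

```python
def piece_rate_payoff(forecast_error):
    """Per-round payoff in dollars under the piece rate (Table 1)."""
    return 1.70 - 0.03 * forecast_error


def team_payoff(team_forecast_errors):
    """Per-round payoff in dollars under team production (Table 1).

    Every member receives the average of the individual piece-rate payoffs.
    """
    payoffs = [piece_rate_payoff(error) for error in team_forecast_errors]
    return sum(payoffs) / len(payoffs)


# Hypothetical absolute forecast errors for a seven-member team.
team = [20.0, 25.0, 30.0, 15.0, 40.0, 10.0, 35.0]
# The first member cuts their own error from 20 to 10 points.
improved = [10.0] + team[1:]

own_gain = piece_rate_payoff(10.0) - piece_rate_payoff(20.0)
team_gain = team_payoff(improved) - team_payoff(team)

print(f"gain under piece rate: ${own_gain:.3f}")
print(f"gain under team production: ${team_gain:.3f}")
```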

Table 2. Means and Standard Deviations by Condition

                         Forecast Error (a)   Forecast Error   Forecast Error
                         Rounds 1-20          Rounds 1-10      Rounds 11-20

Overall                  24.13                25.64            22.63
                         (26.57)              (28.12)          (24.86)
Piece Rate               23.6                 25.34            21.86
                         (27.57)              (28.47)          (26.56)
Team Production          24.67                25.94            23.40
                         (25.54)              (27.79)          (23.03)
Men                      23.26                24.28            22.24
                         (27.01)              (27.15)          (26.84)
Women                    25.42                27.64            23.20
                         (25.89)              (29.41)          (21.64)
Men Piece Rate           25.21                27.09            23.34
                         (30.19)              (30.56)          (29.75)
Women Piece Rate         20.69                22.18            19.20
                         (21.85)              (24.05)          (19.36)
Men Team Production      20.97                20.98            20.96
                         (22.52)              (22.12)          (22.96)
Women Team Production    29.15                31.95            26.35
                         (28.16)              (32.45)          (22.84)

Standard deviations in parentheses.
a. Forecast Error: average per-round absolute forecast error by participant, |P_t - P^e_it|, rounds 1-20.
P_t = the price of the stock in period t. P^e_it = participant i’s forecast in period t.

Table 3. Random-Effects Generalized Least Squares Regressions on Individual Forecast Errors

Dependent Variable:   Forecast Error (a)  Forecast Error      Forecast Error      Forecast Error      Forecast Error
                                          Men Only            Women Only          Piece Rate Only     Team Only

Constant              19.26***            19.03***            17.97***            14.11***            25.37***
                      (2.06)              (2.45)              (2.77)              (2.89)              (2.65)
Male (b)              -1.24                                                       3.48*               -7.02***
                      (1.32)                                                      (1.96)              (1.77)
Team (c)              0.145               -3.61**             6.21***
                      (1.31)              (1.72)              (1.99)
Practice Average (d)  0.156***            0.104***            0.237***            0.149***            0.163***
                      (0.028)             (0.037)             (0.044)             (0.051)             (0.033)
Lagged Error (e)      0.156***            0.225***            0.008               0.245***            0.028
                      (0.025)             (0.033)             (0.039)             (0.036)             (0.035)
Round (f)             -0.253**            -0.205              -0.364**            -0.231              -0.277*
                      (0.118)             (0.156)             (0.177)             (0.172)             (0.159)

R2 within             0.01                0.01                0.01                0.02                0.01
R2 between            0.45                0.65                0.42                0.67                0.37
R2 overall            0.06                0.07                0.08                0.09                0.06
N                     1596                950                 646                 798                 798

Standard errors in parentheses.
* = significant at the 0.1 level, ** = significant at the 0.05 level, *** = significant at the 0.01 level.
Group variable: participant
a. Forecast Error: per-round absolute forecast error by participant, |P_t - P^e_it|, rounds 1-20.
P_t = the price of the stock in period t. P^e_it = participant i’s forecast in period t.
b. Male: dummy variable = 0 if female, 1 if male.
c. Team: dummy variable = 0 if piece rate, 1 if team production.
d. Practice Average: the average per-round forecast error for the practice rounds for participant i.
e. Lagged Error: forecast error by participant in round t-1.
f. Round: indicates round number (1-20).

Table 4. Random-Effects Generalized Least Squares Regressions on Individual Forecast Errors –
Team Production Condition

Dependent Variable:   Forecast Error (a)  Forecast Error      Forecast Error      Forecast Error
                      Men Only            Women Only          Men Only            Women Only

Constant              13.75***            26.41***            21.95***            22.94***
                      (3.23)              (4.55)              (3.16)              (4.24)
Lagged Rank (b)       1.52***             -0.202
                      (0.532)             (0.713)
Lagged Rankdum (c)                                            -4.34**             3.90
                                                              (2.28)              (2.86)
Practice Average (d)  0.105***            0.240***            0.110***            0.250***
                      (0.039)             (0.055)             (0.040)             (0.056)
Round (e)             -0.138              -0.476*             -0.133              -0.461*
                      (0.194)             (0.257)             (0.195)             (0.256)

R2 within             0.01                0.01                0.01                0.03
R2 between            0.37                0.37                0.28                0.31
R2 overall            0.04                0.06                0.03                0.06
N                     437                 361                 437                 361

Standard errors in parentheses.
* = significant at the 0.1 level, ** = significant at the 0.05 level, *** = significant at the 0.01 level.
Group variable: participant
a. Forecast Error: per-round absolute forecast error by participant, |P_t - P^e_it|, rounds 1-20.
P_t = the price of the stock in period t. P^e_it = participant i’s forecast in period t.
b. Lagged Rank: forecast error rank in team (1=most accurate, 7=least accurate) for participant i in
round t-1.
c. Lagged Rankdum: dummy variable = 1 if forecast error for participant i is less than average
team forecast error in round t-1.
d. Practice Average: the average per-round forecast error for the practice rounds for participant i.
e. Round: indicates round number (1-20).

Table 5. Means and Standard Deviations by Team – Team Production Condition

          Average Error      Average Error       Team            Team Practice    Team Male
          Rounds 1-20 (a)    Rounds 11-20 (b)    Practice (c)    Deviation (d)    Ratio (e)

Team 1    30.88              26.39               56.85           39.79            0.2857
Team 2    27.60              26.28               20.90           7.28             0.5714
Team 3    24.60              22.33               35.52           21.83            0.7142
Team 4    23.79              24.34               40.42           32.39            0.7142
Team 5    19.39              15.85               22.19           19.76            0.7142
Team 6    22.70              23.29               27.47           17.08            0.2857

a. Average Forecast Error: average per-round absolute forecast error by team, |P_t - P^e_it|, rounds 1-20.
P_t = the price of the stock in period t. P^e_it = participant i’s forecast in period t.
b. Average Forecast Error: average per-round absolute forecast error by team, |P_t - P^e_it|, rounds 11-20.
c. Team Practice: the average per-round forecast error for the practice rounds for team j.
d. Team Practice Deviation: the standard deviation for team j of the practice round averages for
each individual team member i.
e. Team Male Ratio: the proportion of males to total team members for team j.

Table 6. Random-Effects Generalized Least Squares Regressions on Average Team Forecasting
Error

Dependent Variable:            Average Forecast Error (a)
                               Rounds 1-20

Constant                       20.49***
                               (5.81)
Team Practice (b)              0.680***
                               (0.238)
Team Practice Deviation (c)    -0.624**
                               (0.266)
Team Male Ratio (d)            -0.051
                               (5.84)
Quintile Round (e)             -1.44***
                               (0.568)

R2 within                      0.22
R2 between                     0.86
R2 overall                     0.48
N                              30

* = significant at the 0.1 level, ** = significant at the 0.05 level, *** = significant at the 0.01 level.
a. Average Forecast Error: average per-round absolute forecast error by team, |P_t - P^e_it|, averaged
over rounds 1 – 4, 5 – 8, 9 – 12, 13 – 16, & 17 – 20 (i.e., quintile).
b. Team Practice: the average per-round forecast error for the practice rounds for team j.
c. Team Practice Deviation: the standard deviation for team j of the practice round averages for
each individual team member i.
d. Team Male Ratio: the proportion of males to total team members for team j.
e. Quintile Round: 1 = rounds 1 – 4, 2 = rounds 5 – 8, etc.

Table 7. Fixed-Effects Regression on Team Average Forecasting Error

Dependent Variable:          Team Average Forecast Error (a)

Constant                     32.53***
                             (3.75)
Lagged Team Deviation (b)    0.190
                             (0.139)
Lagged Team Error (c)        -0.312**
                             (0.151)
Round (d)                    -0.382**
                             (0.193)

R2 within                    0.06
R2 between                   0.83
R2 overall                   0.03
N                            114

Standard errors in parentheses.
** = significant at the 0.05 level, *** = significant at the 0.01 level.
Group variable: team
a. Average Forecast Error: average per-round absolute forecast error by team, |P_t - P^e_it|, rounds 1-20.
P_t = the price of the stock in period t. P^e_it = participant i’s forecast in period t.
b. Lagged Team Deviation: standard deviation of forecast errors across participants by team for
round t-1.
c. Lagged Team Error: average forecast errors across participants by team for round t-1.
d. Round: indicates round number (1-20).

Figure 1. Average Forecast Error by Condition
[Line chart. Vertical axis: Average Forecast Error (0-60). Horizontal axis: Round (1-20). Series: Piece rate, Team Production.]

Figure 2. Average Forecast Errors in the Piece Rate Condition by Gender
[Line chart. Vertical axis: Average Forecast Error (0-60). Horizontal axis: Round (1-20). Series: Men, Women.]

Figure 3. Average Forecast Errors in the Team Production Condition by Gender
[Line chart. Vertical axis: Average Forecast Error (0-50). Horizontal axis: Round (1-20). Series: Men, Women.]
