Vandegrift Yavas, An Experimental Test of Performance Under Team Production

An Experimental Test of Behavior under Team Production
Donald Vandegrift
The College of New Jersey
Abdullah Yavas
Pennsylvania State University
August 2005
Abstract: This study reports a series of experiments that examine behavior under team production
and a piece rate. In the experiments, participants complete a forecasting task and are rewarded based
on the accuracy of their forecasts. In the piece-rate condition, participants are paid based on their
own performance while the team production condition rewards participants based on the average
performance of the team. Overall, there is no statistically significant difference in performance
between the conditions. However, this result masks important differences in the behavior of men
and women across the conditions. Men in the team production condition compete even though the
payment scheme provides no monetary incentive to compete and they show significantly higher
performance than the men in the piece rate. For women, the results are reversed. Women in the team
production condition show significantly lower performance than the women in the piece rate.
Because men compete, they change their behavior in the team production condition based on
measures of relative performance. The women do not. Forecast errors for the women are explained
only by the measure of basic skill and time spent on the task.
JEL Codes: J33, M12, M52

Keywords: Team Production, Shirking, Experiment, Gender
Acknowledgments: The authors gratefully acknowledge support from the National Science
Foundation (SES-0111789). Joao Neves provided helpful comments. M. Abdullah Sahin and
Nuriddin Ikromov provided valuable assistance. Correspondence can be directed to Donald
Vandegrift, School of Business, The College of New Jersey, 2000 Pennington Rd., Ewing, NJ
08628-0718. e-mail: vandedon@tcnj.edu; fax: (609) 637-5129.
I. Introduction
Under team production, groups rather than individuals are responsible for a set of tasks and
compensation is based on the performance of the group. Theoretical models of team production
suggest that compensating workers based on team performance causes team members to free ride on
the efforts of others (i.e., shirk). If total returns are divided evenly among a team with n members,
team members incur the full cost of their effort while they receive only 1/n of the marginal gains
from the effort. Consequently, effort levels are sub-optimal. While theoretical models suggest that
team production may lower worker productivity, firms often opt to reward workers based on team
performance anyway. For instance, a recent survey finds that 26% of firms used team rewards
(McClurg, 2001).
The empirical literature on team production also suggests that theoretical models of team
production may overstate the costs of shirking. Empirical studies of behavior under team production
often fail to find shirking behavior (van Dijk et al., 2001; Hamilton et al., 2003). Nevertheless, only
a small number of papers test behavior under team production incentives. To further advance
understanding of behavior under team production, we conduct a series of experiments. We compare
behavior under team production and a piece rate payment scheme and suggest an explanation for the
mixed results on shirking in team production.
In the experiments, participants complete a real-effort forecasting task and are rewarded
based on the accuracy of their forecasts. The experimental task is designed to allow measurement of
both individual contributions and team performance.1 Participants were randomly assigned to one of
two conditions. In the piece-rate condition, participants are paid based on their own performance
while the team production condition rewards participants based on the average performance of the
team. In each condition, participants produce forecasts for twenty rounds. In each round,
participants receive feedback on their forecast error and earnings. In the team production condition,
participants also receive information on the forecast error of the team.
The results show no statistically significant difference in performance between the
conditions. However, this result masks important differences in the behavior of men and women
across the conditions. The men in the team production condition show significantly higher
performance than the men in the piece rate. For women, the results are reversed. The women in the
team production condition show significantly lower performance than the women in the piece rate.
In addition, men change their behavior in the team production condition based on measures
of relative performance. The women do not. Forecast errors for the women are explained by the
measure of basic skill and time spent on the task. In essence, the men in the team production
condition compete even though the payment scheme provides no monetary incentive to compete.
These results therefore suggest a connection between behavior in team production and recent studies
of gender differences in competitive behavior that find incentives conditioned on relative
performance (i.e., a tournament) raise the performance of men relative to women (Gneezy et al.,
2003; Gneezy and Rustichini, 2004; Vandegrift et al., 2005).
Taken together, the results suggest that environmental cues or a reference frame that allows
for meaningful comparisons with others may be as important in fostering competition as explicit
monetary incentives to compete. We may also reconcile differences in results across empirical
studies of team production by appealing to differences in environmental cues or reference frame.
Experiments that employ a real-effort task find no evidence of shirking (van Dijk et al., 2001, and
the present study), while experiments that use a procedure designed to mimic effort choices find
evidence of shirking (Nalbantian and Schotter, 1997; Meidinger et al., 2003). In fact, team
production experiments that mimic effort choices produce results that more closely resemble
behavior in public goods tasks than behavior in real-effort experiments on team production.
Footnote 1: Metering individual contributions will often be difficult or impossible in real-world settings (Alchian and
Demsetz, 1972; Blair and Stout, 1999). In fact, a firm may adopt team production techniques because some inputs have a
higher value in team production than in their next best use and it is difficult to attribute any portion of output to a single
team member. However, a firm might also institute team production incentives simply because it wishes to foster
cooperation among workers.
In field data, environmental cues and reference frames are harder to detect. However, it is
worth noting that Hamilton et al. (2003) fail to find evidence of shirking in a garment manufacturing
facility where teams of six to seven workers are arrayed in a U-shaped work space of about 12 by 24
feet. Thus, the workers had timely and salient information on the productivity of the group.
II. Background
Shirking and Team Production
In the classic theoretical treatment of team production, Alchian and Demsetz (1972)
consider the case of two men jointly lifting heavy cargo into trucks. If we can observe only the total
weight loaded each day, it is impossible to determine each individual's contribution. Because it is
impossible to identify individual contributions, team members have an incentive to shirk. If there
are n team members, then each team member bears only 1/n of the costs of their shirking (the lost
output), while receiving the full benefits of their shirking (the effort saved). Equivalently, each
member sets 1/n of the marginal benefit of effort equal to the full marginal cost of effort.
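The free-rider argument above reduces to a single first-order condition. The block below is a minimal restatement under assumptions that are ours rather than Alchian and Demsetz's. Each member's contribution R(e) is increasing and weakly concave in own effort e. The effort cost C(e) is strictly convex. Output is split equally among n members.

```latex
% Sole owner (n = 1): effort e^{*} equates the full marginal return and the marginal cost
R'(e^{*}) = C'(e^{*})

% Team of n members sharing output equally: member i solves
\max_{e_i} \; \frac{1}{n}\sum_{j=1}^{n} R(e_j) - C(e_i)
\quad\Longrightarrow\quad
\frac{1}{n}\,R'(e^{T}) = C'(e^{T})

% Hence R'(e^{T}) - C'(e^{T}) = \frac{n-1}{n}\,R'(e^{T}) > 0 for n > 1, so e^{T} < e^{*}
```

Under these assumptions, R'(e) - C'(e) is strictly decreasing and equals zero at e*. A positive value at the team optimum e^T therefore places team effort below the sole-owner level.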
The subsequent theoretical literature uses the shirking result as a starting point and focuses
on contractual solutions to the problem of shirking (Holmstrom, 1982; Rasmusen, 1987; Itoh, 1991;
McAfee and McMillan, 1991; and Legros and Matthews, 1993).2 However, the evidence on the
importance of shirking in the empirical literature is mixed. Some studies find clear evidence of
shirking (Gaynor and Gertler, 1995; Nalbantian and Schotter, 1997; Meidinger et al., 2003) while
others do not (van Dijk et al., 2001; Hamilton et al., 2003).
To analyze behavior under team production, Nalbantian and Schotter (1997) conduct a
controlled experiment. Their procedure mimics effort choices by assuming that each individual has
identical effort costs and that the effort “costs” are generated by a specific function.3 The
experiment fixes group size at six and tests a number of different compensation schemes under team
production. Participants made decisions about effort levels in each of 25 rounds.
Using this design, Nalbantian and Schotter find sub-optimal effort levels when team
members are awarded equal shares of team output (i.e., the standard team production problem). That
is, when revenues are shared, shirking occurs. While the mean effort levels were above the
predicted (shirking) equilibrium, there was a downward trend that converged on the predicted value
(i.e., the Nash equilibrium prediction). Thus, the results were consistent with behavior in public
goods experiments. Participants supplied effort (or contributions to the public good) above the Nash
equilibrium in early rounds but effort fell over time.
Footnote 2: For instance, Holmstrom (1982) suggests a forcing contract mechanism to resolve the shirking problem. The
forcing contract specifies a performance target for the firm or a group within the firm. The target may be based on
revenue or some other outcome. If the target is met or exceeded, all the workers in the group (or firm) share in the
revenue generated. If the target is not met, each worker is paid a relatively low penalty wage.
Footnote 3: Nalbantian and Schotter (1997) allowed subjects to select an integer e between zero and 100 (inclusive). Each
number had a corresponding cost, generated by the function C(e) = e^2/100. After subjects chose a number, the
experimenters circulated a box of “random numbers” (bingo balls labeled with integers from -a to +a). The sum of the
random number and the decision number produced a total number. Subjects with higher total numbers received higher
fixed payments. Thus, the task facing subjects was to learn the optimal number to purchase given the cost structure.
Meidinger et al. (2003) use a similar task to study interactions between a principal and a
team composed of two agents. The experiment manipulates the productivity of the agents. In one
condition, “effort” choices across the two agents have the same productivity effect, while in the
second condition, the productivity levels vary. The task has two decision stages. In the first stage,
the principal offers the agents a residual return. If both agents accept the offer, the agents choose an
effort level and the gains are distributed among the participants. Meidinger et al. find that under
both conditions agents supply sub-optimal effort levels (i.e., free riding) and that free riding is much
greater when the agents vary in their productivity.
In contrast to Nalbantian and Schotter (1997) and Meidinger et al. (2003), van Dijk et al.
(2001) use a real-effort task and find that participants do not shirk. The task required participants to
search in a two-dimensional space, S = {(H, V): H, V ∈ [-a, a], with a an integer}, to find the highest
possible value of a single-peaked function. Search for the peak started at the (0, 0) coordinate and
participants were permitted to raise or lower H and V in discrete steps of one over a fixed time
period. During each time period, the subjects could work on two separate searches (A and B) and
switch between the searches costlessly. Search A is work for the employer while search B is
intended to capture activities valuable only to the worker that may be undertaken on company time.
Accordingly, Search A rewards differed across conditions, while Search B activities were
always rewarded based on a piece rate. In the team condition, participants were randomly matched
with one other participant and they were paid in search A based on the average performance of the
group. (They received the piece rate for search B.) In the piece rate condition, participants were paid
in both searches (A and B) on the basis of a piece rate. Comparing the piece rate and team
conditions, van Dijk et al. (2001) find no statistically significant difference in either the effort or the
performance levels. Moreover, performance in the team condition did not fall over time.
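A short simulation can illustrate the mechanics of the search task. This is a hypothetical sketch, and several of its features are assumptions for illustration only:

- the payoff function (negative squared distance from a peak);
- the peak location and the bound a;
- the step budget that stands in for the fixed time period;
- the greedy search strategy.

None of these are van Dijk et al.'s actual parameters or their participants' behavior. The sketch models a single search and ignores the option to switch to search B.

```python
from typing import Callable, Tuple

Point = Tuple[int, int]

def make_single_peaked(peak: Point) -> Callable[[int, int], int]:
    """Hypothetical single-peaked payoff: negative squared distance from the peak."""
    ph, pv = peak
    return lambda h, v: -((h - ph) ** 2 + (v - pv) ** 2)

def greedy_search(f: Callable[[int, int], int], a: int, max_steps: int) -> Tuple[Point, int]:
    """Start at (0, 0); each step moves H or V by one unit toward a higher value of f.

    Moves are restricted to H, V in [-a, a]; max_steps stands in for the fixed time period.
    Returns the final point and the number of steps taken.
    """
    h, v = 0, 0
    steps = 0
    while steps < max_steps:
        neighbours = [(h + dh, v + dv)
                      for dh, dv in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if -a <= h + dh <= a and -a <= v + dv <= a]
        if not neighbours:
            break  # degenerate space (a = 0): nowhere to move
        best = max(neighbours, key=lambda p: f(*p))
        if f(*best) <= f(h, v):
            break  # no neighbour improves, so this is the (constrained) peak
        h, v = best
        steps += 1
    return (h, v), steps

f = make_single_peaked((3, -2))
print(greedy_search(f, a=10, max_steps=100))  # ((3, -2), 5)
print(greedy_search(f, a=1, max_steps=100))   # peak lies outside [-1, 1]: stops at ((1, -1), 2)
```

Because the function is single-peaked, a one-unit greedy climb always reaches the peak (or the closest boundary point). Performance in such a task therefore reflects how many steps a participant chooses to spend on search A.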
Recent analyses of firm and individual-level data also find mixed evidence on the
importance of shirking.4 Gaynor and Gertler (1995) examine the behavior of physicians in
partnership arrangements. Using the number of office visits as the measure of physician effort,
they find that increased revenue sharing among partners reduces the number of office visits. In
contrast, Hamilton et al. (2003) examine the case of a single garment plant that shifted from an
individual piece rate to a group piece rate (i.e., team production). Teams were composed of six to
seven workers and the team’s net receipts were divided equally among team members.
Productivity rose 18% after the introduction of teams. In addition, higher ability workers joined
teams at a higher rate and this accounted for about one fifth of the productivity increase.
Hamilton et al. (2003) contend that there are two basic ways to explain the attenuation (or
elimination) of the free rider problem. First, the problem may be reduced through effective
monitoring and punishing of free riders. Such punishments may be administered through explicit
threats to discontinue cooperation or through peer pressure. Threats to discontinue cooperation
require that discounted losses from lost cooperation exceed the one-shot benefits of shirking.
Peer pressure reduces the free rider problem because departures from team norms reduce
individual utility. Second, synergies related to team production imply that team productivity
is more than the simple sum of the performance of individual team members. The opportunity to
collaborate draws on new skills. These skills may improve coordination as well as allow team
members to discover methods to assign, organize, and redesign tasks.
In addition, Hamilton et al. (2003) find that teams with more heterogeneity in worker
ability show better performance. They suggest that greater heterogeneity may cause better
performance for two reasons. First, more skillful workers may be able to teach the less skillful
how to execute tasks more efficiently, so high-ability workers raise the productivity of low-ability
workers. Second, bargaining over the common work pace will produce a different result when
there is wider variation in intra-team worker ability. Bargaining over work pace occurs because
high-ability workers may threaten to opt out. Such threats are credible because high-ability
workers have the best outside options. To retain the high-ability worker, the rest of the team may
accept a faster work pace.
Footnote 4: A separate empirical literature analyzes worker productivity under profit sharing plans (Hansen, 1997; Weitzman
and Kruse, 1990). However, the baseline for determining improvements is a reward structure in which rewards do
not depend on productivity.
Relation Between Team Production and Public Goods Experiments
Nalbantian and Schotter (1997) note that the structure of team production and public goods
experiments is similar.5 In each case, costs are borne individually while group output is shared
equally. The typical public goods experiment gives each participant a sum of money. The
participant has the option of contributing some portion of the sum to a common pool. The total
contributions to the pool are multiplied by a factor greater than one and returned to the subjects in
equal shares.
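The payoff rule just described can be written as payoff_i = endowment - c_i + m * (sum of all contributions) / n. The sketch below uses illustrative values that are not taken from any cited experiment: an endowment of 10, a multiplier m = 2, and a group of four. Because m/n = 0.5 < 1, each unit a participant contributes returns only half a unit to that participant. This is the same 1/n wedge that drives shirking under team production.

```python
from typing import List

def public_goods_payoffs(endowment: float, contributions: List[float], multiplier: float) -> List[float]:
    """Linear public goods game: keep what you do not contribute, plus an equal share of the multiplied pool."""
    n = len(contributions)
    share = multiplier * sum(contributions) / n  # every member receives the same share of the pool
    return [endowment - c + share for c in contributions]

full = public_goods_payoffs(10, [10, 10, 10, 10], multiplier=2)
one_free_rider = public_goods_payoffs(10, [0, 10, 10, 10], multiplier=2)
print(full)            # [20.0, 20.0, 20.0, 20.0]
print(one_free_rider)  # [25.0, 15.0, 15.0, 15.0]
```

The free rider earns more than under full contribution, while total group earnings fall from 80 to 70. This is the individual-versus-group tension that team production shares.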
Experiments that require the completion of a real-effort task differ from public goods
experiments (and from Nalbantian and Schotter, 1997) in two key respects.6 First, real-effort experiments
allow differences in ability to arise endogenously. While public goods experiments generally show
that asymmetries in payoffs (not ability) reduce cooperative behavior,7 an individual's pride in
his/her talent or skill may be a significant deterrent to shirking under team production. Second, real-effort
experiments more closely resemble a typical workplace interaction. In a typical workplace
interaction, individuals may be uncertain about whether poor performance by team members is the
result of low ability or shirking.
Footnote 5: The large literature on public goods experiments is summarized in Ledyard (1995).
Footnote 6: Nalbantian and Schotter (1997) note two differences between public goods experiments and their team
production experiment. First, in contrast to public goods experiments, group output under team production contains a
random component. Various exogenous factors (e.g., changes in market demand) imply a probabilistic relation between
effort and output. Second, the compensation schemes offered under team production have no analogue in public goods
theory. Another key difference is that team production typically requires participants to contribute effort, while public
goods situations require monetary contributions.
Footnote 7: See Bagnoli and McKee (1991); Fisher et al. (1995).
Gender and Behavior under Differing Labor Compensation Schemes
Although the public goods literature has devoted some attention to differences between men
and women8, there is relatively little on gender differences in behavior under various labor
contracts. The central result is that men respond more strongly to competitive incentives than
women (Gneezy et al., 2003; Gneezy and Rustichini, 2004; Vandegrift et al., 2005). Gneezy et al.
(2003) report an experiment in which participants solve computerized maze problems. When
payment is based on the absolute number of computerized mazes solved (i.e., a piece rate), they find
no difference in performance between men and women. However, when men and women are paid
based on tournament incentives, the performance of men increases while the performance of women
remains the same as in the piece rate.
Gneezy and Rustichini (2004) find a similar result in a field experiment with elementary
school students. In the experiment, students ran a 40-yard dash both alone and in pairs. In the first
round, all students ran alone. In the second round, some students ran against competitors while others
ran alone. Overall, boys matched against competitors showed a significant improvement in the
second round but the girls did not. When girls competed against girls in the second round, their
times were slower. When boys competed against boys in the second round, their times were faster.
While the girls showed a small improvement in the mixed-gender races, the improvement was far
larger for boys.
Footnote 8: Ledyard (1995) notes that in public goods experiments the evidence on gender differences in contribution rates is
mixed.
Using Gneezy et al. (2003) as a starting point, Vandegrift et al. (2005) examine choices and
behavior when agents are able to choose between a payment scheme that rewards based on absolute
performance (i.e., piece rate) and a scheme that rewards based on relative performance (i.e., a
tournament). The structure of the rewards in the tournament option varied across conditions, while the
piece rate payoffs remained the same. In one condition (winner-take-all), only the most accurate
forecaster who chose the tournament for each round received a payment. In the other condition
(graduated tournament condition), the same payment was divided among the first, second, and third
finishers who chose the tournament. Men in the winner-take-all condition showed significantly
greater forecasting accuracy than men in the graduated tournament condition. Women showed no
statistically significant difference in forecasting accuracy between winner-take-all and graduated
tournament conditions.
III. Experimental Design
To test behavior under team production, we design an experiment that allows participants to
contribute real effort towards team output. In one condition, we compensate team members based
on team performance. If total returns are divided evenly among the n team members, R indicates
returns, and e_i indicates the costly effort of member i, we may express the individual team member's
maximization problem as:
(1) $\max_{e_i} G = \frac{1}{n} \sum_{j=1}^{n} R_j(e_j) - C(e_i)$
This implies that as n rises, the returns to effort fall while the costs remain unchanged.
Consequently, the team members will choose lower effort levels and output will fall. In the other
condition, participants completed the same task but were paid based only on their own performance.
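A numerical version of equation (1) makes the comparative static concrete. This is a minimal sketch with two illustrative assumptions. First, returns are linear, R_j(e_j) = a * e_j with a = 1. Second, the quadratic cost C(e) = e^2/100 and the integer effort grid 0 to 100 are borrowed from Nalbantian and Schotter's (1997) design. Neither assumption describes the forecasting task used here. Under these assumptions the first-order condition a/n = e/50 gives e* = 50a/n. The case n = 7 corresponds to the seven-member teams in this experiment.

```python
def member_objective(e: int, n: int, a: float = 1.0) -> float:
    """Member i's share of its own contribution, a*e/n, minus the assumed cost e**2/100.

    Other members' contributions add a constant to G, so they are omitted here:
    they do not change which e maximizes G.
    """
    return a * e / n - e ** 2 / 100

def best_effort(n: int, a: float = 1.0, max_effort: int = 100) -> int:
    """Integer effort in [0, max_effort] that maximizes the member's objective."""
    return max(range(max_effort + 1), key=lambda e: member_objective(e, n, a))

for n in (1, 2, 7):
    # Grid optimum next to the closed form e* = 50a/n from the first-order condition
    print(n, best_effort(n), 50 / n)
```

As n rises from 1 to 7, the chosen effort falls from 50 to 7. This is the standard prediction that the piece-rate condition is designed to test against.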
We conducted the experiments using students at The Pennsylvania State University as
participants. A total of 84 students participated. Each of the two experimental conditions had 42
participants divided among three separate sessions. All sessions were conducted at the LEMA lab at
The Pennsylvania State University. Participants completed a computer-based forecasting task
known as a multiple-cue-probability-learning (MCPL) task.9
For each of 20 periods, participants were asked to forecast the price of a fictitious “stock”
using two exogenous “cues”. Each period, the values of the cues changed, but the relationship
between the cues and the price of the stock remained the same throughout the experiment and across
both experimental conditions. Because the relationship was unknown to all participants, they had to
discover it from the exogenous cues. Ten examples of the cue-price relationship were provided to
each participant. Participants examined the examples prior to making their forecasts. Following
review of the ten examples, participants produced three practice forecasts based on three new sets of
cue values.
Following these practice rounds, the experiment began and participants received the first of
20 sets of cues to make their forecast. Accurate forecasts under such conditions require participants
to detect the covariation between the cues and the stock price (Goldstein and Hogarth, 1997).
Unknown to all participants, the price of the stock was determined by the relationship:
(2) Price = 85 + 0.3 * Cue 1 + 0.7 * Cue 2 + e
where e is a uniformly distributed discrete random variable on the interval (-3, 3). The cue values
ranged from 101 to 393, and the resulting prices ranged from 230 to 424.
Footnote 9: See Balzer et al. (1992) and Goldstein and Hogarth (1997) for reviews of research using MCPL tasks by
psychologists. See Schmalensee (1976), Bolle (1988), Brown (1995, 1998), Vandegrift and Brown (2003), and
Vandegrift and Brown (2005) for examples of the use of MCPL tasks by economists.
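The sketch below generates data from equation (2) and recovers the cue weights by ordinary least squares. It shows only that the relationship is learnable, in principle, from cue-price pairs. It rests on three assumptions:

- cues are drawn independently and uniformly from the integers 101 to 393 (only the range is reported);
- e is an integer drawn uniformly from -3 to 3;
- 200 draws are used, far more than the ten examples and twenty rounds participants actually saw.

The sketch does not model how participants themselves learned the rule.

```python
import random
from statistics import mean

def price(cue1: float, cue2: float, rng: random.Random) -> float:
    """Equation (2): 85 + 0.3*Cue1 + 0.7*Cue2 + e, with e assumed to be an integer in [-3, 3]."""
    return 85 + 0.3 * cue1 + 0.7 * cue2 + rng.randint(-3, 3)

def ols_two_cues(x1, x2, y):
    """Least-squares intercept and two slopes, solving the 2x2 normal equations on centred data."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2          # nonzero unless the cues are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det  # Cramer's rule
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

rng = random.Random(0)  # fixed seed so the illustration is reproducible
cue1 = [rng.randint(101, 393) for _ in range(200)]
cue2 = [rng.randint(101, 393) for _ in range(200)]
prices = [price(a, b, rng) for a, b in zip(cue1, cue2)]
intercept, w1, w2 = ols_two_cues(cue1, cue2, prices)
print(round(intercept, 1), round(w1, 3), round(w2, 3))  # close to 85, 0.3, 0.7
```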
Experimental Conditions
In one condition, participants were paid based on a piece rate. The piece rate paid
participants based on their absolute forecasting error. Participants with more accurate forecasts
received higher payments. The payment to the individual participants in the piece rate condition was
determined by:
(3) piece rate = $1.70 – (.03 * forecast error participant i).
In the second condition, participants receive one seventh of the total output of their seven-member team,
where each member's contribution is determined by the piece rate in equation (3) above.

(4) $\text{team production rate} = \frac{1}{7} \sum_{i=1}^{7} \left(\$1.70 - 0.03 \times \text{forecast error}_i\right)$
The amounts were added across the rounds and paid to the participants at the end of the experiment.
Table 1 summarizes the experimental conditions.
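Equations (3) and (4) translate directly into code. The sketch below is an illustration, not the authors' experimental software. Its constant and function names are hypothetical. Forecast error is taken as an absolute value, as the text describes. No floor is applied to per-round earnings because none is specified, although the stated $5 minimum total payoff suggests one may have existed.

```python
from typing import List

BASE = 1.70       # dollars per round before the error penalty, from equation (3)
PENALTY = 0.03    # dollars deducted per point of absolute forecast error
ENDOWMENT = 5.00  # initial endowment placed in each Earnings Account

def piece_rate(error: float) -> float:
    """Equation (3): one participant's earnings for one round."""
    return BASE - PENALTY * abs(error)

def team_rate(team_errors: List[float]) -> float:
    """Equation (4): every member receives the mean of the members' piece-rate amounts."""
    return sum(piece_rate(e) for e in team_errors) / len(team_errors)

def total_payoff(per_round: List[float]) -> float:
    """Initial endowment plus the sum of per-round earnings."""
    return ENDOWMENT + sum(per_round)

# Usage with made-up errors for one seven-member team in one round
team_errors = [10, 20, 30, 15, 25, 5, 35]
print(round(piece_rate(10), 2))             # 1.4
print(round(team_rate(team_errors), 2))     # 1.1 (mean error 20)
print(round(total_payoff([BASE] * 20), 2))  # 39.0, the maximum possible payoff
```

Because equation (3) is linear, the team rate equals the piece rate evaluated at the team's mean absolute error. A one-point improvement by one member therefore raises that member's pay by only 0.03/7 dollars. The maximum total of 5 + 20 × 1.70 = $39 matches the upper bound reported for payoffs.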
Procedure
After the participants entered the lab, they were randomly assigned a seat in front of a
computer and were given a set of instructions describing the forecasting task. The instructions
described the nature of the forecasting task (i.e., forecast the price of a fictitious stock using
exogenous cues for 20 rounds), that the values of the cues changed each round but their relationship
to the stock price remained constant throughout the experiment, and that all participants would see
the identical cue values each round. The instructions also explained that an initial endowment of $5
had been placed in each participant’s “Earnings Account.” Earnings from the experiment were
added to the earnings account and the participants received a payment in cash at the conclusion.
After answering any remaining questions, the participants were told they would have five
minutes to examine ten examples of the cue-price relation. Each of the ten examples as well as the
twenty rounds that followed reflected the same underlying relationship (equation (2)
above). At the end of the 5-minute period, the participants completed 3 practice rounds. In the
practice rounds, participants received two cue values and submitted their forecast. Each round the
participants received feedback on their forecast error and the actual price of the stock. Participants
were not paid for the practice rounds. The payment scheme was explained following the practice
rounds and participants were shown the round one cue value(s) and given two minutes to enter their
forecasts into the computer. Once all participants had entered their forecasts, a computer program
calculated each participant’s forecast error and actual earnings.10
In each condition, participants received information in each round on: (1) the actual price of
the stock; (2) the participant’s forecast; (3) the participant’s forecast error; (4) the participant’s
earnings. In addition, participants in the team production condition also received information each
round on (5) the average forecast error for the group. The participants were encouraged to record
any relevant information on a sheet of paper and were able at any time to recall the information
from previous rounds.
After the participants had one minute to examine their results, the cue values for the next
period were shown to each participant. This process was repeated for 20 rounds. The
experiment ran for about 1 hour. Throughout the experiment all information was private, including
participant forecasts. At the end of each session, the participants completed a post-experiment
questionnaire and were paid their total earnings (initial endowment plus the sum of earnings from
the 20 rounds).
Footnote 10: The program was written by M. Abdullah Sahin using z-Tree. Copies of the program as well as the data are
available upon request. All instructions are available at: http://vandegrift.intrasun.tcnj.edu
Payoffs could range from $5 to $39 in either the piece rate or the team production
condition (including the $5 initial endowment). Actual payoffs varied from $7.71 to $34.41 in the
piece rate condition and from $23.41 to $29.32 in the team production condition (including the $5 initial
endowment). The average payoff across all conditions was $26.27. In the piece rate condition the
average payoff was $26.67 while in the team production condition, the average payoff was $25.88.
Of the 84 participants, 59% were men. The proportion of men was slightly lower in the team
production condition than in the piece rate condition (55% v. 64%).
IV. Results
Individual Behavior
Table 2 reports means and standard deviations at the observation level for forecast errors for
rounds 1-20, 1-10, and 11-20. Higher forecast errors indicate lower performance. For each time
period, the means and standard deviations are reported by gender and condition. Men had average
forecast errors about two points lower (about 8%) than women across both conditions. Participants
in the piece rate condition had forecast errors that were only one point lower (about 4%) than the
team production condition. Looking at the performance of men and women across the two
conditions, the differences are striking. In the piece rate condition, the women had much lower
forecast errors than the men – about 4.5 points or about 18%. In the team production condition, the
situation was reversed. The men had much lower forecast errors than the women – about 8.2 points
or about 28%.
Figure 1 shows the forecast errors across all twenty rounds for the piece rate and the team
production conditions. Interestingly, there is little difference in forecast errors across the two
conditions and participants in the team production condition do not decrease effort over time. This
stands in marked contrast to behavior in public goods experiments (Ledyard, 1995). Figure 2
compares men and women in the piece-rate condition across all 20 rounds. In nearly every round,
women outperform the men. Figure 3 compares men and women in the team production condition
across all 20 rounds. In nearly every round, men outperform the women.
To investigate more systematically the link between gender, team production incentives, and
forecasting error, we run random-effects generalized least squares regressions with forecast errors
for each round as the dependent variable. We use a unique participant-specific id to account for
unobserved individual-specific effects.11 The regressions also control for the payoff structure, gender, and
participant skill. We control for skill in two ways: the average per-round forecast error by
participant for the three practice rounds (Practice Average) and the forecast error for each
participant in round t-1 (Lagged Error).
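Constructing the two skill controls is a simple panel-data step. The sketch below assumes two inputs: a long-format list of (participant id, round, forecast error) records, and each participant's three practice errors. The field names and toy numbers are hypothetical, not experimental data. Leaving Lagged Error undefined in round 1 is also an assumption, since the handling of the first round is not described. The random-effects estimation itself is left to a statistics package.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple

def build_regressors(practice: Dict[str, List[float]],
                     rounds: List[Tuple[str, int, float]]) -> List[dict]:
    """Attach Practice Average and Lagged Error (the error in round t-1) to each participant-round."""
    practice_avg = {pid: mean(errs) for pid, errs in practice.items()}
    by_participant: Dict[str, Dict[int, float]] = defaultdict(dict)
    for pid, t, err in rounds:
        by_participant[pid][t] = err
    rows = []
    for pid, t, err in sorted(rounds):  # ordered by participant, then round
        lagged: Optional[float] = by_participant[pid].get(t - 1)  # None in round 1
        rows.append({"id": pid, "round": t, "error": err,
                     "practice_avg": practice_avg[pid], "lagged_error": lagged})
    return rows

# Toy, made-up inputs for two participants and two rounds
rows = build_regressors(
    practice={"p1": [30.0, 20.0, 10.0], "p2": [12.0, 9.0, 6.0]},
    rounds=[("p1", 1, 18.0), ("p1", 2, 14.0), ("p2", 1, 8.0), ("p2", 2, 7.5)],
)
for r in rows:
    print(r)
```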
The results are reported in Table 3. Column 1 reports the regression on the entire data set.
There are no statistically significant differences in forecast errors for men and women nor is there
any statistically significant difference in forecast errors between the piece rate and the team
production conditions. The controls for ability (Practice Average and Lagged Error) are both
positive and significant, indicating that higher average errors in the practice rounds and higher
forecast errors in round t-1 raise forecast errors in round t of the experiment. The round coefficient
is negative and significant, indicating that errors fall over time.12 The insignificant coefficient on
Team directly contradicts the standard prediction of the theoretical literature on team
production. In general, participants do not reduce effort/performance in the team production
condition compared to the piece rate condition.
Footnote 11: Computing the average forecast error for each participant across the 20 rounds and running simple OLS
regressions does not change the basic results.
To further investigate the causes of the stronger than expected performance in the team
production condition, we run separate random-effects regressions for the piece rate and team
production conditions with gender as a covariate. In addition, we run random effects regressions
that separate the men from the women. These regression results appear in Table 3 as columns 2
through 5. The results show that the women reduce performance under team production. Forecast
errors for women in the team production condition are about 30% higher than they are in the piece
rate. Consequently, women behave in a manner consistent with the standard predictions of
economic theory. Men, on the other hand, increase their performance in the team production
condition. Forecast errors for men in the team production condition are about 14% lower than they
are in the piece rate.
The separate regressions for the piece rate and team production conditions (columns 4 and 5 of
Table 3) show the same basic pattern. In the piece rate, men have lower performance than the
women: forecast errors for men in the piece rate condition are about 17% higher than they are for
the women. In the team production condition, forecast errors for men are about 24% lower than
those for the women. The average forecast error by participant in the practice rounds is significant
across all specifications, but it is generally a stronger predictor for the forecast errors of the women.
In contrast, forecast error in the previous round predicts performance better for the men than the
women. Comparing the regression equations for the piece rate and the team production conditions,
we see that the magnitude of the round effect is essentially the same in both equations: on average,
forecast errors are about a quarter of a point lower in round t than in round t-1. This suggests that
effort/cooperation levels do not deteriorate over time, unlike contributions in public goods games.
12 To ensure that behavior stabilized over time, we recalculated each regression in Tables 3, 4, 6,
and 7 using only the last 10 rounds of forecasts. For every regression, the coefficient for round was
small and statistically insignificant.
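The text reports the gender and condition effects as percentages but does not state the baseline. One reading that reproduces all four figures divides each Table 3 coefficient by a Table 2 mean: the same gender's piece-rate mean for the Team coefficient, and the women's mean in the same condition for the Male coefficient. The Python sketch below assumes that reading; the variable and function names are illustrative, not the authors'.

```python
# Sketch: convert Table 3 coefficients into the percentage effects quoted in
# the text. ASSUMPTION: each percentage is the coefficient divided by the
# relevant Table 2 mean (rounds 1-20); the paper does not state the baseline.

# Table 2 means of per-round absolute forecast error, rounds 1-20
MEANS = {
    ("men", "piece"): 25.21,
    ("women", "piece"): 20.69,
    ("men", "team"): 20.97,
    ("women", "team"): 29.15,
}

# Table 3 coefficients: Team dummy in columns 2-3, Male dummy in columns 4-5
TEAM_COEF = {"men": -3.61, "women": 6.21}
MALE_COEF = {"piece": 3.48, "team": -7.02}


def pct_of(coef: float, base: float) -> float:
    """Express a regression coefficient as a percentage of a baseline mean."""
    return 100.0 * coef / base


# Team production vs piece rate, baseline = same gender's piece-rate mean
team_effect = {g: pct_of(TEAM_COEF[g], MEANS[(g, "piece")]) for g in ("men", "women")}

# Men vs women, baseline = women's mean in the same condition
gender_gap = {c: pct_of(MALE_COEF[c], MEANS[("women", c)]) for c in ("piece", "team")}

for g, v in team_effect.items():
    print(f"team production vs piece rate, {g}: {v:+.1f}%")
for c, v in gender_gap.items():
    print(f"men vs women, {c} condition: {v:+.1f}%")
```

Under this assumption the computed effects round to 14% lower and 30% higher (team effect for men and women) and 17% higher and 24% lower (men relative to women in the piece rate and team production), matching the figures in the text.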
The results also suggest that the men are adjusting their effort based on their performance in
the prior round while the women’s performance is a function of their skill level and the number of
elapsed rounds. This is consistent with Gneezy et al. (2003), Gneezy and Rustichini (2004), and
Vandegrift et al. (2005). If men have a stronger desire to compete, information on their relative
position in the last round should predict effort levels and performance. Lagged error is highly
correlated with relative position in the last round. To test this hypothesis more directly, we create
two new variables to capture the information that participants in the team production condition
receive each round.
As noted above, participants in the team production condition receive information on
average forecast error for the group in round t-1 before making their forecast in round t.
Consequently, team production participants can infer their relative position in the team. To measure
this relative position, we calculate (1) the forecast error rank within the team, where 1 = most
accurate and 7 = least accurate, for participant i in round t-1 (Lagged Rank); and (2) a dummy
variable that equals 1 if the forecast error for participant i is less than the average team forecast
error in round t-1 (Lagged Rankdum).
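The two relative-position variables can be sketched in Python for a single team and round. This is an illustration under stated assumptions, not the authors' code: the function name and participant labels are hypothetical, and because the paper does not describe tie handling, the sketch simply breaks ties by input order.

```python
# Sketch: build Lagged Rank and Lagged Rankdum for one seven-member team from
# its round t-1 forecast errors. ASSUMPTION: ties in error are broken by
# input order; the paper does not say how ties were handled.
from typing import Dict, Tuple


def lagged_position(errors_prev: Dict[str, float]) -> Dict[str, Tuple[int, int]]:
    """Map each participant to (Lagged Rank, Lagged Rankdum).

    Lagged Rank: 1 = most accurate (smallest error), n = least accurate.
    Lagged Rankdum: 1 if the participant's error is below the team average.
    """
    team_avg = sum(errors_prev.values()) / len(errors_prev)
    # sorted() is stable, so equal errors keep their input order
    ordered = sorted(errors_prev, key=lambda pid: errors_prev[pid])
    return {
        pid: (rank, int(errors_prev[pid] < team_avg))
        for rank, pid in enumerate(ordered, start=1)
    }


# Hypothetical round t-1 errors for participants p1..p7 (team average = 22.0)
errors = {"p1": 12.0, "p2": 30.0, "p3": 5.0, "p4": 22.0,
          "p5": 40.0, "p6": 18.0, "p7": 27.0}
positions = lagged_position(errors)
for pid in sorted(positions):
    print(pid, positions[pid])
```

In this example p4, whose error equals the team average exactly, gets Lagged Rankdum = 0, because the definition uses a strict "less than".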
The random-effects regression results for the team production condition are reported in
Table 4. Elapsed time on the task (round) and average forecast errors in the practice rounds
explain the forecast errors for the women. For men, elapsed time on the task is insignificant and
the effect of average forecast error in the practice rounds is much smaller. Instead, the men
respond to the indicators of relative position. Rank in the last round (Lagged Rank) is a strong
predictor of forecast errors for men, while it is insignificant for women. A one-place increase in
rank in round t-1 (i.e., moving one position toward least accurate) raises forecast errors for men in
the team production condition by 1.5 points. In addition, men whose forecast errors were below the
mean for their team in round t-1 (Lagged Rankdum) have forecast errors that are on average 4.3
points lower. Interestingly, the men focus on relative position even though relative position does
not influence their rewards.
While the number of teams is small (n = 6), it is possible to draw some tentative
conclusions. The central result is that teams with a higher standard deviation in ability (holding
average ability in the team constant) have lower forecast errors. This replicates one of the main results
in Hamilton et al. (2003) under very different conditions. As above, we measure participant ability
by the average forecast error in the three practice rounds (forecasting trials prior to the experiment).
We average the individual observations across each team. To measure variation in ability for each
team, we compute the standard deviation of the average practice round forecast errors. Table 5
reports these basic measures by team: average forecast error in rounds 1-20, average forecast error
in rounds 11-20, average forecast error in the practice rounds, the standard deviation of the team
members' practice-round averages, and the proportion of males to total team members.
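A minimal sketch of these team-level measures follows, assuming each member's practice-round average has already been computed. The paper does not say whether the standard deviation uses an n or n-1 denominator, so the sketch defaults to the sample standard deviation and exposes the choice as a parameter; the function and variable names are hypothetical.

```python
# Sketch: team-level ability measures in the spirit of Table 5.
# ASSUMPTION: "standard deviation" is the sample standard deviation
# (n - 1 denominator); the paper does not say which was used.
import statistics
from typing import List, Tuple


def team_measures(practice_avgs: List[float], is_male: List[bool],
                  sample_sd: bool = True) -> Tuple[float, float, float]:
    """Return (Team Practice, Team Practice Deviation, Team Male Ratio).

    practice_avgs: each member's average forecast error in the practice rounds.
    is_male: one flag per member.
    """
    team_practice = statistics.mean(practice_avgs)
    sd = statistics.stdev if sample_sd else statistics.pstdev
    team_practice_dev = sd(practice_avgs)
    male_ratio = sum(is_male) / len(is_male)
    return team_practice, team_practice_dev, male_ratio


# Hypothetical seven-member team
practice = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
males = [True, True, False, False, False, True, False]
tp, dev, ratio = team_measures(practice, males)
print(f"Team Practice = {tp:.2f}, Deviation = {dev:.2f}, Male Ratio = {ratio:.4f}")
```

With seven members the male ratio can only take values k/7, which is consistent with the 0.2857, 0.5714 and 0.7142 entries in Table 5 (2/7, 4/7 and 5/7).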
Table 6 reports a regression on team average forecast error. Unfortunately, fixed effects
regressions cannot be calculated because the independent variables do not change across rounds.
Because t = 20 and n = 6, we violate one of the assumptions of the random-effects procedure
(i.e., n > t). Consequently, we average the team forecast errors across rounds 1-4, 5-8, 9-12, 13-16,
and 17-20. By creating 5 time periods for each of the 6 teams, we maximize the number of
observations (6 × 5 = 30) while keeping t = 5 below n = 6, as the random-effects procedure requires.
Table 6 shows the results of the regressions on team average forecast error.
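The period-averaging step can be sketched as follows. It shows how 20 rounds per team become five observations per team, giving a panel with n = 6 teams and t = 5 periods (30 observations, the N reported in Table 6). The input data and function names are hypothetical.

```python
# Sketch: collapse 20 rounds of team-average errors into five 4-round periods
# (rounds 1-4, 5-8, 9-12, 13-16, 17-20). Data structures here are hypothetical.
from typing import Dict, List, Tuple


def quintile_averages(round_errors: List[float]) -> List[float]:
    """Average a team's 20 per-round errors over consecutive blocks of 4 rounds."""
    if len(round_errors) != 20:
        raise ValueError("expected errors for rounds 1-20")
    return [sum(round_errors[i:i + 4]) / 4 for i in range(0, 20, 4)]


def build_panel(teams: Dict[int, List[float]]) -> List[Tuple[int, int, float]]:
    """Return rows (team, Quintile Round 1-5, average error) for a panel regression."""
    rows = []
    for team, errs in sorted(teams.items()):
        for q, avg in enumerate(quintile_averages(errs), start=1):
            rows.append((team, q, avg))
    return rows


# Hypothetical data: team j's error in round r is simply r + j
teams = {j: [float(r + j) for r in range(1, 21)] for j in range(1, 7)}
panel = build_panel(teams)
print(len(panel), panel[:5])
```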
Not surprisingly, teams with higher average forecast errors in the practice rounds had higher
forecast errors over rounds 1-20. A one-point increase in average team forecast error in the trial
period implies a 0.68 increase in the average team forecast error over rounds 1-20. More
interestingly, an increase in the standard deviation in ability across team members implies lower
forecast errors (holding average ability of the team members constant). A one-point increase in the
standard deviation of average forecast errors across team members (Team Practice Deviation) in the
trial period implies a 0.62 decrease in the team average forecast error over rounds 1-20. This
suggests that teams with more diversity in ability, holding average ability in the team constant,
will perform better. The effect of the male ratio is small and statistically insignificant.
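To make the "holding average ability constant" comparison concrete, the sketch below plugs hypothetical team characteristics into the Table 6 point estimates. It is only a linear prediction with the team random effect set to zero; the two example teams are invented for illustration.

```python
# Sketch: linear predictions from the Table 6 point estimates. The team's
# random effect is set to zero (its mean), so these are not fitted values
# for any actual team.
COEF = {
    "const": 20.49,
    "team_practice": 0.680,
    "team_practice_dev": -0.624,
    "team_male_ratio": -0.051,
    "quintile_round": -1.44,
}


def predicted_team_error(team_practice: float, team_practice_dev: float,
                         team_male_ratio: float, quintile_round: int) -> float:
    """Predicted average team forecast error for one 4-round period."""
    return (COEF["const"]
            + COEF["team_practice"] * team_practice
            + COEF["team_practice_dev"] * team_practice_dev
            + COEF["team_male_ratio"] * team_male_ratio
            + COEF["quintile_round"] * quintile_round)


# Two hypothetical teams with the same practice average and male ratio,
# differing only in the dispersion of member ability.
low_spread = predicted_team_error(30.0, 10.0, 4 / 7, 3)
high_spread = predicted_team_error(30.0, 20.0, 4 / 7, 3)
print(f"low spread: {low_spread:.2f}, high spread: {high_spread:.2f}, "
      f"difference: {high_spread - low_spread:.2f}")
```

Because the two teams differ only by 10 points of Team Practice Deviation, the predicted difference is 10 × (-0.624) = -6.24 points, whatever values are chosen for the other covariates.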
To get a picture of team dynamics, we test whether the standard deviation of forecast errors
across team members in the prior round and the average forecast error for the team in the prior
round impact the team average forecast error in the subsequent round. The results are displayed in
Table 7. The standard deviation of team forecast errors in the prior round does not predict average
forecast error for the team in the subsequent round. Apparently, in teams with greater dispersion in
ability (as measured by the team standard deviation of forecast errors in the practice rounds), the
weaker forecasters work harder to improve in rounds 1-20, and this eliminates any link between the
prior-round team standard deviation and team average forecast errors. Consequently, there is a
negative relation between the practice-round standard deviation in forecast errors across team
members and forecast errors over rounds 1-20, but no relation between the prior-round standard
deviation and subsequent team forecast errors once the participants have been grouped into teams
and paid based on team performance. This suggests that the weaker forecasters increase
performance, rather than the stronger forecasters decreasing performance, because increases in the
standard deviation of practice-round forecast errors across team members (Team Practice
Deviation) imply lower team forecast errors.
Team average forecast errors in round t-1 are negatively related to team average forecast
errors in the subsequent round: a one-point decrease in team average forecast error in round t-1
implies a 0.31 increase in average team forecast error in round t. Apparently, the team reacts to poor
performance by working harder, while strong performance in the prior round leads participants to
reduce effort. This likely explains why lagged error in the individual-level data does not explain
forecast errors in the team production condition.
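The Table 7 regressors can be built from round-by-round member errors as in the sketch below. Dropping round 1, which has no lagged values, leaves 19 rows per team, and 6 × 19 = 114 is consistent with N in Table 7. The sample-standard-deviation choice and all names are assumptions made for illustration.

```python
# Sketch: construct the Table 7 regressors for one team.
# ASSUMPTIONS: "standard deviation" is the sample SD, and round 1 is dropped
# because it has no lagged values.
import statistics
from typing import List, Tuple


def lagged_team_rows(errors_by_round: List[List[float]]) -> List[Tuple[int, float, float, float]]:
    """errors_by_round[k] holds the members' errors in round k + 1.

    Returns rows (round t, Team Average Forecast Error in t,
    Lagged Team Deviation, Lagged Team Error) for rounds 2..R.
    """
    rows = []
    for k in range(1, len(errors_by_round)):
        prev, cur = errors_by_round[k - 1], errors_by_round[k]
        rows.append((k + 1,
                     statistics.mean(cur),
                     statistics.stdev(prev),
                     statistics.mean(prev)))
    return rows


# Hypothetical team: in round r the seven members' errors are r-1, ..., r+5
team = [[float(r - 1 + m) for m in range(7)] for r in range(1, 21)]
rows = lagged_team_rows(team)
print(len(rows), rows[0])
print("observations for 6 teams:", 6 * len(rows))
```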
Comparing columns 4 and 5 in Table 3, we see that lagged error has no effect in the team
production condition. In the piece-rate condition, lower forecast errors in the prior round imply
lower forecast errors in the subsequent round. This likely picks up the skill of the individual
forecaster. In the team production condition, participants also change effort levels in response to
team performance. Consequently, we are unable to identify a relation between forecast error in the
prior round and the subsequent round in the team production condition.
V. Conclusion
Team production incentives are commonly employed in business firms, yet the behavior of
employees under such incentives is not well understood. To advance our understanding of behavior
under team production, we conduct a controlled experiment with two experimental conditions. In
the piece rate condition, participants were paid based on the absolute size of their forecasting error
in a simple forecasting task. In the team production condition, participants were assigned to groups
of seven members and paid based on the average performance of the group.
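The two payment rules summarized above can be sketched directly from the Table 1 formulas. The sketch shows why theory predicts shirking under team production: a participant whose own error rises by one point loses $0.03 under the piece rate but only $0.03/7 (about $0.0043) under team production. Names are illustrative, and because this section does not describe any floor on payoffs, the sketch applies the formulas as written.

```python
# Sketch of the two payment rules in Table 1, applied as written
# (any floor on payoffs is not described here and is not modeled).
from typing import List

BASE = 1.70   # dollars per round
SLOPE = 0.03  # dollars per point of absolute forecast error
TEAM_SIZE = 7


def piece_rate_payoff(error: float) -> float:
    """Piece rate: payoff depends only on the participant's own error."""
    return BASE - SLOPE * error


def team_payoff(team_errors: List[float]) -> float:
    """Team production: every member receives the average of the piece-rate payoffs."""
    return sum(piece_rate_payoff(e) for e in team_errors) / len(team_errors)


# One member's error rises by 7 points while the others stay at 20
before = team_payoff([20.0] * TEAM_SIZE)
after = team_payoff([20.0] * (TEAM_SIZE - 1) + [27.0])
own_loss_piece = piece_rate_payoff(20.0) - piece_rate_payoff(27.0)
print(f"team payoff falls by ${before - after:.3f} per member")
print(f"piece-rate payoff would fall by ${own_loss_piece:.3f}")
```

The loss from the extra error is the same in total under both schemes, but under team production it is spread across all seven members, so the member who caused it bears only one seventh.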
While the theoretical literature on team production assigns great weight to the problem of
shirking, the empirical literature often fails to detect it. In a recent analysis of behavior under team
production, Hamilton et al. (2003) fail to detect shirking. They suggest that the shirking problem
may be reduced through effective monitoring and punishing of free riders (e.g., explicit threats to
discontinue cooperation, peer pressure) and synergies related to team production. Such synergies
imply that team productivity is more than the simple sum of the performance of individual team
members.
Like Hamilton et al., we find no evidence of shirking when we compare performance in a
piece rate with team production. However, the design of our experiments suggests that factors other
than monitoring and synergies are at work. Because participants in our experiment could not
communicate and the task allowed for no complementarities across participants, it is not possible to
explain our results by appealing to synergies. It is also unlikely that monitoring explains our results.
Low performers could not be identified and the experiment provided no mechanism for making
threats or peer pressure. While it is possible that participants might withhold effort to induce
cooperation, there is no evidence that they did so. Indeed, our results show that teams with weak
performance in the current round increase performance in the subsequent round.
Instead, our evidence suggests that we fail to detect shirking in a comparison of performance
in the team production and piece rate conditions because men in the team production condition
compete. The men compete even though the team production payments provide no incentive to
compete. Comparing the performance of men across conditions, men in the team production
condition show significantly higher performance than the men in the piece rate. For women, the
results are reversed. The women in the team production condition show significantly lower
performance than the women in the piece rate. Because the men compete, they change their
behavior in the team production condition based on measures of relative performance. The women
do not. Forecast errors for the women are explained by the measure of basic skill and time spent on
the task.
Hamilton et al. (2003) also find that teams with more heterogeneity in worker ability show
better performance. They suggest that greater heterogeneity may cause better performance because
more skillful workers teach the less skillful how to execute tasks more efficiently and bargaining
over the common work pace will produce a different result when there is wider variation in intra-
team worker ability. We also find some evidence that, holding average skill level of the team
constant, teams with a larger variation in skill levels have lower forecast errors. However, the
design of our experiments suggests that factors other than teaching/learning and bargaining are at
work.
Because participants in our experiment could not communicate, more skillful participants could not
teach less skillful ones, and because participants had no outside option, there can be no threats to
exercise an outside option. To the extent that there is higher performance among more
heterogeneous teams in our experiment, teaching and threats to exercise an outside option are not
the cause. Because participants in our experiments completed the same task, with no
complementarities across participants, synergies are not
possible. To the extent that there is higher performance under team production in our experiment,
synergies are not the cause. We propose instead that larger differences in performance among team
members provide clearer signals of relative performance and unambiguous signals provoke more
effort. In sum, our results suggest that environmental cues or a reference frame that allows for
meaningful comparisons with others may be the key determinant in whether shirking behavior
emerges.
References
Alchian, A. and Demsetz, H., 1972, “Production, Information Cost, and Economic Organization,”
American Economic Review, 62, 777-795.
Bagnoli, M., and McKee, M., 1991, “Voluntary Contribution Games: Efficient Private Provision of
Public Goods,” Economic Inquiry, 29, 351-366.
Balzer, W., Sulsky, L., Hammer, L. and Summer, K., 1992, “Task Information, Cognitive
Information, or Functional Validity Information: Which Components of Cognitive Feedback Affect
Performance?” Organizational Behavior and Human Decision Processes, 53, 35-54.
Blair, M. M. and Stout, L. A., 1999, “Team Production in Business Organizations: An
Introduction,” Journal of Corporation Law, 24, 743-750.
Bolle, F., 1988, “Learning to Make Good Predictions in Time Series,” in Bounded Rational
Behavior in Experimental Games and Markets, R. Tietz, W. Albers and R. Selten, eds. (Springer-
Verlag, Berlin).
Brown, P. M., 1995, “Learning From Experience, Reference Points and Decision Costs,” Journal of
Economic Behavior and Organization, 27, 381-399.
Brown, P. M., 1998, “Experimental Evidence on the Importance of Competing for Profits on
Forecasting Accuracy,” Journal of Economic Behavior and Organization, 33, 259-269.
Fisher, J., Isaac, R., Schatzberg, J., and Walker, J., 1995, “Heterogeneous Demand for Public
Goods: Behavior in the Voluntary Contributions Mechanism,” Public Choice, 85, 249-266.
Gaynor, M. and Gertler, P., 1995, “Moral Hazard and Risk Spreading in Partnerships,” Rand
Journal of Economics, 26, 591-613.
Gneezy, U., Niederle, M., and Rustichini, A., 2003, “Performance in Competitive Environments:
Gender Differences,” Quarterly Journal of Economics, 118, 1049-1074.
Gneezy, U. and Rustichini, A., 2004, “Gender and Competition at a Young Age,” American
Economic Review, 94, 377-381.
Goldstein, W. and Hogarth, R., 1997, “Judgement and Decision Research: Some Historical
Context,” in Research on Judgment and Decision Making: Currents, Connections and
Controversies, W. Goldstein and R. Hogarth, eds. (Cambridge University Press,
Cambridge).
Hamilton, B. H., Nickerson, J. A. and Owan, H., 2003, “Team Incentives and Worker
Heterogeneity: An Empirical Analysis of the Impact of Teams on Productivity and Participation,”
Journal of Political Economy, 111, 465-497.
Hansen, D. G., 1997, “Worker Performance and Group Incentives: A Case Study,” Industrial and
Labor Relations Review, 51, 37-49.
Holmstrom, B., 1982, “Moral Hazard in Teams,” Bell Journal of Economics, 13, 324-340.
Itoh, H., 1991, “Incentives to Help in Multi-agent Situations,” Econometrica, 59, 611-636.
Ledyard, J. O., 1995, “Public Goods: A Survey of Experimental Research,” in The Handbook of
Experimental Economics, J. Kagel and A. Roth, eds. (Princeton University Press, Princeton,
NJ).
Legros, P. and Matthews, S. A., 1993, “Efficient and Nearly Efficient Partnerships,”
Review of Economic Studies, 60, 599-611.
McAfee, R. P. and McMillan, J., 1991, “Optimal Contracts for Teams,” International
Economic Review, 32, 561-577.
McClurg, L. N., 2001, “Team Rewards: How Far Have We Come?” Human Resource
Management, 40, 73-86.
Meidinger, C., Rullière, J., and Villeval, M., 2003, “Does Team-Based Compensation Give Rise to
Problems When Agents Vary in Their Ability?” Experimental Economics, 6, 253-272.
Nalbantian, H. and Schotter, A., 1997, “Productivity Under Group Incentives: An
Experimental Study,” American Economic Review, 87, 314-341.
Rasmusen, E., 1987, “Moral Hazard in Risk Averse Teams,” Rand Journal of Economics, 18,
428-435.
Schmalensee, R., 1976, “An Experimental Study of Expectation Formation,” Econometrica, 44,
17-41.
van Dijk, F., Sonnemans, J., and van Winden, F., 2001, “Incentive Systems in a Real Effort
Experiment,” European Economic Review, 45, 187-214.
Vandegrift, D., and Brown, P. M., 2003, “Task Difficulty, Incentive Effects, and the Selection of
High-Variance Strategies: An Experimental Examination of Tournament Behavior,” Labour
Economics, 10, 481-497.
Vandegrift, D., and Brown, P. M., 2005, “Gender Differences in the Use of High-Variance
Strategies in Tournament Competition,” Journal of Socio-Economics (forthcoming).
Vandegrift, D., Yavas, A., and Brown, P. M., 2005, “Men, Women and Competition: An
Experimental Test of Behavior,” unpublished manuscript.
Weitzman, M. L. and Kruse, D. L., 1990, “Profit Sharing and Productivity,” in Paying for
Productivity: A Look at the Evidence, A. Blinder, ed. (Brookings Institution, Washington, DC).
Table 1. Summary of experimental conditions
Equation determining price:
price = 85 + 0.3 * Cue 1 + 0.7 * Cue 2 + e
Condition 1 – piece rate
Condition 2 – team production
Payoffs in the piece rate:
payoff = $1.70 – (.03 * forecast error)
Payoffs in team production:
$$\text{payoff} = \frac{1}{7}\sum_{i=1}^{7}\left(\$1.70 - 0.03 \times \text{forecast error}_i\right)$$
Table 2. Means and Standard Deviations by Condition
                          Forecast Error^a    Forecast Error    Forecast Error
                          Rounds 1-20         Rounds 1-10       Rounds 11-20
Overall                   24.13 (26.57)       25.64 (28.12)     22.63 (24.86)
Piece Rate                23.6  (27.57)       25.34 (28.47)     21.86 (26.56)
Team Production           24.67 (25.54)       25.94 (27.79)     23.40 (23.03)
Men                       23.26 (27.01)       24.28 (27.15)     22.24 (26.84)
Women                     25.42 (25.89)       27.64 (29.41)     23.20 (21.64)
Men Piece Rate            25.21 (30.19)       27.09 (30.56)     23.34 (29.75)
Women Piece Rate          20.69 (21.85)       22.18 (24.05)     19.20 (19.36)
Men Team Production       20.97 (22.52)       20.98 (22.12)     20.96 (22.96)
Women Team Production     29.15 (28.16)       31.95 (32.45)     26.35 (22.84)
Standard deviations in parentheses.
^a Forecast Error: average per-round absolute forecast error by participant, |Pt - Peit|, rounds 1-20.
Pt = the price of the stock in period t. Peit = participant i's forecast in period t.
Table 3. Random-Effects Generalized Least Squares Regressions on Individual Forecast Errors
Dependent variable: Forecast Error^a
                       (1)         (2)         (3)         (4)         (5)
                       All         Men         Women       Piece Rate  Team
                                   Only        Only        Only        Only
Constant               19.26***    19.03***    17.97***    14.11***    25.37***
                       (2.06)      (2.45)      (2.77)      (2.89)      (2.65)
Male^b                 -1.24                               3.48*       -7.02***
                       (1.32)                              (1.96)      (1.77)
Team^c                 0.145       -3.61**     6.21***
                       (1.31)      (1.72)      (1.99)
Practice Average^d     0.156***    0.104***    0.237***    0.149***    0.163***
                       (0.028)     (0.037)     (0.044)     (0.051)     (0.033)
Lagged Error^e         0.156***    0.225***    0.008       0.245***    0.028
                       (0.025)     (0.033)     (0.039)     (0.036)     (0.035)
Round^f                -0.253**    -0.205      -0.364**    -0.231      -0.277*
                       (0.118)     (0.156)     (0.177)     (0.172)     (0.159)
R2 within              0.01        0.01        0.01        0.02        0.01
R2 between             0.45        0.65        0.42        0.67        0.37
R2 overall             0.06        0.07        0.08        0.09        0.06
N                      1596        950         646         798         798
Standard errors in parentheses.
* = significant at the 0.1 level, ** = significant at the 0.05 level, *** = significant at the 0.01 level.
Group variable: participant
^a Forecast Error: per-round absolute forecast error by participant, |Pt - Peit|, rounds 1-20.
^b Male: dummy variable = 0 if female, 1 if male.
^c Team: dummy variable = 0 if piece rate, 1 if team production.
^d Practice Average: the average per-round forecast error for the practice rounds for participant i.
^e Lagged Error: forecast error by participant in round t-1.
^f Round: indicates round number (1-20).
Table 4. Random-Effects Generalized Least Squares Regressions on Individual Forecast Errors –
Team Production Condition
Dependent variable: Forecast Error^a
                       Men         Women       Men         Women
                       Only        Only        Only        Only
Constant               13.75***    26.41***    21.95***    22.94***
                       (3.23)      (4.55)      (3.16)      (4.24)
Lagged Rank^b          1.52***     -0.202
                       (0.532)     (0.713)
Lagged Rankdum^c                               -4.34**     3.90
                                               (2.28)      (2.86)
Practice Average^d     0.105***    0.240***    0.110***    0.250***
                       (0.039)     (0.055)     (0.040)     (0.056)
Round^e                -0.138      -0.476*     -0.133      -0.461*
                       (0.194)     (0.257)     (0.195)     (0.256)
R2 within              0.01        0.01        0.01        0.03
R2 between             0.37        0.37        0.28        0.31
R2 overall             0.04        0.06        0.03        0.06
N                      437         361         437         361
Standard errors in parentheses.
* = significant at the 0.1 level, ** = significant at the 0.05 level, *** = significant at the 0.01 level.
Group variable: participant
^a Forecast Error: per-round absolute forecast error by participant, |Pt - Peit|, rounds 1-20.
^b Lagged Rank: forecast error rank in team (1 = most accurate, 7 = least accurate) for participant i in
round t-1.
^c Lagged Rankdum: dummy variable = 1 if forecast error for participant i is less than average
team forecast error in round t-1.
^d Practice Average: the average per-round forecast error for the practice rounds for participant i.
^e Round: indicates round number (1-20).
Table 5. Means and Standard Deviations by Team – Team Production Condition
          Average Error   Average Error   Team            Team Practice   Team Male
          Rounds 1-20^a   Rounds 11-20^b  Practice^c      Deviation^d     Ratio^e
Team 1    30.88           26.39           56.85           39.79           0.2857
Team 2    27.60           26.28           20.90           7.28            0.5714
Team 3    24.60           22.33           35.52           21.83           0.7142
Team 4    23.79           24.34           40.42           32.39           0.7142
Team 5    19.39           15.85           22.19           19.76           0.7142
Team 6    22.70           23.29           27.47           17.08           0.2857
^a Average Forecast Error: average per-round absolute forecast error by team, |Pt - Peit|, rounds 1-20.
^b Average Forecast Error: average per-round absolute forecast error by team, |Pt - Peit|, rounds 11-20.
^c Team Practice: the average per-round forecast error for the practice rounds for team j.
^d Team Practice Deviation: the standard deviation for team j of the practice round averages for
each individual team member i.
^e Team Male Ratio: the proportion of males to total team members for team j.
Table 6. Random-Effects Generalized Least Squares Regressions on Average Team Forecasting Error
Dependent variable: Average Forecast Error^a, Rounds 1-20
Constant                       20.49***
                               (5.81)
Team Practice^b                0.680***
                               (0.238)
Team Practice Deviation^c      -0.624**
                               (0.266)
Team Male Ratio^d              -0.051
                               (5.84)
Quintile Round^e               -1.44***
                               (0.568)
R2 within                      0.22
R2 between                     0.86
R2 overall                     0.48
N                              30
Standard errors in parentheses.
** = significant at the 0.05 level, *** = significant at the 0.01 level.
^a Average Forecast Error: average per-round absolute forecast error by team, |Pt - Peit|, averaged
over rounds 1-4, 5-8, 9-12, 13-16, and 17-20 (i.e., quintile).
^b Team Practice: the average per-round forecast error for the practice rounds for team j.
^c Team Practice Deviation: the standard deviation for team j of the practice round averages for
each individual team member i.
^d Team Male Ratio: the proportion of males to total team members for team j.
^e Quintile Round: 1 = rounds 1-4, 2 = rounds 5-8, etc.
Table 7. Fixed-Effects Regression on Team Average Forecasting Error
Dependent variable: Team Average Forecast Error^a
Constant                       32.53***
                               (3.75)
Lagged Team Deviation^b        0.190
                               (0.139)
Lagged Team Error^c            -0.312**
                               (0.151)
Round^d                        -0.382**
                               (0.193)
R2 within                      0.06
R2 between                     0.83
R2 overall                     0.03
N                              114
Standard errors in parentheses.
** = significant at the 0.05 level, *** = significant at the 0.01 level.
Group variable: team
^a Average Forecast Error: average per-round absolute forecast error by team, |Pt - Peit|, rounds 1-20.
^b Lagged Team Deviation: standard deviation of forecast errors across participants by team for
round t-1.
^c Lagged Team Error: average forecast errors across participants by team for round t-1.
^d Round: indicates round number.
Figure 1. Average Forecast Error by Condition (average forecast error by round, rounds 1-20; series: Piece rate, Team Production)
Figure 2. Average Forecast Errors in the Piece Rate Condition by Gender (average forecast error by round, rounds 1-20; series: Men, Women)
Figure 3. Average Forecast Errors in the Team Production Condition by Gender (average forecast error by round, rounds 1-20; series: Men, Women)
