
A Laboratory Evaluation of Six Electronic Voting Machines

Fred Conrad
University of Michigan
Multi-institution, Multi-disciplinary Project
University of Michigan: Frederick Conrad, Emilia Peytcheva, Michael Traugott
University of Maryland: Paul Herrnson, Ben Bederson
Georgetown University: Michael Hanmer
University of Rochester: Richard Niemi
Agenda
 The problem:
 Usability can affect election outcomes!
 Method:
 Anything unique about what we did?
 Some results:
 Satisfaction
 Performance

 Implications
Acknowledgements
 Wil Dijkstra, Ralph Franklin, Brian Lewis, Esther
Park, Roma Sharma, Dale Vieriegge
 National Science Foundation:
 Grant IIS-0306698
 Survey Research Center
Institute for Social Research, University of Michigan
 Partners:
 Federal Election Commission (FEC), Maryland State Board of
Elections, National Institute of Standards and Technology
(NIST)

 Vendors:
 Diebold, Hart InterCivic, ES&S, NEDAP, Avante
 Note: Sequoia declined invitation to participate
Scope and limits of current work
 Today’s talk presents a small scale study that was
designed to demonstrate potential challenges and
inform future work
 It does not address system accuracy,
affordability, accessibility, durability or ballot
design
 The voting systems tested were those available
when the study was conducted; some machines
may have been deployed with different options;
some machines may since have been updated
Voter intent and e-voting
 Hanging chads in Florida 2000 came to symbolize
ambiguity about voter intent

 E-voting (e.g. touch screen user interfaces) can eliminate this kind of ambiguity
 With e-voting, no uncertainty about whether vote is
recorded
 Though whether or not voter pressed a button on a
touch screen can be ambiguous

 E-voting may introduce usability problems that threaten credibility of voting tallies
Usability ≠ Security
 Much of the e-voting controversy surrounds
security
 Are the systems vulnerable to systematic, widespread
fraud?

 We propose that at least as serious a threat to the integrity of elections is usability
 Are voters ever unable to enact their intentions because
of how the user interface is designed?
 Are they ever discouraged by the experience?

 Procuring e-voting systems may depend on usability, security and cost, among other criteria
Usability is only one characteristic
of overall performance
 Our focus on usability is not intended to
suggest that other dimensions of system
performance are not important
 We are simply focusing on usability
 Accuracy, Accessibility*, Affordability,
Durability, Security, Transportability

*we did not test with disabled users


Some Hypotheses
 Voters will make more errors
 If they have limited computer experience
 unfamiliar with interface and input conventions:
scroll bars, check boxes, focus of attention, keyboard
 For some voting tasks than others
 e.g. writing-in votes, changing votes

 Voters will be less satisfied


 the more effort required to vote
 e.g. more actions like touching the touch screen
Current Project
 Examines usability of 6 e-voting systems
 5 commercial products (used in 2004)
 1 research prototype


 Field (n ≈ 1500) and laboratory (n = 42)
 Breadth vs. depth

 Focus today on laboratory study


The machines
 Selected to represent specific features

 Vendors (with exception of NEDAP)


implemented ballots for best presentation

 Photos that follow taken by our research group – not provided by vendors
Avante Vote Trakker

Image removed to reduce size of file; contact author for complete presentation
Diebold AccuVote TS

Image removed to reduce size of file; contact author for complete presentation
ES&S Optical Scan

Image removed to reduce size of file; contact author for complete presentation
Hart InterCivic eSlate

Image removed to reduce size of file; contact author for complete presentation
NEDAP LibertyVote

Image removed to reduce size of file; contact author for complete presentation
UMD Zoomable System
www.cs.umd.edu/~bederson/voting

Image removed to reduce size of file; contact author for complete presentation
General approach (lab and field)
 Before voting, users indicate intentions by
circling choices in each contest
 In some contests, instructed how to vote

 All users asked to vote on all 6 machines
 with one of two ballot designs:
 Office Block
 Straight Party option
 in 1 of 6 random orders (Latin Square; see the counterbalancing sketch below)
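
To make the Latin-square counterbalancing concrete, here is a minimal sketch (in Python, with assumed machine names and an assumed assignment rule, not the study's actual procedure) that builds six machine orders from cyclic shifts and rotates participants through them:

    # Minimal sketch (assumed, not the study's code): a cyclic 6 x 6 Latin
    # square of machine orders, so every machine appears exactly once in
    # every serial position; each participant is assigned one row.
    MACHINES = ["Avante", "Diebold", "ES&S", "Hart", "NEDAP", "Zoomable"]

    def latin_square(items):
        n = len(items)
        # row r is the machine list shifted left by r positions
        return [[items[(r + c) % n] for c in range(n)] for r in range(n)]

    ORDERS = latin_square(MACHINES)

    def order_for(participant_id):
        # rotate participants through the six orders
        return ORDERS[participant_id % len(ORDERS)]

    for pid in range(6):
        print(pid, order_for(pid))
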
General approach (cont’d)
 Tasks:
 change a vote
 write-in a vote
 abstain (undervote) in one contest
 two contests required voting for 2 candidates

 Users complete satisfaction questionnaire after each machine
Lab Study Design
                   Computer Experience
Ballot Design      Low     High*
Office Block        21       9
Straight Party      10       2

n = number of voters
* High: computer use > twice a week

Lab Study: Design and Procedure
 42 people recruited via newspaper ads
 31 with limited computer experience
 29 over 50 years old
Why did we oversample older users
with little computer experience?
 Because e-voting systems must be usable
by anyone who wants to vote

 If anyone is unable to enact their intentions because of the user interface, the technology is failing

 We wanted to focus, in our small sample, on those people most likely to have problems
More about users
 Visited lab in Ann Arbor, MI in July and August,
2004
 paid $50 for 2 hours

 Previously voted in an election
 95% reported voting previously
 7% reported using touch screens when they voted

 Prior voting experience


 Paper: 43%
 Punch card: 69%
 Lever machine: 48%
 Dials and Knobs: 19%
 Touch screen: 7%
Design and Procedure (cont’d)
 All machines in a single large room
 2 video cameras on rolling tripod
 1 per 3 machines
 Proprietary designs ruled out use of direct screen
capture e.g. scan converter or Morae
Satisfaction Results
 Preview:
 Left-most bar (Diebold)
 Right-most bar (Hart)
 Consistent with data from field study (n ≈ 1500)
 Provides face validity for lab results with small sample
“The voting system was easy to use”
[Bar chart: mean agreement (1 = Strongly Disagree, 7 = Strongly Agree), y-axis 1 to 7, by machine: Diebold, ES&S, Zoomable, Avante, NEDAP, Hart]
“I felt comfortable using the system”
[Bar chart: mean agreement (1 = Strongly Disagree, 7 = Strongly Agree), y-axis 1 to 7, by machine: Diebold, ES&S, Zoomable, Avante, NEDAP, Hart]
“Correcting my mistakes was easy”
[Bar chart: mean agreement (1 = Strongly Disagree, 7 = Strongly Agree), y-axis 1 to 7, by machine: Diebold, ES&S, Zoomable, Avante, NEDAP, Hart]
“Casting a write-in vote was easy to do”
[Bar chart: mean agreement (1 = Strongly Disagree, 7 = Strongly Agree), y-axis 1 to 7, by machine: Diebold, ES&S, Zoomable, Avante, NEDAP, Hart]
“Changing a vote was easy to do”
[Bar chart: mean agreement (1 = Strongly Disagree, 7 = Strongly Agree), y-axis 1 to 7, by machine: Diebold, ES&S, Zoomable, Avante, NEDAP, Hart]
Why the differences in satisfaction?
 We believe the answer lies in the details of
the interaction

 Thus, we focus on the subset of voters using these two machines (Diebold and Hart):
 Office block ballot
 Limited computer experience
 n = 21
 Represents 20% of (what we project will be) ≈
13,000 codable behaviors
Focus on subgroup of users
                   Computer Experience
Ballot Design      Low     High
Office Block        21       9
Straight Party      10       2

n = number of voters; focal subgroup: Office Block × Low (n = 21)
Coding the Video

Image removed to reduce size of file; contact author for complete presentation
Coding the Video (2)

Image removed to reduce size of file; contact author for complete presentation
Sequential analysis
 Goal is to identify and count event
patterns
 Order is critical because each event provides
context for events that follow and precede it
 E.g. trouble changing votes when original vote
must be deselected:
 How many times did voters press new candidate
without first deselecting?
 How often did they do this before consulting Help?

 How often did they do this after consulting Help?

 Tree analysis example (a pattern-counting sketch follows below)
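
To illustrate this kind of sequential counting, the sketch below scans one voter's coded event stream for presses of a new candidate made before deselecting the original, split by whether Help had already been consulted. The event codes and the example sequence are invented for the sketch and are not the project's coding scheme:

    # Hypothetical event codes; not the project's actual coding scheme.
    def count_premature_presses(events):
        counts = {"before_help": 0, "after_help": 0}
        help_seen = False
        original_selected = True   # the original vote is still selected
        for ev in events:
            if ev == "consult_help":
                help_seen = True
            elif ev == "deselect_original":
                original_selected = False
            elif ev == "press_new_candidate" and original_selected:
                # new candidate pressed while the original is still selected
                counts["after_help" if help_seen else "before_help"] += 1
        return counts

    example = ["press_new_candidate", "consult_help", "press_new_candidate",
               "deselect_original", "press_new_candidate", "cast_ballot"]
    print(count_premature_presses(example))   # {'before_help': 1, 'after_help': 1}
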


Number of Actions
 For every touch screen action there are
two actions with rotary wheel
 Touch screen: press screen with finger
 Rotary wheel: move wheel and press “Enter”

 Empirically, people take proportionally more actions
 Diebold: 1.89 actions per task
 Hart: 3.92 actions per task
Number of Actions
[Bar chart: number of actions per voting task, Diebold vs. Hart; y-axis 0 to 14 actions. Largest differences labeled at Getting started, Change vote, and Write-in; x-axis lists the individual ballot contests (Voting Task).]
Duration
 Voting duration (mins) varied substantially
by machine

 Diebold: 4.68 (sd = 1.27)


 Hart: 10.56 (sd = 4.53)

 Presumably due to the larger number of actions in Hart than Diebold
 And possibly more thorough ballot review (a sketch of these per-machine summaries follows below)
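
As an illustration only, the sketch below shows one way such per-machine summaries (mean actions per task, mean and sd of total duration in minutes) could be computed from coded records; the record layout and the values are placeholders, not the study's data:

    # Placeholder records: (machine, voter, task, n_actions, seconds);
    # values are made up for illustration only.
    from statistics import mean, stdev
    from collections import defaultdict

    records = [
        ("Diebold", 1, "change_vote", 4, 31.0),
        ("Diebold", 1, "write_in",    9, 62.0),
        ("Hart",    1, "change_vote", 8, 77.0),
        ("Hart",    1, "write_in",   14, 140.0),
        # ... one row per voter x machine x task
    ]

    actions = defaultdict(list)                        # machine -> action counts per task
    seconds = defaultdict(lambda: defaultdict(float))  # machine -> voter -> total seconds

    for machine, voter, task, n_actions, secs in records:
        actions[machine].append(n_actions)
        seconds[machine][voter] += secs

    for machine in sorted(actions):
        mins = [s / 60 for s in seconds[machine].values()]
        sd = stdev(mins) if len(mins) > 1 else float("nan")
        print(f"{machine}: {mean(actions[machine]):.2f} actions/task, "
              f"{mean(mins):.2f} min (sd = {sd:.2f})")
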
Accuracy
 Varies by Machine and Voting Task:
 2 Candidates (State Representative)
 Inaccurate enough for concern
 Errors of Omission: just voted for one candidate

 Write-In (Member Library Board)


 Quite inaccurate for Hart
 Errors of commission: name spelled wrong
 Errors of omission: no write-in vote (in the end)

 Changing Vote (Probate Court Judge):


 Overall accurate but slightly less accurate for Diebold
 Error of commission: unintended candidate remains selected
Voting Accuracy
[Bar chart: proportion of correct votes per voting task, Diebold vs. Hart; y-axis 0.5 to 1.0 (Proportion Correct). Tasks of interest labeled: 2 Candidates, Change vote, Write-in; x-axis lists the individual ballot contests (Voting Task).]
Number of Actions: Getting Started (Hart)

Image removed to reduce size of file; contact author for complete presentation

8 actions minimally required to access system: 4 selections and 4 “Enter” presses


Number of Actions: Getting Started (Diebold)

Image removed to reduce size of file; contact author for complete presentation

2 actions required to access system: Insert access card and press “Next”
Access examples
 Hart
 Voter is not able to select digits with rotary
wheel, attempts to press (non-touch) screen,
requests help
 Help does not help
 Voter figures it out
 Diebold
 Voter slides access card into reader
 Presses “Next”
Number of Actions: Vote Change
 Diebold requires “de-selecting” current
vote in order to change it
 Clicking on already checked check box
 Likely to be opaque to non-computer users
 Despite manufacturer-provided instructions

 On only 11/21 occasions did voters correctly deselect on the first try
 On 10/21 occasions, voters touched the second candidate without first deselecting the original selection
Number of Actions: Vote Change
 Changing votes is essential for correcting
errors and expressing change of heart

 Example of problem changing vote:


 Voter 27
Number of Actions: Write-in
 Write-in votes generally involve as many
actions as letters in the name
 Double this if navigation and selection required

 Example of problems correcting write-in mistakes:
 Voter 38
Review
 Both machines offer similar ballot review:
 Displays voters’ choices and highlights
unselected contests

 In both cases, ballot review spans two pages
Review: Hart

Image removed to reduce size of file; contact author for complete presentation
Review: Diebold

Image removed to reduce size of file; contact author for complete presentation
How often do voters review their
votes?
 On how many occasions did voters cast
ballot without reviewing all choices
(displaying the second review page)?
 Hart: 8/34
 Diebold: 17/29

 Diebold review much briefer than Hart, suggesting cursory review
 Hart: 55.5 seconds
 Diebold: 9.8 seconds
Review Example 1
 Diebold:
 Voter (seemingly by accident) does not vote in one contest, resulting in an undervote
 Completes ballot and system displays review
screen
 She immediately presses “Cast Ballot” and
says “That one I felt confident in … didn’t even
need to go over it”
Review Example 2
 Hart
 Voter (seemingly by accident) does not vote in two contests, resulting in two undervotes
 Completes ballot and system displays first of
two review screens
 He selects first undervote (in red text) and
system displays relevant contest in ballot
 He selects intended candidates, i.e. votes for
circled candidates in voter info booklet, and
system displays first review screen
 He repeats for second undervote
Review screens
 Some designs promote more review and
correction of errors than others
 Hart review screens visually distinct from ballot screens
and, if voter presses “Cast Vote” after first review
screen, system displays second screen
 Diebold review screens hard to distinguish from ballot
screens and if voter presses “Cast Ballot” without
scrolling to see lower part of screen, system casts ballot

 More review and correction surely improves voting accuracy
 but involves more work which may lead to lower
satisfaction
Summary
 User satisfaction and performance related
to particular features
 Touch screen involves fewer actions and
seemed more intuitive to these users than
wheel-plus-enter sequence
 Deselecting a choice in order to change it
seemed counterintuitive to many voters and
responsible for at least one incident of casting
an unintended vote
 Review screens designed to promote review
(distinct from ballot, hard to cast vote in
middle) led to more review and correction
Summary (cont’d)
 These users were more successful on
some tasks with Hart and on others with
Diebold
 Fit between features and tasks more
appropriate level of analysis than overall
machine
Conclusions
 In a situation designed to maximize usability
problems, the machines mostly fared well

 But they did exhibit some usability problems and accuracy was not perfect
 Both unintended votes and no votes
 Substantial proportion of voters did not review their ballots
 Seems likely that non-computer users will not
recognize interface conventions:
 E.g. De-selection and scrolling

 Even very low error rates -- for just computer novices -- can matter in very close elections
Conclusions (cont’d)
 We cannot compare voters’ performance
with new technology to older techniques
 But we will be able to use performance
with the ES&S (paper ballot, optical scan)
as a rough baseline
 Certainly, voting systems are now being scrutinized in a way they were not before
Implications
 Most of these design problems can be
improved by applying usability engineering
techniques
 But industry and election officials need to
make this a priority
 EAC/NIST developing usability guidelines

 Unparalleled design challenge:


 Systems should be usable by all citizens all the
time, even if used once every few years
Thank you!
Additional Slides if time permits
 User Interface Can Affect Outcome
 Variance
 Bias
 Some usability measures
 Measures (cont’d)
User Interface Can Affect Outcome
 Ballot Design
 Butterfly ballot

 Interaction
 Casting ballot too soon
 Changing votes
 Writing-in votes
 Navigating between contests
 Reviewing votes
 Frustration, Increased Cynicism
 Abandonment
 Lower Turnout in Future
 Voters might question results
Variance
 Interface-related error is not systematic
 all candidates should suffer equally from this
(all else being equal)
 E.g. if difficult to change votes, doesn’t matter
which selections require change

 But unlikely that error for different candidates is exactly complementary
Bias
 Interface systematically prevents votes
from being cast for a particular candidate

 Results either in no vote being cast or voter choosing unintended candidate
 e.g. Butterfly Ballot may have led Jewish
voters who intended to vote for Al Gore to vote
for Pat Buchanan
Some usability measures
 Satisfaction

 Accuracy
 Do voters vote for whom they intend?
 In lab, compare circled choices to observable screen actions (a scoring sketch follows below)
 In field, compare circled choices to ballot images and audit trails
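
A minimal sketch of that lab comparison, assuming intended and recorded selections are available per contest as sets of candidate names; the contests and names below are invented examples, not study data:

    # Compare circled intentions with recorded selections, contest by
    # contest, and classify each mismatch.
    def score_contest(intended, recorded):
        if recorded == intended:
            return "correct"
        if recorded < intended:          # intended vote(s) missing
            return "error_of_omission"
        return "error_of_commission"     # an unintended selection was recorded

    intended_ballot = {
        "State Representative (vote for 2)": {"Candidate A", "Candidate B"},
        "Probate Court Judge":               {"Candidate C"},
        "Library Board (write-in)":          {"Jane Q. Public"},
    }
    recorded_ballot = {
        "State Representative (vote for 2)": {"Candidate A"},     # omission
        "Probate Court Judge":               {"Candidate D"},     # commission
        "Library Board (write-in)":          {"Jane Q. Public"},  # correct
    }

    for contest, intended in intended_ballot.items():
        print(contest, "->", score_contest(intended, recorded_ballot[contest]))
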
Measures (cont’d)
 Number of Actions
 Presses and clicks
 Substantive actions, e.g. requests for system
help, revisions of earlier selections

 Duration
 Per task
 Overall
