
A Laboratory Evaluation of Six Electronic Voting Machines

Fred Conrad
University of Michigan
Multi-institution, Multi-disciplinary Project
University of Michigan: Frederick Conrad, Emilia Peytcheva, Michael Traugott
University of Maryland: Paul Herrnson, Ben Bederson
Georgetown University: Michael Hanmer
University of Rochester: Richard Niemi
Agenda
 The problem:
 Usability can affect election outcomes!
 Method:
 Anything unique about what we did?
 Some results:
 Satisfaction
 Performance

 Implications
Acknowledgements
 Wil Dijkstra, Ralph Franklin, Brian Lewis, Esther
Park, Roma Sharma, Dale Vieriegge
 National Science Foundation:
 Grant IIS-0306698
 Survey Research Center
Institute for Social Research, University of Michigan
 Partners:
 Federal Election Commission (FEC), Maryland State Board of
Elections, National Institute of Standards and Technology
(NIST)

 Vendors:
 Diebold, Hart InterCivic, ES&S, NEDAP, Avante
 Note: Sequoia declined invitation to participate
Scope and limits of current work
 Today’s talk presents a small scale study that was
designed to demonstrate potential challenges and
inform future work
 It does not address system accuracy,
affordability, accessibility, durability or ballot
design
 The voting systems tested were those available
when the study was conducted; some machines
may have been deployed with different options;
some machines may since have been updated
Voter intent and e-voting
 Hanging chads in Florida 2000 came to symbolize
ambiguity about voter intent

 E-voting (e.g. touch screen user interfaces) can eliminate this kind of ambiguity
 With e-voting, no uncertainty about whether vote is
recorded
 Though whether or not voter pressed a button on a
touch screen can be ambiguous

 E-voting may introduce usability problems that threaten credibility of voting tallies
Usability ≠ Security
 Much of the e-voting controversy surrounds
security
 Are the systems vulnerable to systematic, widespread
fraud?

 We propose that at least as serious a threat to the integrity of elections is usability
 Are voters ever unable to enact their intentions because
of how the user interface is designed?
 Are they ever discouraged by the experience?

 Procuring e-voting systems may depend on usability, security and cost, among other criteria
Usability is only one characteristic
of overall performance
 Our focus on usability is not intended to
suggest that other dimensions of system
performance are not important
 We are simply focusing on usability
 Accuracy, Accessibility*, Affordability,
Durability, Security, Transportability

*we did not test with disabled users


Some Hypotheses
 Voters will make more errors
 If they have limited computer experience
 unfamiliar with interface and input conventions:
scroll bars, check boxes, focus of attention, keyboard
 For some voting tasks than others
 e.g. writing-in votes, changing votes

 Voters will be less satisfied


 the more effort required to vote
 e.g. more actions like touching the touch screen
Current Project
 Examines usability of 6 e-voting systems
 5 commercial products (used in 2004)
 1 research prototype


 Field (n ≈ 1500) and laboratory (n = 42)
 Breadth vs. depth

 Focus today on laboratory study


The machines
 Selected to represent specific features

 Vendors (with exception of NEDAP)


implemented ballots for best presentation

 Photos that follow taken by our research group – not provided by vendors
Avante Vote Trakker

Image removed to reduce size of file; contact author for complete presentation
Diebold AccuVote TS

Image removed to reduce size of file; contact author for complete presentation
ES&S Optical Scan

Image removed to reduce size of file; contact author for complete presentation
Hart InterCivic eSlate

Image removed to reduce size of file; contact author for complete presentation
NEDAP LibertyVote

Image removed to reduce size of file; contact author for complete presentation
UMD Zoomable System
www.cs.umd.edu/~bederson/voting

Image removed to reduce size of file; contact author for complete presentation
General approach (lab and field)
 Before voting, users indicate intentions by
circling choices in each contest
 In some contests, instructed how to vote

 All users asked to vote on all 6 machines
 with one of two ballot designs:
 Office Block
 Straight Party option
 in 1 of 6 random orders (Latin Square; see the counterbalancing sketch below)
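
To make the Latin-square counterbalancing concrete, here is a minimal sketch (in Python, with assumed machine names and an assumed assignment rule, not the study's actual procedure) that builds six machine orders from cyclic shifts and rotates participants through them:

    # Minimal sketch (assumed, not the study's code): a cyclic 6 x 6 Latin
    # square of machine orders, so every machine appears exactly once in
    # every serial position; each participant is assigned one row.
    MACHINES = ["Avante", "Diebold", "ES&S", "Hart", "NEDAP", "Zoomable"]

    def latin_square(items):
        n = len(items)
        # row r is the machine list shifted left by r positions
        return [[items[(r + c) % n] for c in range(n)] for r in range(n)]

    ORDERS = latin_square(MACHINES)

    def order_for(participant_id):
        # rotate participants through the six orders
        return ORDERS[participant_id % len(ORDERS)]

    for pid in range(6):
        print(pid, order_for(pid))
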
General approach (cont’d)
 Tasks:
 change a vote
 write-in a vote
 abstain (undervote) in one contest
 two contests required voting for 2 candidates

 Users complete satisfaction questionnaire after each machine
Lab Study Design
                   Computer Experience
Ballot Design      Low     High*
Office Block        21       9
Straight Party      10       2

n = number of voters
* High: computer use > twice a week

Lab Study: Design and Procedure
 42 people recruited via newspaper ads
 31 with limited computer experience
 29 over 50 years old
Why did we oversample older users
with little computer experience?
 Because e-voting systems must be usable
by anyone who wants to vote

 If anyone is unable to enact their intentions because of the user interface, the technology is failing

 We wanted to focus, in our small sample, on those people most likely to have problems
More about users
 Visited lab in Ann Arbor, MI in July and August,
2004
 paid $50 for 2 hours

 Previously voted in an election
 95% reported voting previously
 7% reported using touch screens when they voted

 Prior voting experience


 Paper: 43%
 Punch card: 69%
 Lever machine: 48%
 Dials and Knobs: 19%
 Touch screen: 7%
Design and Procedure (cont’d)
 All machines in a single large room
 2 video cameras on rolling tripod
 1 per 3 machines
 Proprietary designs ruled out use of direct screen
capture e.g. scan converter or Morae
Satisfaction Results
 Preview:
 Left-most bar (Diebold)
 Right-most bar (Hart)
 Consistent with data from field study (n ≈ 1500)
 Provides face validity for lab results with small sample
“The voting system was easy to use”
[Bar chart: mean agreement (1 = Strongly Disagree, 7 = Strongly Agree), y-axis 1 to 7, by machine: Diebold, ES&S, Zoomable, Avante, NEDAP, Hart]
“I felt comfortable using the system”
[Bar chart: mean agreement (1 = Strongly Disagree, 7 = Strongly Agree), y-axis 1 to 7, by machine: Diebold, ES&S, Zoomable, Avante, NEDAP, Hart]
“Correcting my mistakes was easy”
[Bar chart: mean agreement (1 = Strongly Disagree, 7 = Strongly Agree), y-axis 1 to 7, by machine: Diebold, ES&S, Zoomable, Avante, NEDAP, Hart]
“Casting a write-in vote was easy to do”
[Bar chart: mean agreement (1 = Strongly Disagree, 7 = Strongly Agree), y-axis 1 to 7, by machine: Diebold, ES&S, Zoomable, Avante, NEDAP, Hart]
“Changing a vote was easy to do”
[Bar chart: mean agreement (1 = Strongly Disagree, 7 = Strongly Agree), y-axis 1 to 7, by machine: Diebold, ES&S, Zoomable, Avante, NEDAP, Hart]
Why the differences in satisfaction?
 We believe the answer lies in the details of
the interaction

 Thus, we focus on the subset of voters using these two machines (Diebold and Hart):
 Office block ballot
 Limited computer experience
 n = 21
 Represents 20% of (what we project will be) ≈
13,000 codable behaviors
Focus on subgroup of users
                   Computer Experience
Ballot Design      Low     High
Office Block        21       9
Straight Party      10       2

n = number of voters; focal subgroup: Office Block × Low (n = 21)
Coding the Video

Image removed to reduce size of file; contact author for complete presentation
Coding the Video (2)

Image removed to reduce size of file; contact author for complete presentation
Sequential analysis
 Goal is to identify and count event
patterns
 Order is critical because each event provides
context for events that follow and precede it
 E.g. trouble changing votes when original vote
must be deselected:
 How many times did voters press new candidate
without first deselecting?
 How often did they do this before consulting Help?

 How often did they do this after consulting Help?

 Tree analysis example (a pattern-counting sketch follows below)
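
To illustrate this kind of sequential counting, the sketch below scans one voter's coded event stream for presses of a new candidate made before deselecting the original, split by whether Help had already been consulted. The event codes and the example sequence are invented for the sketch and are not the project's coding scheme:

    # Hypothetical event codes; not the project's actual coding scheme.
    def count_premature_presses(events):
        counts = {"before_help": 0, "after_help": 0}
        help_seen = False
        original_selected = True   # the original vote is still selected
        for ev in events:
            if ev == "consult_help":
                help_seen = True
            elif ev == "deselect_original":
                original_selected = False
            elif ev == "press_new_candidate" and original_selected:
                # new candidate pressed while the original is still selected
                counts["after_help" if help_seen else "before_help"] += 1
        return counts

    example = ["press_new_candidate", "consult_help", "press_new_candidate",
               "deselect_original", "press_new_candidate", "cast_ballot"]
    print(count_premature_presses(example))   # {'before_help': 1, 'after_help': 1}
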


Number of Actions
 For every touch screen action there are
two actions with rotary wheel
 Touch screen: press screen with finger
 Rotary wheel: move wheel and press “Enter”

 Empirically, people take proportionally more actions
 Diebold: 1.89 actions per task
 Hart: 3.92 actions per task
Number of Actions
[Bar chart: number of actions per voting task, Diebold vs. Hart; y-axis 0 to 14 actions. Largest differences labeled at Getting started, Change vote, and Write-in; x-axis lists the individual ballot contests (Voting Task).]
Duration
 Voting duration (mins) varied substantially
by machine

 Diebold: 4.68 (sd = 1.27)


 Hart: 10.56 (sd = 4.53)

 Presumably due to the larger number of actions in Hart than Diebold
 And possibly more thorough ballot review (a sketch of these per-machine summaries follows below)
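
As an illustration only, the sketch below shows one way such per-machine summaries (mean actions per task, mean and sd of total duration in minutes) could be computed from coded records; the record layout and the values are placeholders, not the study's data:

    # Placeholder records: (machine, voter, task, n_actions, seconds);
    # values are made up for illustration only.
    from statistics import mean, stdev
    from collections import defaultdict

    records = [
        ("Diebold", 1, "change_vote", 4, 31.0),
        ("Diebold", 1, "write_in",    9, 62.0),
        ("Hart",    1, "change_vote", 8, 77.0),
        ("Hart",    1, "write_in",   14, 140.0),
        # ... one row per voter x machine x task
    ]

    actions = defaultdict(list)                        # machine -> action counts per task
    seconds = defaultdict(lambda: defaultdict(float))  # machine -> voter -> total seconds

    for machine, voter, task, n_actions, secs in records:
        actions[machine].append(n_actions)
        seconds[machine][voter] += secs

    for machine in sorted(actions):
        mins = [s / 60 for s in seconds[machine].values()]
        sd = stdev(mins) if len(mins) > 1 else float("nan")
        print(f"{machine}: {mean(actions[machine]):.2f} actions/task, "
              f"{mean(mins):.2f} min (sd = {sd:.2f})")
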
Accuracy
 Varies by Machine and Voting Task:
 2 Candidates (State Representative)
 Inaccurate enough for concern
 Errors of Omission: just voted for one candidate

 Write-In (Member Library Board)


 Quite inaccurate for Hart
 Errors of commission: name spelled wrong
 Errors of omission: no write-in vote (in the end)

 Changing Vote (Probate Court Judge):


 Overall accurate but slightly less accurate for Diebold
 Error of commission: unintended candidate remains selected
Voting Accuracy
[Bar chart: proportion of correct votes per voting task, Diebold vs. Hart; y-axis 0.5 to 1.0 (Proportion Correct). Tasks of interest labeled: 2 Candidates, Change vote, Write-in; x-axis lists the individual ballot contests (Voting Task).]
Number of Actions: Getting Started (Hart)

Image removed to reduce size of file; contact author for complete presentation

8 actions minimally required to access system: 4 selections and 4 “Enter” presses


Number of Actions: Getting Started (Diebold)

Image removed to reduce size of file; contact author for complete presentation

2 actions required to access system: Insert access card and press “Next”
Access examples
 Hart
 Voter is not able to select digits with rotary
wheel, attempts to press (non-touch) screen,
requests help
 Help does not help
 Voter figures it out
 Diebold
 Voter slides access card into reader
 Presses “Next”
Number of Actions: Vote Change
 Diebold requires “de-selecting” current
vote in order to change it
 Clicking on already checked check box
 Likely to be opaque to non-computer users
 Despite manufacturer-provided instructions

 On only 11/21 occasions did voters correctly deselect on the first try
 On 10/21 occasions, voters touched the second candidate without first deselecting the original selection
Number of Actions: Vote Change
 Changing votes is essential for correcting
errors and expressing change of heart

 Example of problem changing vote:


 Voter 27
Number of Actions: Write-in
 Write-in votes generally involve as many
actions as letters in the name
 Double this if navigation and selection required

 Example of problems correcting write-in mistakes:
 Voter 38
Review
 Both machines offer similar ballot review:
 Displays voters’ choices and highlights
unselected contests

 In both cases, ballot review spans two pages
Review: Hart

Image removed to reduce size of file; contact author for complete presentation
Review: Diebold

Image removed to reduce size of file; contact author for complete presentation
How often do voters review their
votes?
 On how many occasions did voters cast
ballot without reviewing all choices
(displaying the second review page)?
 Hart: 8/34
 Diebold: 17/29

 Diebold review much briefer than Hart, suggesting cursory review
 Hart: 55.5 seconds
 Diebold: 9.8 seconds
Review Example 1
 Diebold:
 Voter (seemingly by accident) does not vote in one contest, resulting in an undervote
 Completes ballot and system displays review
screen
 She immediately presses “Cast Ballot” and
says “That one I felt confident in … didn’t even
need to go over it”
Review Example 2
 Hart
 Voter (seemingly by accident) does not vote in two contests, resulting in two undervotes
 Completes ballot and system displays first of
two review screens
 He selects first undervote (in red text) and
system displays relevant contest in ballot
 He selects intended candidates, i.e. votes for
circled candidates in voter info booklet, and
system displays first review screen
 He repeats for second undervote
Review screens
 Some designs promote more review and
correction of errors than others
 Hart review screens visually distinct from ballot screens
and, if voter presses “Cast Vote” after first review
screen, system displays second screen
 Diebold review screens hard to distinguish from ballot
screens and if voter presses “Cast Ballot” without
scrolling to see lower part of screen, system casts ballot

 More review and correction surely improves voting accuracy
 but involves more work which may lead to lower
satisfaction
Summary
 User satisfaction and performance related
to particular features
 Touch screen involves fewer actions and
seemed more intuitive to these users than
wheel-plus-enter sequence
 Deselecting a choice in order to change it
seemed counterintuitive to many voters and
responsible for at least one incident of casting
an unintended vote
 Review screens designed to promote review
(distinct from ballot, hard to cast vote in
middle) led to more review and correction
Summary (cont’d)
 These users were more successful on
some tasks with Hart and on others with
Diebold
 Fit between features and tasks more
appropriate level of analysis than overall
machine
Conclusions
 In a situation designed to maximize usability
problems, the machines mostly fared well

 But they did exhibit some usability problems and accuracy was not perfect
 Both unintended votes and no votes
 Substantial proportion of voters did not review their ballots
 Seems likely that non-computer users will not
recognize interface conventions:
 E.g. De-selection and scrolling

 Even very low error rates -- for just computer novices -- can matter in very close elections
Conclusions (cont’d)
 We cannot compare voters’ performance
with new technology to older techniques
 But we will be able to use performance
with the ES&S (paper ballot, optical scan)
as a rough baseline
 Certainly, voting systems are now being scrutinized in a way they were not before
Implications
 Most of these design problems can be
improved by applying usability engineering
techniques
 But industry and election officials need to
make this a priority
 EAC/NIST developing usability guidelines

 Unparalleled design challenge:


 Systems should be usable by all citizens all the
time, even if used once every few years
Thank you!
Additional Slides if time permits
 User Interface Can Affect Outcome
 Variance
 Bias
 Some usability measures
 Measures (cont’d)
User Interface Can Affect Outcome
 Ballot Design
 Butterfly ballot

 Interaction
 Casting ballot too soon
 Changing votes
 Writing-in votes
 Navigating between contests
 Reviewing votes
 Frustration, Increased Cynicism
 Abandonment
 Lower Turnout in Future
 Voters might question results
Variance
 Interface-related error is not systematic
 all candidates should suffer equally from this
(all else being equal)
 E.g. if difficult to change votes, doesn’t matter
which selections require change

 But unlikely that error for different candidates is exactly complementary
Bias
 Interface systematically prevents votes
from being cast for a particular candidate

 Results either in no vote being cast or voter choosing unintended candidate
 e.g. Butterfly Ballot may have led Jewish
voters who intended to vote for Al Gore to vote
for Pat Buchanan
Some usability measures
 Satisfaction

 Accuracy
 Do voters vote for whom they intend?
 In lab, compare circled choices to observable screen actions (a scoring sketch follows below)
 In field, compare circled choices to ballot images and audit trails
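
A minimal sketch of that lab comparison, assuming intended and recorded selections are available per contest as sets of candidate names; the contests and names below are invented examples, not study data:

    # Compare circled intentions with recorded selections, contest by
    # contest, and classify each mismatch.
    def score_contest(intended, recorded):
        if recorded == intended:
            return "correct"
        if recorded < intended:          # intended vote(s) missing
            return "error_of_omission"
        return "error_of_commission"     # an unintended selection was recorded

    intended_ballot = {
        "State Representative (vote for 2)": {"Candidate A", "Candidate B"},
        "Probate Court Judge":               {"Candidate C"},
        "Library Board (write-in)":          {"Jane Q. Public"},
    }
    recorded_ballot = {
        "State Representative (vote for 2)": {"Candidate A"},     # omission
        "Probate Court Judge":               {"Candidate D"},     # commission
        "Library Board (write-in)":          {"Jane Q. Public"},  # correct
    }

    for contest, intended in intended_ballot.items():
        print(contest, "->", score_contest(intended, recorded_ballot[contest]))
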
Measures (cont’d)
 Number of Actions
 Presses and clicks
 Substantive actions, e.g. requests for system
help, revisions of earlier selections

 Duration
 Per task
 Overall
