
HELSINKI UNIVERSITY OF TECHNOLOGY

Department of Computer Science and Engineering


Software Business and Engineering Institute

Juha Rantanen

Acceptance Test-Driven Development with Keyword-Driven
Test Automation Framework in an Agile Software Project

Master’s Thesis

Espoo, May 18, 2007

Supervisor: Professor Tomi Männistö


Instructor: Harri Töhönen, M.Sc.
HELSINKI UNIVERSITY OF TECHNOLOGY ABSTRACT OF MASTER’S THESIS
Department of Computer Science and Engineering

Author: Juha Rantanen
Date: May 18, 2007
Pages: 102
Title of thesis: Acceptance Test-Driven Development with Keyword-Driven Test Automation Framework in an Agile Software Project
Professorship: Computer Science
Professorship Code: T-76
Supervisor: Professor Tomi Männistö
Instructor: Harri Töhönen, M.Sc.

Agile software development uses iterative development, allowing periodic changes and updates to the software requirements. In agile software development methods, customer-defined tests have an important role in assuring that the software fulfills the customer's needs. These tests can be defined before implementation to establish a clear goal for the development team. This is called acceptance test-driven development (ATDD).

With ATDD the acceptance tests are usually automated. Keyword-driven testing is the latest evolution in test automation approaches. In keyword-driven testing, instructions, inputs, and expected outputs are defined in separate test data. A test automation framework tests the software accordingly and reports the results.

In this thesis, the use of acceptance test-driven development with the keyword-driven test automation
framework is studied in a real-world agile software development project. The study was conducted
using action research during a four-month period. The main methods used were observations and
interviews.

It was noticed that the keyword-driven test automation framework can be used in acceptance test-driven development. However, there were some limitations preventing the implementation of all the test cases before the software implementation started. It was also noticed that the test automation framework used to implement the acceptance test cases does not play a crucial role in acceptance test-driven development. The biggest benefits were gained from the detailed planning done before the software implementation at the beginning of the iterations.

Based on the results, acceptance test-driven development improves communication and cooperation, and gives a common understanding of the details of the software's features. These improvements help the development team to implement the wanted features. Therefore, the risk of building incomplete software decreases. The improvements also help to implement the features more efficiently, as the features are more likely to be implemented correctly the first time. Remarkable changes to the test engineers' role were also noticed, as the test engineers are more involved in the detailed planning. It seems that the biggest challenge in acceptance test-driven development is creating tests at the right test levels and in the right scope.
Keywords: acceptance test-driven development, keyword-driven testing, agile testing, test automation

TEKNILLINEN KORKEAKOULU (HELSINKI UNIVERSITY OF TECHNOLOGY) ABSTRACT OF MASTER'S THESIS
Department of Computer Science and Engineering

Author: Juha Rantanen
Date: May 18, 2007
Pages: 102
Title of thesis: Acceptance Test-Driven Development with Keyword-Driven Test Automation Framework in an Agile Software Project
Professorship: Software Business and Engineering
Code: T-76
Supervisor: Professor Tomi Männistö
Instructor: Harri Töhönen, M.Sc.

Agile software development is based on an iterative approach. Iterativeness makes it possible to change and update the software requirements periodically. In agile software development processes, tests defined by the customer have an important role in ensuring that the software under development fulfills the customer's needs. These tests can be defined before implementation starts in order to create a clear goal for the development team. This is called acceptance test-driven development.

In acceptance test-driven development the acceptance tests are often automated. One of the newest test automation approaches is keyword-driven testing. In keyword-driven testing, the instructions, inputs, and expected results are defined in separate test data. A test automation framework tests the software according to that data and reports the results.

This thesis examines the use of a keyword-driven test automation framework in acceptance test-driven development. The subject of the study was an ongoing agile software development project. The approach used was action research, and the main methods were observation and interviews. The research period was four months long.

The study found that a keyword-driven test automation framework can be used in acceptance test-driven development. Some limitations, however, prevented creating the tests before the implementation of the software started. It was also found that the test automation framework used for creating the test cases does not play a decisive role in acceptance test-driven development. The greatest benefits were achieved through the detailed planning done before the implementation of the software at the beginning of each iteration.

Based on the results, acceptance test-driven development promotes communication and cooperation between the different parties as well as a shared understanding of the details of the software's features. This furthers the implementation of the wanted features. Consequently, the risk of producing non-working or incorrectly working software decreases. This also contributes to more efficient software development, since the right features are more likely to be produced already on the first implementation round. Significant changes were also noticed in the testers' role, resulting from the testers' increased participation in detailed planning. It seems that the biggest challenges in acceptance test-driven development relate to creating tests at the right test levels and in the right scope.

Keywords: acceptance test-driven development, keyword-driven testing, agile testing, test automation

ACKNOWLEDGEMENTS
This master's thesis has been written for Qentinel, a Finnish software testing consultancy company, during the years 2006 and 2007. I would like to thank all the Qentinelians who have made this possible. Big thanks belong to my instructor Harri Töhönen for his interest, valuable feedback, and the time he used for listening to and commenting on my ideas.

I would like to express my gratitude to my supervisor Tomi Männistö, who gave advice and comments when they were needed.

I would like to thank Petri Haapio and Pekka Laukkanen, with whom I have been working and who have given valuable ideas, comments and feedback. The discussions with these two professionals have improved my know-how about agile software development and test automation. That know-how has been priceless during this work.

I also wish to thank all the members of the project where the research was carried out. It has been very rewarding to work with them.

My good friend Pauli Aho also deserves to be thanked. I am deeply indebted to him for using his time to check the language of this thesis.

Finally, special thanks go to my lovely wife Aino for the help and support I received during this project. I am grateful to her for being so patient.

TABLE OF CONTENTS
TERMS....................................................................................................................................... VII

1 INTRODUCTION ............................................................................................................. 1
1.1 Motivation .............................................................................................................. 1
1.2 Aim of the Thesis ................................................................................................... 3
1.3 Structure of the Thesis .......................................................................................... 3
2 TRADITIONAL TESTING ................................................................................................ 4
2.1 Purpose of Testing................................................................................................. 4
2.2 Dynamic and Static Testing................................................................................... 4
2.3 Functional and Non-Functional Testing................................................................. 4
2.4 White-Box and Black-Box Testing ......................................................................... 5
2.5 Test Levels ............................................................................................................ 5
3 AGILE AND ITERATIVE SOFTWARE DEVELOPMENT ............................................... 9
3.1 Iterative Development Model................................................................................. 9
3.2 Agile Development................................................................................................. 10
3.3 Scrum..................................................................................................................... 11
3.4 Extreme Programming........................................................................................... 15
3.5 Scrum and Extreme Programming Together......................................................... 17
3.6 Measuring Progress in Agile Projects ................................................................... 17
4 TESTING IN AGILE SOFTWARE DEVELOPMENT ...................................................... 19
4.1 Purpose of Testing................................................................................................. 19
4.2 Test Levels ............................................................................................................ 19
4.3 Acceptance Test-Driven Development.................................................................. 22
5 TEST AUTOMATION APPROACHES............................................................................ 28
5.1 Test Automation..................................................................................................... 28
5.2 Evolution of Test Automation Frameworks............................................................ 29
5.3 Keyword-Driven Testing ........................................................................................ 29
6 KEYWORD-DRIVEN TEST AUTOMATION FRAMEWORK .......................................... 32
6.1 Keyword-Driven Test Automation Framework....................................................... 32
6.2 Test Data ............................................................................................................... 33
6.3 Test Execution ....................................................................................................... 35
6.4 Test Reporting ....................................................................................................... 35
7 EXAMPLE OF ACCEPTANCE TEST-DRIVEN DEVELOPMENT WITH
KEYWORD-DRIVEN TEST AUTOMATION FRAMEWORK .......................................... 36
7.1 Test Data between User Stories and System under Test ..................................... 36
7.2 User Stories ........................................................................................................... 37
7.3 Defining Acceptance Tests.................................................................................... 37
7.4 Implementing Acceptance Tests and Application.................................................. 39
8 ELABORATED GOALS OF THE THESIS...................................................................... 45
8.1 Scope..................................................................................................................... 45
8.2 Research Questions .............................................................................................. 45
9 RESEARCH SUBJECT AND METHOD ......................................................................... 47
9.1 Case Project .......................................................................................................... 47
9.2 Research Method .................................................................................................. 47
9.3 Data Collection ...................................................................................................... 49

10 ACCEPTANCE TEST-DRIVEN DEVELOPMENT WITH KEYWORD-DRIVEN
TEST AUTOMATION FRAMEWORK IN THE PROJECT UNDER STUDY................... 51
10.1 Development Model and Development Practices Used in the Project.................. 51
10.2 January Sprint........................................................................................................ 52
10.3 February Sprint ...................................................................................................... 55
10.4 March Sprint .......................................................................................................... 61
10.5 April Sprint ............................................................................................................. 63
10.6 Interviews............................................................................................................... 65
11 ANALYSES OF OBSERVATIONS ................................................................................. 72
11.1 Suitability of the Keyword-Driven Test Automation Framework with
Acceptance Test-Driven Development.................................................................. 72
11.2 Use of the Keyword-Driven Test Automation Framework with Acceptance
Test-Driven Development...................................................................................... 76
11.3 Benefits, Challenges and Drawbacks of Acceptance Test-Driven
Development with Keyword-Driven Test Automation Framework......................... 78
11.4 Good Practices ...................................................................................................... 87
12 DISCUSSION AND CONCLUSIONS .............................................................................. 89
12.1 Researcher’s Experience ...................................................................................... 89
12.2 Main Conclusions .................................................................................................. 89
12.3 Validity ................................................................................................................... 90
12.4 Evaluation of the Thesis ........................................................................................ 92
12.5 Further Research Areas ........................................................................................ 92
BIBLIOGRAPHY........................................................................................................................ 94

APPENDIX A PRINCIPLES BEHIND THE AGILE MANIFESTO ..................................... 101

APPENDIX B INTERVIEW QUESTIONS .......................................................................... 102

TERMS
Acceptance Criteria The exit criteria that a component or system must satisfy in order to
be accepted by a user, customer, or other authorized entity. (IEEE
Std 610.12-1990)
Acceptance Testing Formal testing with respect to user needs, requirements, and business processes conducted to determine whether or not a system satisfies the acceptance criteria and to enable the user, customers or other authorized entity to determine whether or not to accept the system. (IEEE Std 610.12-1990) See also component testing, integration testing and system testing.
Acceptance Test-Driven Development (ATDD) A way of developing software where the acceptance test cases are developed, and often automated, before the software is developed to run those test cases. See also test-driven development.
Actual Result The behavior produced/observed when a component or system is
tested. (ISTQB 2006)
Agile Testing Testing practice for a project using agile methodologies, such as
extreme programming (XP), treating development as the customer
of testing and emphasizing the test-first design paradigm. (ISTQB
2006) See also test-driven development and acceptance test-driven
development.
Base Keyword Keyword implemented in a test library of a keyword-driven test
automation framework. (Laukkanen 2006) See also sentence format
keyword and user keyword.
Behavior The response of a component or system to a set of input values and
preconditions. (ISTQB 2006)
Bespoke Software Software developed specifically for a set of users or customers. The
opposite is off-the-shelf software. (ISTQB 2006)
Beta Testing Operational testing by potential and/or existing users/customers at
an external site not otherwise involved with the developers, to de-
termine whether or not a component or system satisfies the
user/customer needs and fits within the business processes. Beta
testing is often employed as a form of external acceptance testing
for off-the-shelf software in order to acquire feedback from the mar-
ket. (ISTQB 2006)
Black-box Testing Testing, either functional or non-functional, without reference to the
internal structure of the component or system. (ISTQB 2006) See
also white-box testing.
Bug See defect.
Capture/Playback Tool A type of test execution tool where inputs are recorded during man-
ual testing in order to generate automated test scripts that can be
executed later (i.e. replayed). These tools are often used to support
automated regression testing. (ISTQB 2006)
Component A minimal software item that can be tested in isolation. (ISTQB
2006)

Component Testing The testing of individual software components. (IEEE Std 610.12-
1990)
Context-Driven Testing A testing methodology that underlines the importance of the context
where different testing practices are used over the practices them-
selves. The main message is that there are good practices in a con-
text but there are no general best practices. (Kaner et al. 2001a)
Daily Build A development activity where a complete system is compiled and
linked every day (usually overnight), so that a consistent system is
available at any time including all latest changes. (ISTQB 2006)
Data-Driven Testing A scripting technique that stores test input and expected results in a
table or spreadsheet, so that a single control script can execute all of
the tests in the table. Data-driven testing is often used to support the
application of test execution tools such as capture/playback tools.
(Fewster & Graham 1999) See also keyword-driven testing.
Defect A flaw in a component or system that can cause the component or
system to fail to perform its required function, e.g. an incorrect
statement or data definition. A defect, if encountered during execu-
tion, may cause a failure of the component or system. (ISTQB
2006)
Defined Process In a defined process every piece of work is well understood. With well-defined input, the defined process can be started and allowed to run until completion, ending with the same results every time. (Schwaber & Beedle 2002) See also empirical process.
Dynamic Testing Testing that involves the execution of the software of a component
or system. (ISTQB 2006) See also static testing.
Empirical Process In an empirical process the unexpected is expected. An empirical process provides and exercises control through frequent inspection and adaptation in imperfectly defined environments where unpredictable and unrepeatable outputs are generated. (Schwaber & Beedle 2002) See also defined process.
Expected Outcome See expected result.
Expected Result The behavior predicted by the specification, or another source, of
the component or system under specified conditions. (ISTQB 2006)
Exploratory Testing An informal test design technique where the tester actively controls
the design of the tests as those tests are performed and uses infor-
mation gained while testing to design new and better tests. (Bach
2003b)
Fail A test is deemed to fail if its actual result does not match its ex-
pected result. (ISTQB 2006)
Failure Deviation of the component or system from its expected delivery,
service or result. (Fenton 1996)
Fault See defect.

Feature An attribute of a component or system specified or implied by re-
quirements documentation (for example reliability, usability or de-
sign constraints). (IEEE Std 1008-1987)
Feature Creep On-going requirements increase without corresponding adjustment
of approved cost and schedule allowances. As some projects pro-
gress, especially through the definition and development phases,
requirements tend to change incrementally, causing the project
manager to add to the project's mission or objectives without getting
a corresponding increase in the time and budget allowances. (Wide-
man 2002)
Functional Testing Testing based on an analysis of the specification of the functionality
of a component or system. (ISTQB 2006) See also black-box test-
ing.
Functionality The capability of the software product to provide functions which
meet stated and implied needs when the software is used under
specified conditions. (ISO/IEC Std 9126-1:2001)
High Level Test Case A test case without concrete (implementation level) values for input
data and expected results. Logical operators are used; instances of
the actual values are not yet defined and/or available. (ISTQB 2006)
See also low level test case.
Input A variable (whether stored within a component or outside) that is
read by a component. (ISTQB 2006)
Input Value An instance of an input. (ISTQB 2006) See also input.
Information Radiator An information radiator is a large display of critical team informa-
tion that is continuously updated and located in a spot where the
team can see it constantly. (Agile Advice 2005)
Integration Testing Testing performed to expose defects in the interfaces and in the in-
teractions between integrated components or systems. (ISTQB
2006) See also component testing, system testing and acceptance
testing.
Iterative Development Model A development life cycle where a project is broken into a usually large number of iterations. Iteration is a complete development loop resulting in a release (internal or external) of an executable product, a subset of the final product under development, which grows from iteration to iteration to become the final product. (ISTQB 2006)
Keyword A directive representing a single action in keyword-driven testing.
(Laukkanen 2006)
Keyword-Driven Test Automation Framework Test automation framework using keyword-driven testing technique.
Keyword-Driven Testing A scripting technique that uses data files to contain not only test
data and expected results, but also keywords related to the applica-
tion being tested. The keywords are interpreted by special support-
ing scripts that are called by the control script for the test. (ISTQB
2006) See also data-driven testing.

Low Level Test Case A test case with concrete (implementation level) values for input
data and expected results. Logical operators from high level test
cases are replaced by actual values that correspond to the objectives
of the logical operators. (ISTQB 2006) See also high level test case.
Negative Testing Tests aimed at showing that a component or system does not work.
Negative testing is related to the testers’ attitude rather than a spe-
cific test approach or test design technique, e.g. testing with invalid
input values or exceptions. (Beizer 1990)
Non-functional testing Testing the attributes of a component or system that do not relate to
functionality, e.g. reliability, efficiency, usability, maintainability
and portability. (ISTQB 2006)
Off-the-shelf Software A software product that is developed for the general market, i.e. for
a large number of customers, and that is delivered to many custom-
ers in identical format. (ISTQB 2006)
Output A variable (whether stored within a component or outside) that is
written by a component. (ISTQB 2006)
Output Value An instance of an output. (ISTQB 2006) See also output.
Pass A test is deemed to pass if its actual result matches its expected re-
sult. (ISTQB 2006)
Postcondition Environmental and state conditions that must be fulfilled after the
execution of a test or test procedure. (ISTQB 2006)
Precondition Environmental and state conditions that must be fulfilled before the
component or system can be executed with a particular test or test
procedure. (ISTQB 2006)
Problem See defect.
Quality The degree to which a component, system or process meets speci-
fied requirements and/or user/customer needs and expectations.
(IEEE Std 610.12-1990)
Quality Assurance Part of quality management focused on providing confidence that
quality requirements will be fulfilled. (ISO Std 9000-2005)
Regression Testing Testing of a previously tested program following modification to
ensure that defects have not been introduced or uncovered in un-
changed areas of the software, as a result of the changes made. It is
performed when the software or its environment is changed.
(ISTQB 2006)
Requirement A condition or capability needed by a user to solve a problem or
achieve an objective that must be met or possessed by a system or
system component to satisfy a contract, standard, specification, or
other formally imposed document. (IEEE Std 610.12-1990)
Result The consequence/outcome of the execution of a test. It includes
outputs to screens, changes to data, reports, and communication
messages sent out. See also actual result, expected result. (ISTQB
2006)

Running Tested Features (RTF) Running Tested Features is a metric to measure the progress of an agile team. (Jeffries 2004)
Sentence Format Keyword Term defined in this thesis for keywords whose name is a sentence and which do not take any arguments. See also base keyword and user keyword.
Software Computer programs, procedures, and possibly associated documen-
tation and data pertaining to the operation of a computer system.
(IEEE Std 610.12-1990)
Software Quality The totality of functionality and features of a software product that
bear on its ability to satisfy stated or implied needs. (ISO/IEC Std
9126-1:2001)
Static Code Analysis Analysis of source code carried out without execution of that soft-
ware. (ISTQB 2006)
Static Testing Testing of a component or system at specification or implementation
level without execution of that software, e.g. reviews or static code
analysis. (ISTQB 2006) See also dynamic testing.
System A collection of components organized to accomplish a specific
function or set of functions. (IEEE Std 610.12-1990)
System Testing The process of testing an integrated system to verify that it meets
specified requirements. (Burnstein 2003) See also component test-
ing, integration testing and acceptance testing.
System Under Test (SUT) The entire system or product to be tested. (Craig and Jaskiel 2002)
Test A set of one or more test cases. (IEEE Std 829-1983)
Test Automation The use of software to perform or support test activities, e.g. test
management, test design, test execution and results checking.
(ISTQB 2006)
Test Automation Framework A framework used for test automation. Provides some core functionality (e.g. logging and reporting) and allows its testing capabilities to be extended by adding new test libraries. (Laukkanen 2006)
Test Case A set of input values, execution preconditions, expected results and
execution postconditions, developed for a particular objective or
test condition, such as to exercise a particular program path or to
verify compliance with a specific requirement. (IEEE Std 610.12-
1990)
Test Data Data that exists (for example, in a database) before a test is exe-
cuted, and that affects or is affected by the component or system
under test. (ISTQB 2006)
Test-Driven Development (TDD) A way of developing software where the test cases are developed, and often automated, before the software is developed to run those test cases. (ISTQB 2006)
Test Execution The process of running a test on the component or system under
test, producing actual result(s). (ISTQB 2006)

Test Execution Automation The use of software, e.g. capture/playback tools, to control the exe-
cution of tests, the comparison of actual results to expected results,
the setting up of test preconditions, and other test control and re-
porting functions. (ISTQB 2006)
Test Engineer See tester.
Test Input The data received from an external source by the test object during
test execution. The external source can be hardware, software or
human. (ISTQB 2006)
Test Level A group of test activities that are organized and managed together.
A test level is linked to the responsibilities in a project. Examples of
test levels are component test, integration test, system test and ac-
ceptance test. (Pol 2002)
Test Log A chronological record of relevant details about the execution of
tests. (IEEE Std 829-1983)
Test Logging The process of recording information about tests executed into a
test log. (ISTQB 2006)
Test Report A document summarizing testing activities and results. (IEEE Std
829-1983)
Test Run Execution of a test on a specific version of the test object. (ISTQB
2006)
Test Runner A generic driver script capable of executing different kinds of test cases, not only variations with slightly different test data. (Laukkanen 2006)
Test Result See result.
Test Script Commonly used to refer to a test procedure specification, especially
an automated one. (ISTQB 2006)
Test Set See test suite.
Test Suite A set of several test cases for a component or system under test,
where the postcondition of one test is often used as the precondition
for the next one. (ISTQB 2006)
Testability The capability of the software product to enable modified software
to be tested. (ISO/IEC Std 9126-1:2001)
Tester A skilled professional who is involved in the testing of a component
or system. (ISTQB 2006)
Testing The process consisting of all life cycle activities, both static and
dynamic, concerned with planning, preparation and evaluation of
software products and related work products to determine that they
satisfy specified requirements, to demonstrate that they are fit for
purpose and to detect defects. (ISTQB 2006)
User Keyword Keyword constructed from base keywords and other user keywords
in a test design system. User keywords can be created easily even
without programming skills. (Laukkanen 2006) See also base key-
word and sentence format keyword.

Unit Testing See component testing.
Variable An element of storage in a computer that is accessible by a software
program by referring to it by a name. (ISTQB 2006)
White-Box Testing Testing based on an analysis of the internal structure of the compo-
nent or system. (ISTQB 2006) See also black-box testing.

1 INTRODUCTION
1.1 Motivation
Quality is one of the most important aspects of software products. If software does not work, it is not worth much. The drawbacks caused by faulty software can be much greater than the advantages gained from using it. Malfunctioning or difficult-to-use software can complicate daily life. In life-critical systems, faults may even cause loss of human lives. In highly competitive markets, quality may determine which software products are going to succeed and which are going to fail. Low-quality software products have a negative impact on a firm's reputation and unquestionably also on sales. Unhappy customers are also more willing to switch to other software suppliers. For these reasons, organizations have to invest in the quality of software products.

Even high-quality software can fail in the market if it does not meet the customers' needs. At the beginning of a software project it is common that the customers' exact needs are unknown. This may lead to guessing which features are wanted and to the development of useless features, and in the worst case useless software. This should obviously be avoided.

New feature ideas usually arise when the customer understands the problem domain more thoroughly. This might be quite problematic if strict contractual agreements on the developed features exist. Even when it is contractually possible to add new features to the software, a lot of rework may be needed before the features are ready for use.

Iterative and especially agile software processes have been introduced as a solution to changing requirements. The basic idea in iterative processes is to create the software in small steps. When software is developed in this way, the customer can try out the developed software, and based on the customer's feedback the development team can create features that are valuable for the customer. The most valuable features are developed first, allowing the customer to start using the software earlier than with software developed in a non-iterative development process.

Iterative software development adds new challenges for software testing. In traditional software projects the main part of the testing is conducted at the end of the development project. With iterative and agile processes the software should, however, be tested in every iteration. If the customer uses the result of the iteration, at least all the major problems should be solved before the product can be delivered. In an ideal situation each iteration outcome would be high-quality software.

In the agile methods the need for testing is understood, and there are development practices that are used to assure the quality of the software. Many of these practices are targeted at developers and used to test that the code works as the developers have intended. To also test that the features fulfill the customer's requirements, higher-level testing is needed. This higher-level testing is often called acceptance testing or customer testing. Customer input is needed to define these higher-level test cases to make sure that her requirements are met.

Because the software is developed in an iterative manner and there is continuous change, it would be beneficial to test all the features at least once during the iteration. Repeated testing is needed because the changes may have introduced defects. Testing all functionality manually after every change is not possible. It may be possible at the beginning, but when the number of features grows, manual regression testing becomes harder and eventually impossible. This leads to a situation in which changes made late in the iteration may have caused faults that cannot be noticed in testing. And even if the faults could be noticed, developers may not be able to fix them during the iteration.

Test automation can be used to help the testing effort. Test automation means testing software with other software. When software and computers are used for testing, the test execution can be conducted much faster than manually. If the automated tests can be executed daily or even more often, the status of the developed software is continuously known. Therefore the problems can be found faster and the changes causing the problems can be pinpointed. That is why test automation is an integral part of agile software development.

By automating the customer-defined acceptance tests, the test cases defining how the system should work from the customer's point of view can be executed often. This makes it possible to know the status of the software at any point of the development. In acceptance test-driven development this approach is taken even further, and the acceptance tests are used not only for verifying that the system works but also for driving the system development. The customer-defined test cases are created before the implementation starts. The goal of the implementation is then to develop software that passes all the acceptance test cases.

1.2 Aim of the Thesis
The aim of this thesis is to investigate whether acceptance test-driven development can be used with an in-house-built keyword-driven test automation framework. The research is conducted in a real-life agile software development project, and the suitability is evaluated in this case project. The pros and cons of this approach are also evaluated. More detailed research questions follow in Chapter 8, after the acceptance test-driven development and keyword-driven test automation concepts have been clarified. One purpose is to present the framework usage at a level that can help others try the approach with similar kinds of tools.

1.3 Structure of the Thesis


The structure of this thesis is the following: in Chapter 2 traditional software testing is described to introduce the basic concepts needed in the following chapters. In Chapter 3 the basis of agile and iterative software development is described. Testing in agile software development is introduced in Chapter 4. Chapter 4 also covers acceptance test-driven development, which is the main topic of this thesis. Chapter 5 covers test automation approaches in general and the keyword-driven test automation approach in particular. After the keyword-driven approach is introduced, the keyword-driven test automation framework used in this thesis is explained in Chapter 6 at the level needed to understand the coming chapters. Chapter 7 contains a simple and fictitious example of the usage of the presented keyword-driven test automation framework with acceptance test-driven development.

The research questions are defined in Chapter 8. The case project and the product developed in it are described in Chapter 9. The research method used to conduct this research is also explained in Chapter 9. Chapter 10 contains all the results from the project. First the development model used in the case project is described. Then the use of acceptance test-driven development with the keyword-driven test automation framework is presented. Chapter 10 also contains results from the interviews which were conducted at the end of the research. In Chapter 11 the observations gained from the case project are analyzed. Chapter 12 contains the conclusions and the discussion about the results and the meaning of the analysis in a wider perspective. Further research areas are presented at the end of Chapter 12.

2 TRADITIONAL TESTING
In this chapter the traditional testing terminology and the divisions of different testing aspects are described. The purpose is to give an overall view of the testing field and to make it possible in the following chapters to compare agile testing to traditional testing and to specify the research area in a wider context.

2.1 Purpose of Testing


Testing is an integral part of software development. The goal of software testing is to find faults in the developed software and to make sure they get fixed (Kaner et al. 1999, Patton 2000). It is important to find the faults as early as possible because fixing them is more expensive in the later phases of the development (Kaner et al. 1999, Patton 2000). The purpose of testing is also to provide information about the current state of the developed software from the quality perspective (Burnstein 2003). One might argue that software testing should make sure that software works correctly. This is, however, impossible because even a simple piece of software has millions of paths that should all be tested to make sure that it works correctly (Kaner et al. 1999); for example, a routine with only 20 independent two-way branches already has 2^20, over a million, possible paths.

2.2 Dynamic and Static Testing


On a high level, software testing can be divided into dynamic and static testing. The division into these two categories is based on whether the software is executed or not. Static testing means testing without executing the code. This can be done with different kinds of reviews. Reviewed items can be documents or code. Other static testing methods are static code analysis methods, for example syntax correctness and code complexity analysis. With static testing, faults can be found in an early phase of software development because the testing can be started before any code is written. (IEEE Std 610.12-1990; Burnstein 2003)
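
As a minimal illustration of static code analysis, the following sketch parses a source file without executing it; a syntax error found this way is a defect detected purely statically. The example is not from the thesis or the case project, the helper and file names are hypothetical, and Python is used here only for illustration.

    import ast

    def check_syntax(path):
        # Static check: build an abstract syntax tree from the source text.
        # The code under analysis is never executed.
        with open(path) as source_file:
            source = source_file.read()
        try:
            ast.parse(source, filename=path)
        except SyntaxError as error:
            return "%s:%s: %s" % (path, error.lineno, error.msg)
        return None

    # Hypothetical usage: report = check_syntax("module_under_review.py")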

Dynamic testing is the opposite of static testing. The system under test is tested by executing it or parts of it. Dynamic testing can be divided into functional testing and non-functional testing, which are presented below. (Burnstein 2003)

2.3 Functional and Non-Functional Testing


The purpose of functional testing is to verify that the software corresponds to the requirements defined for the system. The focus in functional testing is on entering inputs to the system under test and verifying the proper output and state. The concept of functional testing is quite similar in all systems, even though the inputs and outputs differ from system to system.

Non-functional testing means testing the quality aspects of software. Examples of non-functional testing are performance, security, usability, portability, reliability, and memory management testing. Each type of non-functional testing needs different approaches and different kinds of know-how and resources. The needed non-functional testing is always decided based on the quality attributes of the system and is therefore selected case by case. (Burnstein 2003)

2.4 White-Box and Black-Box Testing


There are two basic testing strategies, white-box testing and black-box testing. When the white-box strategy is used, the internal structure of the system under test is known. The purpose is to verify the correct behavior of internal structural elements. This can be done, for example, by exercising all the statements or all conditional branches. Because white-box testing is quite time-consuming, it is usually done for small parts of the system at a time. White-box testing methods are useful in finding design, code-based control, logic and sequence defects, initialization defects, and data flow defects. (Burnstein 2003)

In black-box testing the system under test is seen as an opaque box. There is no knowledge of the inner structure of the software. The only knowledge is how the software should work. The intention of black-box testing is to provide inputs to the system under test and verify that the system works as defined in the specifications. Because the black-box approach considers only the behavior and functionality of the system under test, it is also called functional testing. With the black-box strategy, requirement and specification defects are revealed. The black-box testing strategy can be used at all test levels defined in the following section. (Burnstein 2003)
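
To make the difference between the two strategies concrete, the sketch below applies both to one small, hypothetical function whose specification is simply "return the absolute value of an integer". The function and the test values are illustrative assumptions only and do not come from the thesis; Python is used for the illustration.

    def absolute_value(number):
        # Internal structure: a single conditional branch.
        if number < 0:
            return -number
        return number

    # Black-box view: test cases are derived from the specification alone,
    # without looking at the implementation.
    assert absolute_value(5) == 5
    assert absolute_value(-5) == 5
    assert absolute_value(0) == 0

    # White-box view: the same inputs are judged against the internal
    # structure; -5 exercises the "number < 0" branch, while 5 and 0
    # exercise the fall-through path, so all branches are covered.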

2.5 Test Levels


Testing can be performed on multiple levels. Usually software testing is divided into unit testing, integration testing, system testing, and acceptance testing (Dustin et al. 1999; Craig & Jaskiel 2002; Burnstein 2003). The purpose of these different test levels is to investigate and test the software from different perspectives and to find different types of defects (Burnstein 2003). If the division of levels is done from the test automation perspective, the levels can be unit testing, component testing and system testing (Meszaros 2003; Laukkanen 2006). In this thesis, whenever traditional test levels are used, the division into unit, integration, system, and acceptance testing is meant. Figure 1 shows these test levels and their relative order.

Figure 1: Test levels (Burnstein 2003)

UNIT TESTING

The smallest part of software is a unit. A unit is traditionally viewed as a function or a procedure in an (imperative) programming language. In object-oriented systems, methods and classes/objects can be seen as units. A unit can also be a small-sized component or a programming library. The principal goal of unit testing is to detect functional and structural defects in the unit. Sometimes the name component is used instead of unit. In that case the name of this phase is component testing. (Burnstein 2003)

There are different opinions about who should create unit tests. Unit testing is in most cases best handled by developers who know the code under test and the techniques needed (Dustin et al. 1999; Craig & Jaskiel 2002; Mosley & Posey 2002). On the other hand, Burnstein (2003) thinks that an independent tester should plan and execute the unit tests. The latter is the more traditional point of view, pointing out that nobody should evaluate their own work.

Unit testing can be started in an early phase of the software development, after the unit is created. The failures revealed by the unit tests are usually easy to locate and repair since only one unit is under consideration (Burnstein 2003). For these reasons, finding and fixing defects is cheapest at the unit test level.
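
As an illustration of a unit test written by a developer, the sketch below tests a single, hypothetical helper function in isolation with Python's unittest module. The function, its behavior, and the test values are assumptions made for this example and are not taken from the case project.

    import unittest

    def split_name(full_name):
        # Unit under test: split "First Last" into its two parts.
        first, last = full_name.split(" ", 1)
        return first, last

    class SplitNameTest(unittest.TestCase):
        def test_simple_name(self):
            self.assertEqual(split_name("Maija Virtanen"), ("Maija", "Virtanen"))

        def test_rest_goes_to_last_name(self):
            # Everything after the first space is treated as the last name.
            self.assertEqual(split_name("Anna Liisa Korhonen"),
                             ("Anna", "Liisa Korhonen"))

    if __name__ == "__main__":
        unittest.main()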

INTEGRATION TESTING

When units are combined, the resulting group of units is called a subsystem or, sometimes in object-oriented software systems, a cluster. The goal of integration testing is to verify that the component/class interfaces are working correctly and that the control and data flows between the components are working correctly. (Burnstein 2003)

SYSTEM TESTING

When the ready and tested subsystems are combined into the final system, system test execution can be started. System tests evaluate both the functional behavior and the non-functional qualities of the system. The goal is to ensure that the system performs according to its requirements when tested as a whole. After system testing is done and corrections based on the found faults are made, the system is ready for the customer's acceptance testing, alpha testing or beta testing (see next paragraph). If the customer has defined the acceptance tests, those can be used in the system testing phase to assure the quality of the system from the customer's point of view. (Burnstein 2003)

ACCEPTANCE TESTING

When a software product is custom-made, the customer wants to verify that the developed software meets her requirements. This verification is done in the acceptance testing phase. The acceptance tests are developed in cooperation between the customer and the test planners and executed after the system testing phase. The purpose is to evaluate the software in terms of the customer's expectations and goals. When the acceptance testing phase is passed, the product is ready for production. If the product is targeted at the mass market, it is often not possible to arrange customer-specific acceptance testing. In these cases the acceptance testing is conducted in two phases called alpha and beta testing. In alpha testing, possible customers and members of the development organization test the product on the development organization's premises. After the defects found in alpha testing are fixed, beta testing can be started. The product is sent to a cross-section of users who use it in a real-world environment and report the defects they find. (Burnstein 2003)

REGRESSION TESTING

The purpose of regression testing is to ensure that the old characteristics still work after changes are made to the software and to verify that the changes have not introduced new defects. Regression testing is not a test level as such, and it can be performed at all test levels. The importance of regression testing increases when the system is released multiple times. The functionality provided in the previous version should still work alongside all the new functionality, and verifying this is very time consuming. Therefore it is recommended to use automated testing tools to support this task (Burnstein 2003). Kaner et al. (1999) have also noticed that it is common to automate acceptance and regression tests to quickly verify the status of the latest build.
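
One common way to support automated regression testing is a small script that collects and runs the whole automated test suite against the latest build, for example as part of a daily build. The sketch below is one possible minimal implementation, not the mechanism used in the case project, and the test directory name is an assumption.

    import sys
    import unittest

    def run_regression_suite(test_directory="tests"):
        # Discover every automated test case under the given directory and
        # run the whole suite against the current build of the software.
        suite = unittest.defaultTestLoader.discover(test_directory)
        result = unittest.TextTestRunner(verbosity=1).run(suite)
        # A non-zero exit code lets the build system flag the regression
        # run as failed.
        return 0 if result.wasSuccessful() else 1

    if __name__ == "__main__":
        sys.exit(run_regression_suite())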

3 AGILE AND ITERATIVE SOFTWARE DEVELOPMENT
The purpose of this chapter is to explain the iterative development model and agile methods in general, and to illustrate the development models Scrum and Extreme Programming (XP) on a more detailed level because of their relevance to this thesis.

3.1 Iterative Development Model


In the iterative development model, software is built in multiple sequential iterations during the whole lifecycle of the software. An iteration can be seen as a mini-project containing requirement analysis, design, development, and testing. The goal of the iteration is to build an iteration release. An iteration release is a partially completed system which is stable, integrated, and tested. Usually most of the iteration releases are internal and not released to external customers. The final iteration release is the complete product, and it is released to the customer or to the markets. (Larman 2004)

Usually a partial system grows incrementally with new features, iteration by iteration. This is called incremental development. The concept of a system growing via iterations has been called iterative and incremental development, although iterative development is the more common term. The features to be implemented in an iteration are decided at the beginning of the iteration. The customer selects the most valuable features at that time, so there is no strict predefined plan. This is called adaptive planning. (Larman 2004)

In modern iterative methods, the recommended length of an iteration is between one and six weeks. In most of the iterative and incremental development methods the length of the iteration is timeboxed. Timeboxing is a practice which sets a fixed end date for the iteration. A fixed end date means that if the iteration scope cannot be met, the features with the lowest priority are dropped from the scope of the iteration. This way the growing software is always in a stable and tested state at the end of the iteration. (Larman 2004)

Evolutionary iterative development implies that requirements, plans, and solutions evolve and are refined during the iterations, instead of following predefined specifications. There is also the term adaptive development. The difference between these two terms is that adaptive development implies that the received feedback guides the development. (Larman 2004)

Iterative and incremental development makes it possible to repeatedly deliver an enhanced product to the markets. This is also called incremental delivery. Usually the incremental deliveries are done every three to twelve months. Evolutionary delivery is a refinement of incremental delivery. In evolutionary delivery the goal is to collect feedback and, based on that, to plan the content of the next delivery. In incremental delivery the feedback does not drive the delivery plan. However, there is always both predefined and feedback-based planning, and therefore these two terms are used interchangeably. (Larman 2004)

3.2 Agile Development


Iterative and incremental development is the core of all agile methods, including Scrum and XP. Agile methods cannot be captured in a single definition, but all of them apply timeboxed iterative and evolutionary delivery as well as adaptive planning. There are also values and practices in agile methods that support agility, meaning rapid and flexible response to change. Agile methods also promote practices and principles like simplicity, lightness, communication, self-directed teams, and programming over documentation. The values and principles that guide the agile methods were written down in 2001 by a group interested in iterative and agile methods. (Larman 2004) Those values are stated in the Agile Manifesto (Figure 2). The agile software development principles are listed in Appendix A.

Figure 2: Agile Manifesto (Beck et al. 2001a)

3.3 Scrum
Scrum is an agile, lightweight process that can be used to manage and control software and product development, and it uses iterative and incremental development methods. Scrum emphasizes an empirical process rather than a defined process. Scrum consists of four phases: planning, staging, development, and release. In the planning phase, items like the vision, funding and initial requirements are created. In the staging phase, requirements are defined and prioritized so that there is enough content for the first iteration. In the development phase, the development is done in iterations. The release phase contains product tasks like documentation, training, and deployment. (Schwaber & Beedle 2002; Larman 2004; Schwaber 2004)

When using Scrum, the people involved in software development are divided into three different roles: product owner, scrum master, and the team. The product owner's task is to get the funding, collect the project's initial requirements and manage the requirements (see Product Backlog below). The team is responsible for developing the functionality. The teams are self-managing, self-organizing, and cross-functional, and their task is to figure out how to convert items in the product backlog into functionality in the iterations. Team members are collectively responsible for the success of the iterations and of the project as a whole, and this is one of the core principles of Scrum. The maximum size of the team is seven members. The scrum master is responsible for the Scrum process and for teaching Scrum to everyone in the project. The scrum master also makes sure that everyone follows the rules and practices of Scrum. (Schwaber 2004)

Scrum consists of several practices: the Product Backlog, Daily Scrum Meetings, the Sprint, Sprint Planning, the Sprint Backlog, the Sprint Review, and the Sprint Retrospective. Figure 3 shows an overview of Scrum.

Figure 3: Overview of Scrum (Control Chaos 2006a)

PRODUCT BACKLOG

The Product Backlog is a list of all the features, functions, technologies, enhancements, and bug fixes that constitute the changes to be made to the product for future releases. The items in the product backlog form a prioritized list which is evolving all the time. The idea is to add new items to it whenever there are new features or improvement ideas. (Schwaber & Beedle 2002)

SPRINT

Sprint is the name of the timeboxed iteration in Scrum. The length of a sprint is usually 30 calendar days. Sprint planning takes place at the beginning of the sprint and consists of two meetings. In the first meeting, the product owner and the team select the content for the following sprint from the product backlog. Usually the items with the highest priority and risks are selected. In the second meeting, the team and the product owner consider how to develop the selected features and create the sprint backlog, which contains all the tasks that are needed to meet the goals of the sprint. The durations of the tasks are estimated in the meeting and updated during the sprint. (Schwaber & Beedle 2002; Larman 2004; Schwaber 2004)

DAILY SCRUM

The development progress is monitored with daily scrum meetings. A daily scrum of the specified form is held every work day at the same time and place. The meeting should not last more than 15 minutes. The team stands in a circle, and the scrum master asks all the team members the following questions:

1. What have you done since the last daily scrum?

2. What are you going to do between now and the next daily scrum?

3. What is preventing you from doing your work?

If any problems are raised during the daily scrum meeting, it is the responsibility of the team to solve them. If the team cannot deal with the problems, it becomes the responsibility of the scrum master. If a decision is needed, the scrum master has to decide the matter within an hour. If there are some other problems, the scrum master should solve them within one day, before the next daily scrum. (Schwaber & Beedle 2002; Schwaber 2004)

SPRINT REVIEW

At the end of the sprint, the results are shown in the sprint review hosted by the scrum master. The purpose of the sprint review is to demonstrate the completed functionality to the product owner and the stakeholders. After every presentation, all the participants are allowed to voice any comments, observations, improvement ideas, changes, or missing features regarding the presented functionality. All these items are noted down. At the end of the meeting all the items are checked and placed in the product backlog for prioritization. (Schwaber & Beedle 2002; Schwaber 2004)

DEFINITION OF DONE

Because only the done functionality can be shown in the sprint review, there is a need to define what done means. Otherwise one person might think that functionality is done when a feature is implemented, while another thinks that it is done when it is properly tested, documented and ready to be deployed to production. Schwaber (2004) recommends having a definition of done that is written down and agreed on by all members of the team. This way all stakeholders know the condition of the demonstrated functionalities.

SPRINT RETROSPECTIVE

The sprint retrospective meeting is used to improve the performance of the scrum team. The sprint retrospective takes place at the end of the sprint, and the participants are the scrum master and the team. Two questions, “What went well during the last sprint?” and “What could be improved in the next sprint?”, are asked of all the team members. Improvement ideas are prioritized, and the ideas that should be taken into the next sprint are added as high-priority non-functional items to the product backlog. (Schwaber 2004)

RULES IN SCRUM

In addition to the aspects mentioned earlier, there are a few more rules in Scrum. It is forbidden to add any new tasks to the sprint backlog during the sprint, and the scrum master must ensure this. If, however, the proposed new tasks are more important than the ones in the sprint backlog, the sprint can be abnormally terminated by the scrum master. After the termination, a new sprint can be started with a sprint backlog containing the new tasks. (Schwaber & Beedle 2002; Schwaber 2004)

DAILY BUILD

As mentioned earlier, Scrum is used to manage and control product development, and therefore there
are no strict rules for development practices that should be used. However, there is a need to know the
status of the project on a daily basis, and therefore a daily build practice is needed. The daily build
practice means that every day the developed source code is checked into the version control system,
built and tested. This means that integration problems can be noticed on a daily basis rather than at the
end of the sprint. The daily build practice can be implemented by continuous integration. Because the
daily build is the only development practice that has to be used in Scrum, the team is responsible for
selecting other development practices to be used. This means that many practices from other agile
methods can be used by the team. (Schwaber & Beedle 2002)

SCALING SCRUM

It was mentioned that the size of a scrum team is seven people. When Scrum is used in a larger project, the project members can be divided into multiple teams (Schwaber 2004; Larman 2006). When multiple teams are used, the cooperation between the teams can be handled with the scrum of scrums. The scrum of scrums is a daily scrum in which at least one member from every scrum team participates. This mechanism is used to remove obstacles that concern more than one team (Schwaber 2004). In a larger project it is also possible to divide the product owner’s responsibilities. Cohn (2007) suggests using a group of product owners with one chief product owner. The product owners work in the teams while the chief product owner manages the whole. Larman (2006) calls product owners working with scrum teams feature champions.

3.4 Extreme Programming


Extreme Programming (XP) is a disciplined and still very agile software development method for small teams of two to twelve members. The purpose of XP is to minimize the risk and the cost of change in software development. XP is based on the experiences and successfully used practices of the father of the method, Kent Beck. Communication, simplicity, feedback, and courage are the values that XP is based on. Simplicity means writing as simple code as possible; no extra functionality is implemented beforehand even though there might be a need for a more complex solution in the future. Communication means continuous communication between the customer and the developers and also among the developers. Some of the XP practices also force communication, which enhances the spread of important information inside the project. Continuous testing and communication provide feedback about the state of the system and the development velocity. Courage is needed to make hard decisions, like changing the system heavily when seeking simplicity and better design. Another form of courage is deleting code when it is not working at the end of the day. To concretize these values there are twelve development practices that XP relies heavily on. The practices are listed below:

• The Planning Game: Quickly determine the scope of the next release by combining busi-
ness priorities and technical estimates. As reality takes over the plan, update the plan.

• Small Releases: Put a simple system into production quickly, and then release new versions
on a very short cycle.

• Metaphor: Guide all development with a simple shared story of how the whole system
works.

• Simple Design: The system should be designed as simply as possible at any given moment.
Extra complexity is removed as soon as it is discovered.

• Testing: Programmers continually write unit tests, which must be run flawlessly for devel-
opment to continue. Customers write tests demonstrating that features are finished.

• Refactoring: Programmers restructure the system without changing its behavior to remove
duplication, improve communication, simplify, or add flexibility.

• Pair Programming: All production code is written with two programmers at one machine.

• Collective ownership: Anyone can change any code anywhere in the system at any time.

• Continuous integration: Integrate and build the system many times a day, every time a task
is completed.

• 40-hour week: Work no more than 40 hours a week as a rule. Never work overtime a second
week in a row.

• On-site customer: Include a real, live user on the team, available full-time to answer ques-
tions.

• Coding standards: Programmers write all code in accordance with rules emphasizing
communication through the code.

None of the practices are unique or original. However, the idea in XP is to use all the practices together; when used together, the practices complement each other (Figure 4). (Beck 2000)

Figure 4: The practices support each other (Beck 2000)

3.5 Scrum and Extreme Programming Together
It is possible to combine the agile management mechanisms of Scrum with the engineering practices of XP (Control Chaos 2006b). Figure 5 illustrates this approach. Mar and Schwaber (2002) have found that these two methods are complementary; when used together, they can have a significant impact on both the productivity of a team and the quality of its outputs.

Figure 5: XP@Scrum (Control Chaos 2006b)

3.6 Measuring Progress in Agile Projects


Ron Jeffries (2004) recommends using the Running Tested Features metric (RTF) for measuring the
team’s agility and productivity. He defines the RTF in the following way:

1. The desired software is broken down into named features (requirements, stories) which are
part of the system to be delivered.

2. For each named feature, there are one or more automated acceptance tests which, when they
work, will show that the feature in question is implemented.

3. The RTF metric shows, at every moment in the project, how many features are passing all
their acceptance tests.

RTF is a simple metric, and it measures well the most important aspect of software: the number of working features. The RTF value should start to increase at the beginning of the project and keep increasing until the end of the project. If the curve is not rising, there are likely problems in the project. Figure 6 shows what the RTF curve could look like when a project is doing well. (Jeffries 2004)

Figure 6: RTF curve for an agile project (Jeffries 2004)
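The metric itself is straightforward to compute. The following is a minimal sketch in Python; the function name and the data structure are assumptions made only for this illustration and are not part of Jeffries's definition.

def running_tested_features(results):
    # `results` maps a feature name to a list of booleans, one per automated
    # acceptance test of that feature (True = test passed). Features without
    # any automated acceptance test are not counted.
    return sum(1 for tests in results.values() if tests and all(tests))

# Example: two of the three features count as running tested features.
results = {
    "add registration": [True, True],
    "delete registration": [True],
    "registration count": [True, False],
}
print(running_tested_features(results))  # prints 2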

4 TESTING IN AGILE SOFTWARE DEVELOPMENT
Agile testing is guided by the Agile Manifesto presented in Figure 2. Marick (2001) sees working code and conversing people as the most important guides for agile testing. Communication between the project and the test engineers should not consist of written requirements and design specifications handed over the wall to the testing department, with specifications and defect reports handed back. Instead, Marick (2001) emphasizes face-to-face conversations and informal discussions as the main channel for getting testing ideas and creating the test plan. Test engineers should work with the developers and help test even unfinished features. Marick is one of the people agreeing with the principles of the context-driven testing school (Kaner et al. 2001a), and therefore the principles of agile testing and context-driven testing overlap.

4.1 Purpose of Testing


The purpose of agile testing is to build confidence in the developed software. In extreme programming this confidence is built on two test levels: the unit tests created with test-driven development increase the developers’ confidence, and the customer’s confidence is founded on the acceptance tests (Beck 2000). Unit tests verify that the code works correctly, and acceptance tests make sure that the correct code has been implemented. Scrum does not describe integration and acceptance testing (Abrahamsson et al. 2002), and therefore it is up to the team to define the testing-related issues. Itkonen et al. (2005) state that in agile testing the focus is on constructive quality assurance practices. This is the opposite of the destructive quality assurance practices, such as negative testing, used in traditional testing. Itkonen et al. (2005) have doubts about the sufficiency of the constructive quality assurance practices, but admit that more research in that area is needed.

4.2 Test Levels


In agile development the different testing activities overlap, mainly because the purpose is to deliver working software repeatedly. The levels of agile testing cannot be distinguished by development phase in the same way as the traditional test levels can, and the contents of the different levels also differ between agile and traditional testing. As mentioned in the previous chapter, in XP the confidence is built with the unit and acceptance tests, while Scrum does not contain guidelines on how testing should be conducted. There are also other opinions in the agile community on how the testing could be divided, and therefore there is no coherent definition of the test levels in agile testing. However, the test levels in XP and some other categorizations are presented below.

UNIT TESTING

Unit testing, sometimes also called developer testing, is very similar to traditional unit testing. However, unit tests are usually written using test-driven development (TDD). As the name test-driven indicates, unit tests are written before the code (Beck 2003; Astels 2003). When TDD is used, it is obvious that a developer writes the unit tests. Even though TDD is used to create the unit tests, its purpose is not just testing: TDD is an approach to writing and designing maintainable code, and as a nice side effect, a suite of unit tests is produced (Astels 2003).

ACCEPTANCE TESTING IN XP

The acceptance testing in XP has a wider meaning than the traditional acceptance testing. Acceptance
tests can contain functional, system, end-to-end, performance, load, stress, security, and usability test-
ing, among others (Crispin 2005). Acceptance tests are also called customer and functional tests in XP
literature, but in this thesis the term acceptance test is used.

The acceptance tests are written by the customer or by a tester with the customer’s help (Beck 2000). In some projects defining the acceptance tests has been a joint effort of the team (Crispin et al. 2002). The aim of acceptance testing is to show that the product works as the customer wants and to increase her confidence (Beck 2000; Jeffries 1999). The acceptance tests should contain only tests for features that the customer wants. Jeffries (1999) advises investing wisely and picking tests that are meaningful both when they pass and when they fail. Crispin et al. (2002) also mention that the purpose of the acceptance tests is not to go through all the paths in the system because the unit tests take care of that. However, Crispin (2005) has noticed that teams doing TDD test only the “happy paths”, especially when trying TDD for the first time, so misunderstood requirements and hard-to-find defects may go undetected. Therefore the acceptance tests keep the teams on track.

The acceptance tests should always be automated, and the automated tests should be simple and created incrementally (Jeffries et al. 2001; Crispin & House 2005). However, in practice, automating all the tests is extremely hard and some trade-offs have to be made (Crispin et al. 2002). Kaner (2003) thinks that automating all acceptance tests is a serious error and that the amount of automated tests should be decided based on the context. Jeffries (2006) admits that automating all the tests is impossible but still states that “if we want to be excellent at automated testing, we should set out to automate all tests”. When automating the tests, the entire development team should be responsible for the automation tasks (Crispin et al. 2002). The test-first approach can also be used with the acceptance tests. The acceptance test-driven development concept is introduced in Chapter 4.3.

OTHER TESTING PRACTICES IN XP

While the unit and the acceptance testing are the heart of XP, Beck (2000) admits that there are also other testing practices that make sense from time to time. He lists parallel testing, stress testing, and monkey testing as examples of these helpful testing approaches.

OTHER TEST LEVELS IN AGILE TESTING

There are also other test level divisions in the agile testing community in addition to the division in XP. Marick (2004) divides testing into four categories: technology-facing programmer support, business-facing team support, business-facing product critiques, and technology-facing product critiques. In Marick’s division, unit testing can be seen as technology-facing programmer support and acceptance testing as business-facing team support. Business-facing product critiques mean testing for forgotten, wrongly defined, or otherwise false requirements; Marick (2004) believes that different kinds of exploratory testing practices can be used in this phase. Technology-facing product critiques correspond to non-functional testing.

Hendrickson (2006) divides the agile testing practices into automated acceptance or story tests, automated unit tests, and manual exploratory testing (Figure 7). She thinks the exploratory testing provides additional feedback and covers gaps in automation, and states that exploratory testing is necessary to augment the automated tests. From the functional testing point of view, this division is quite similar to Marick’s (2004) division.

Figure 7: Agile testing practices (Hendrickson 2006)

4.3 Acceptance Test-Driven Development


The idea of acceptance test-driven development (ATDD) was first introduced by Beck (2003) under the name application test-driven development. However, he had some doubts about how well the acceptance tests can be written before the development. Before this, acceptance test-driven development had already been used, although it was simply called acceptance testing (Miller & Collins 2001). Since then, there have been several projects using acceptance test-driven development (Andersson et al. 2003; Reppert 2004; Crispin 2005; Sauvé et al. 2006). The ATDD concept has also been called story test-driven development (Mudridge & Cunningham 2005; Reppert 2004) and customer test-driven development (Crispin 2005).

PROCESS

On a high level the acceptance test-driven development process contains three steps. The first step is to define the requirements for the coming iteration. In agile projects the requirements are usually written in the format of user stories. User stories are short descriptions representing the customer requirements, used for planning and as reminders (Cohn 2004). When the user stories are defined, the acceptance tests for those requirements can be written. As the name acceptance test indicates, the purpose of these tests is to define the acceptable functionality of the system. Therefore, the customer has to take part in defining the acceptance tests, and the acceptance tests have to be written in a format the customer understands (Miller & Collins 2001; Mudridge & Cunningham 2005). When the tests have been defined, the development can be started. While the concept is quite simple on a high level, there are multiple possible approaches regarding by whom, when and to what extent the acceptance tests are written and automated.

WHO WRITES THE TESTS

As mentioned above, the customer or some other person with proper knowledge of the domain is needed when writing the tests (Reppert 2004; Crispin 2005). Usually the customer needs some help in writing the tests (Crispin 2005). Crispin (2005) describes a process where the test engineer writes the acceptance tests with the customer. On the other hand, it is also possible for the developers and the customer to define the tests (Andersson et al. 2003), or for the customer, the developers and the test engineers to write the tests in collaboration (Reppert 2004). As can be seen, there are several alternative ways of writing the acceptance tests, and the choice evidently depends on the available people and their skills.

WHEN TESTS ARE WRITTEN AND AUTOMATED

Tests are written before the development when ATDD is used. This can mean writing the test cases before the iteration planning or after it. Mudridge and Cunningham (2005) describe an example of how to use the acceptance tests to define the user stories in more detail and thereby ease task estimation in the iteration planning session. Watt and Leigh-Fellows (2004) have also used acceptance tests to clarify the user stories before the planning sessions. On the other hand, Crispin (2005) and Sauvé et al. (2006) describe a process where the acceptance tests are developed after the stories have been selected for the iteration.

While working in one software development project, Crispin (2005) noticed that writing too many detailed test cases at the beginning can make it difficult for the developers to understand the big picture. Therefore, in that project the high-level test cases were written at the beginning of the iteration and the more detailed low-level test cases were developed in parallel with the developers writing the code. This way the risk of having to rework a lot of test cases is lowered. A similar kind of approach has also been used by Andersson et al. (2003) and Miller and Collins (2001). However, Crispin (2005) states that this is not “pure” ATDD because all the tests are not written before the code.

HOW ACCEPTANCE TESTS ARE AUTOMATED

As was mentioned in Chapter 4.2, the goal in agile testing is to automate as many tests as possible. Depending on the tool used to automate the test cases, the actual work varies. In general, there are two tasks. The test cases have to be written in a format that can be processed by the test automation framework, and in addition some code is needed to move the instructions from the test cases into the system under test. Often this code bypasses the graphical user interface and calls the business logic directly (Reppert 2004; Crispin 2005).

There are several open source tools used to automate the test cases. The best known of these tools is FIT (Framework for Integrated Test) (Sauvé et al. 2006). When FIT is used, the test cases consist of steps presented in a tabular format. Developers have to implement test code for every different kind of step, which Sauvé et al. (2006) see as the weakness of FIT. Other tools and approaches used to automate the acceptance test cases are not presented here.

PROMISES AND CHALLENGES

Table 1 and Table 2 show the promises and challenges of acceptance test-driven development collected
from the different references mentioned in the previous chapters.

PROMISES

The risk of building incorrect software is decreased: The communication gap is reduced because the tests are an effective medium of communication between the customer and the development (Sauvé et al. 2006). When the collaboration takes place just before the development, there is a clear context for having a conversation and removing misunderstandings (Reppert 2004). Crispin (2005) even thinks that the most important function of the tests is to force the customer, the developers and the test engineers to communicate and create a common understanding before the development.

The development status is known at any point: When acceptance tests created in collaboration are passing, the feature is done. The readiness of the product can be evaluated based on the results of the suite of automated tests executed daily (Miller and Collins 2001). Knowing which features are ready also makes project tracking easier and better (Reppert 2004).

A clear quality agreement is created: The tests made in collaboration with the customer and the development team serve as a quality agreement between the customer and the development (Sauvé et al. 2006).

Requirements can be defined more cost-effectively: The requirements are described as executable artifacts that can be used to automatically test the software. Misunderstandings are less likely than with requirements defined in textual descriptions or diagrams. (Sauvé et al. 2006)

The requirements and tests are in synchronization: Requirement changes become test updates, and therefore they are always in synchronization (Sauvé et al. 2006).

The quality of tests can be improved: The errors in the tests are corrected and approved by the customer, and therefore the quality of the tests is improved (Sauvé et al. 2006).

Confidence in the developed software is increased: Without tests the customers cannot have confidence in the software (Miller and Collins 2001). The customers get confidence because they do not need to just hope that the developers have understood the requirements (Reppert 2004).

A clear goal for the developers: The developers have a clear goal in making the customer-defined acceptance tests pass, and that can prevent feature creep (Reppert 2004; Sauvé et al. 2006).

The test engineers are not seen as “bad guys”: Because the developers and the test engineers have the same well-defined goal, the developers do not see the test engineers as “bad guys” (Reppert 2004).

Problems can be found earlier: The customer’s domain knowledge helps to create meaningful tests. This helps to find problems already in an early phase of the project (Reppert 2004).

The design of the developed system is improved: Joshua Kerievsky has been amazed at how much simpler the code is when ATDD is used (Reppert 2004).

The correctness of refactoring can be verified: The acceptance tests do not rely on the internal design of the software, and therefore they can be used to reliably verify that refactoring has not broken anything (Andersson et al. 2003).

Table 1: Promises of ATDD

CHALLENGES

Automating tests: Crispin (2005) has noticed that defining and automating tests can be a huge challenge even with light tools like FIT.

Writing the tests before development: It might be hard to find time for writing the tests in advance (Crispin 2005).

The right level of test cases: Crispin (2005) has noticed that when many test cases are written beforehand, the test cases can cause more confusion than help in understanding the requirements. This causes a lot of rework because some of the test cases have to be refactored. Therefore the team Crispin (2005) worked with started with a few high-level test cases and added more test cases during the iteration.

Table 2: Challenges of ATDD

The promises and challenges are revisited at the end of the thesis when the observations are analyzed.

5 TEST AUTOMATION APPROACHES
The purpose of this chapter is to briefly describe the field of test automation and the evolution of test automation frameworks. In addition, the keyword-driven testing approach is explained in more detail.

5.1 Test Automation


The term test automation usually refers to test execution automation. However, test automation is a much wider term, and it can also mean activities like test generation, reporting the test execution results, and test management (Bach 2003a). All these test automation activities can take place on all the different test levels described in Chapter 2.5. The extent of test automation can also vary. Small-scale test automation can mean tool-aided testing, like using a small collection of testing tools to ease different kinds of testing tasks (Bach 2003a). On the other hand, large-scale test automation frameworks are used for setting up the environment, executing test cases, and reporting the results (Zallar 2001).

Automating the testing is not an easy task. There are several issues that have to be taken into account.
Fewster and Graham (1999) have listed the common test automation problems as unrealistic expecta-
tions, poor testing practice, an expectation that automated tests will find a lot of new defects, a false
sense of security, maintenance, technical problems, and organizational issues. As can be noticed, the
list is quite long, and therefore all these issues have to be taken into account when planning the test
automation usage. Laukkanen (2006) also lists some other test automation issues like when to auto-
mate, what to automate, what can be automated, and how much to automate.

5.2 Evolution of Test Automation Frameworks
Test automation frameworks have evolved over time (Laukkanen 2006). Kit (1999) divides the evolution into three generations. The first-generation frameworks are unstructured: test cases are separate scripts that also contain the test data and are therefore almost unmaintainable. In the second-generation frameworks the test scripts are well designed, modular and documented, which makes them maintainable. The third-generation frameworks are based on the second generation with the difference that the test data is taken out of the scripts. This makes varying the test data easy, and similar test cases can be created quickly and without coding skills. This concept is called data-driven testing. The limitation of data-driven testing is that one script is needed for every logically different test case (Fewster & Graham 1999; Laukkanen 2006), which can easily increase the number of needed scripts dramatically. Keyword-driven testing is a logical extension of data-driven testing (Fewster & Graham 1999), and it is described in the following chapter.
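To make the limitation concrete, the following is a minimal data-driven sketch in Python; the language and the small Calculator stand-in are assumptions made only for this illustration. One script covers one logical scenario, adding two numbers, while the inputs and expected outputs vary; a logically different scenario, such as multiplication, would require another script.

class Calculator:
    """Trivial stand-in for an application under test (illustrative only)."""
    def __init__(self):
        self._display = "0"
        self._pending = None
        self._op = None

    def input(self, value):
        self._display = value

    def push(self, button):
        if button == "+":
            self._pending, self._op = int(self._display), "+"
        elif button == "=" and self._op == "+":
            self._display = str(self._pending + int(self._display))

    def display(self):
        return self._display


# Data-driven testing: the inputs and expected outputs live outside the
# script, but this one script still covers only one logical scenario.
ADDITION_DATA = [
    ("1", "2", "3"),
    ("10", "5", "15"),
]

def test_addition():
    for a, b, expected in ADDITION_DATA:
        calc = Calculator()
        calc.input(a)
        calc.push("+")
        calc.input(b)
        calc.push("=")
        assert calc.display() == expected, (a, b, expected)

test_addition()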

5.3 Keyword-Driven Testing


In keyword-driven testing, the keywords controlling the test execution are also taken out of the scripts into the test data (Fewster & Graham 1999; Laukkanen 2006). This makes it possible to create new test cases in the test data without writing a script for every different test case, which also allows test engineers without coding skills to add new test cases (Fewster & Graham 1999; Kaner et al. 2001b). This removes the biggest limitation of the data-driven testing approach. Figure 8 is an example of keyword-driven test data containing two simple test cases for testing a calculator application. The test cases consist of the keywords Input, Push and Check, and of arguments which are the inputs and expected outputs of the test cases. As can be seen, it is easy to add logically different test cases without implementing new keywords.

Figure 8: Keyword-driven test data file (Laukkanen 2006)

To be able to execute the tabular-format test cases shown in Figure 8, there has to be a mapping from the keywords to the code interacting with the system under test (SUT). The scripts or code implementing the keywords are called handlers by Laukkanen (2006). Figure 9 shows the handlers for the keywords used in the test data (Figure 8). In addition to the handlers, test execution needs a driver script which parses the test data and calls the keyword handlers according to the parsed data.

Figure 9: Handlers for keywords in Figure 8 (Laukkanen 2006)
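To illustrate the handler and driver concepts, the sketch below wraps the Calculator stand-in introduced in the data-driven example above. It is only an illustration of the idea, not the code referenced in Figure 9: the handler class offers one method per keyword, and the driver maps keyword names to handler methods and calls them row by row.

class CalculatorKeywords:
    """Keyword handlers for the calculator test data of Figure 8 (illustrative)."""

    def __init__(self, calculator):
        self.calc = calculator

    def input(self, value):          # handler for the Input keyword
        self.calc.input(value)

    def push(self, button):          # handler for the Push keyword
        self.calc.push(button)

    def check(self, expected):       # handler for the Check keyword
        actual = self.calc.display()
        if actual != expected:
            raise AssertionError(f"expected {expected}, got {actual}")


def run_test_case(steps, handlers):
    """Driver: execute one test case given as parsed (keyword, argument) rows."""
    for keyword, argument in steps:
        getattr(handlers, keyword.lower())(argument)


# The first test case of Figure 8 as parsed rows; Calculator is the
# stand-in class defined in the data-driven sketch above.
steps = [("Input", "1"), ("Push", "+"), ("Input", "2"), ("Push", "="), ("Check", "3")]
run_test_case(steps, CalculatorKeywords(Calculator()))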

If there is a need to create both high-level and low-level test cases, keywords on different levels are needed; simple keywords like Input are not enough for high-level test cases. According to Laukkanen (2006), there are both simple and more flexible solutions. Higher-level keywords can be created inside the framework by combining the lower-level keywords. The limitation of this approach is the need for coding skills whenever new higher-level keywords are needed. A more flexible solution, proposed by Buwalda et al. (2002), Laukkanen (2006) and Nagle (2007), is to include in the keyword-driven test automation framework the possibility to combine existing keywords. This makes it possible to create higher-level keywords by combining existing keywords inside the test data. Laukkanen (2006) calls these combined keywords user keywords, and this term is also used in this thesis.
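The driver sketch above can be extended to support such user keywords. The sketch below again only illustrates the concept rather than any particular framework: user keyword definitions live in the test data as named sequences of other keywords, and the driver expands them recursively until only base keywords remain. The positional argument names ${a} and ${b} are an assumption of this sketch, and it reuses the CalculatorKeywords and Calculator classes from the earlier sketches.

# User keywords as they could appear in the test data: each name maps to
# a list of (keyword, argument) steps built from other keywords.
USER_KEYWORDS = {
    "Add": [("Input", "${a}"), ("Push", "+"), ("Input", "${b}"), ("Push", "=")],
}

def run_step(keyword, arguments, handlers):
    """Execute one step, expanding user keywords into base keyword calls."""
    if keyword in USER_KEYWORDS:
        bindings = dict(zip(["${a}", "${b}", "${c}"], arguments))
        for sub_keyword, sub_argument in USER_KEYWORDS[keyword]:
            value = bindings.get(sub_argument, sub_argument)
            run_step(sub_keyword, [value], handlers)
    else:
        getattr(handlers, keyword.lower())(*arguments)

# "Add 1 2" expands to Input 1, Push +, Input 2, Push =.
handlers = CalculatorKeywords(Calculator())
run_step("Add", ["1", "2"], handlers)
run_step("Check", ["3"], handlers)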

6 KEYWORD-DRIVEN TEST AUTOMATION FRAMEWORK
The keyword-driven test automation framework used in this research was developed inside the company where the study took place and was called Robot. The ideas and the basic concept of Robot were based on the master’s thesis of Laukkanen (2006). The following chapters briefly explain the functionalities of Robot that are interesting from this thesis’s point of view.

6.1 Keyword-Driven Test Automation Framework


The keyword-driven test automation framework has three logical parts: the test data, the test automation framework and the test libraries. The test data contains directives telling what to do, with the associated inputs and expected outputs. The test automation framework contains the functionality to read the test data, run the handlers in the libraries based on the directives in the test data, and handle errors during the test execution. The framework also contains test logging and test reporting functionality. The test libraries are the interface between the framework and the system under test; the libraries can use existing test tools to access the interfaces of the system under test or connect directly to the interfaces. Figure 10 presents the logical structure of Robot.

Figure 10: Logical structure of Robot

6.2 Test Data
In Robot, the test data is in tabular format and can be stored in HTML or TSV files. The test data is divided into four categories: test cases, keywords, variables and settings. Each of these test data types is defined in its own table in the test data file. Robot recognizes the different tables from the name of the data type in the table’s first header cell.

KEYWORDS AND TEST CASES

In Robot, keywords are divided into base and user keywords. Base keywords are keywords implemented in the libraries. User keywords are keywords defined in the test data by combining base keywords or other user keywords. The ability to create new user keywords in the test data decreases the number of needed base keywords and therefore the amount of programming. User keywords also make it possible to increase the abstraction level of test cases. In Figure 11, the test cases shown in Figure 8 are modified to use the user keywords Add, Equals and Multiply. The test cases are composed of keywords defined in the second column of the test case table and arguments defined in the following columns. User keywords are defined in a similar way. In the test case and keyword tables the second column is named action. This column name can be chosen freely by the user as it is not used by Robot; the same applies to the rest of the headers.

Figure 11: Test cases and user keywords (Laukkanen 2006)

VARIABLES AND SETTINGS

It is possible to define variables in the Robot framework. Variables increase the maintainability of the test data because some changes require only updates to the variable values. In some cases variables can contain test environment specific data, like hostnames. In these cases variables make it easier to use the same test cases in different environments with minimal extra effort. There are two types of variables in Robot. A scalar variable contains one value, which can be anything from a simple string to an object. A list variable contains multiple items. Figure 12 contains a scalar variable ${GREETING} and a list variable @{ITEMS}.

Figure 12: Variable table containing scalar and list variables

The settings table is similar to the variable table: the name of the setting is defined in the first column and its value or values in the following columns. Settings are predefined in Robot. Examples of settings are Library and Resource. The Library setting is used to import a library containing the needed base keywords. The Resource setting is used to import resource files, which are used to define user keywords and variables in one place.

GROUPING TEST CASES

There are two ways of grouping test cases in Robot. First, test cases are grouped hierarchically. A file containing test cases (such as the one in Figure 11) is called a test case file, and it forms a test suite. A directory containing one or more test case files, or directories with test case files, also creates a test suite. In other words, the hierarchical grouping is the same as the test data structure in the file system.

The other way to group test cases is based on project-specific agreements. In Robot, it is possible to attach words to the test cases that are used for grouping them. These words are called tags. Tags can be used to define, for example, which part of the system the test case tests, who created the test case, whether the test case belongs to the regression tests, and whether it takes a long time to execute.

6.3 Test Execution
In Robot, the test execution is started from the command line. The scope of the test execution is de-
fined by giving test suite directories or test case files as inputs. Without parameters, all the test cases in
the given test suites are executed. A single test suite or test case can be executed with command line
options. It is also possible to include or exclude test cases from the test run based on the tags (see the
previous chapter). Command line execution makes it possible to start the test execution at some prede-
fined time. It also enables starting test execution from continuous integration systems like Cruise Con-
trol (Cruise Control 2006).

The test execution result can be pass or fail. By default, if even a single test case fails, the test execution result is a failure. To allow a successful test execution even with failing test cases, Robot contains a feature called critical tests. The test execution result is a failure if any of the critical test cases fails, which means that the execution is considered successful even if non-critical test cases fail. The critical test cases are defined when starting the execution from the command line. For example, regression can be defined as a critical tag, and all the test cases that contain the tag regression are then handled as critical tests. This functionality allows adding test cases to the test execution even when they are still failing, without turning the overall result into a failure; this is needed when the test case or the feature is not yet ready, and such test cases are simply not marked as critical.
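The rule is easy to express in code. The sketch below is only an illustration of the described behavior, not Robot's implementation; it assumes the test results are available as (passed, tags) pairs.

def execution_status(test_results, critical_tags):
    """Return 'PASS' if all critical test cases passed, otherwise 'FAIL'.

    test_results is a list of (passed, tags) pairs, one per executed test
    case; a test case is critical if it carries at least one critical tag.
    Failures of non-critical test cases do not fail the execution.
    """
    for passed, tags in test_results:
        if not passed and any(tag in critical_tags for tag in tags):
            return "FAIL"
    return "PASS"


# A failing test case without a critical tag does not fail the run.
results = [
    (True,  {"regression", "map-layer"}),
    (False, {"not-ready"}),
]
print(execution_status(results, {"regression"}))   # prints PASS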

6.4 Test Reporting


Robot produces a report, a log and an output from the test execution. The report contains statistics and information based on the executed test suites and tags. It can be used as an information radiator, since its background color shows whether the test execution status was pass or fail. The test log contains more detailed information about the executed keywords and information that can be used to solve problems. The output contains the test execution results in XML format; the report and the log are generated from the output.

7 EXAMPLE OF ACCEPTANCE TEST-DRIVEN
DEVELOPMENT WITH KEYWORD-DRIVEN TEST
AUTOMATION FRAMEWORK
This chapter presents a simple fictitious example of acceptance test-driven development with the Robot framework. Its purpose is to help understand the concept before showing it in practice, and it also serves as a simple theoretical example of how the concept could work. First, however, the relation between user stories, test cases and keywords is briefly explained.

7.1 Test Data between User Stories and System under Test
As was described in Chapter 4.3, user stories are short descriptions representing the customer requirements, used for planning. Different levels of test data are needed to map the user stories to the actual code interacting with the system under test. These levels and their interdependence are shown in Figure 13. First of all, the user story is mapped to one or multiple test cases. Every test case contains one or more sentence format keywords. A sentence format keyword is a user keyword written in plain text, possibly containing some input or expected output values but no separate arguments. When the test cases contain only sentence format keywords, they can be understood without technical skills. Every sentence format keyword consists of one or more base or user keywords, and a user keyword in turn includes one or more base or user keywords. Finally, the base keywords contain the code which controls the system under test. The examples in the following chapters clarify the use of the different types of keywords presented above.

Figure 13: Mapping from user story to the system under test

7.2 User Stories
The customer in this example is a person who handles registrations to different kinds of events. People usually enroll in the events by email or by phone, and therefore the customer needs an application in which to save the registrations. The customer has requested a desktop application with a graphical user interface and has defined the following user stories:

1. As a registration handler I want to add registrations and see all the registrations so that I can
keep count of the registrations and later contact the registered people.

2. As a registration handler I want to delete one or multiple registrations so that I can remove
the canceled registration(s).

3. As a registration handler I want to have the count of the registrations so that I can notice
when there is no longer room for new registrations.

4. As a registration handler I want to save registrations persistently so that I do not lose the
registrations even if my computer crashes.

7.3 Defining Acceptance Tests


Before the stories can be implemented, there is a need to discuss and clarify the hidden assumptions behind them. Details arising from this collaboration can be captured as acceptance tests. As was mentioned in Chapter 4.3, it can vary when this collaboration takes place and who participates in it. Because those issues are more a matter of the process and the people available than of the tool used, they are not considered in this example.

The discussion about the user stories between the customer and the development team can lead to the acceptance tests shown in Figure 14. The test cases are in a format that can be used as input for Robot. Test cases can be written directly in this format using empty templates. However, it might be easier to discuss the user stories and write drafts of the test cases on a flip chart during the conversation. After the sketches of the test cases have been made, they can easily be converted to a digital format.

Figure 14: Some acceptance test cases for the registration application

While discussing the details of the user stories and the test cases, an outline of the user interface can be drawn. The outline in Figure 15 could be the result of the session where the test cases were created, and it can be used as a starting point for the implementation. In the picture, names for the user interface elements are also defined. These are implementation details that have to be agreed upon if different persons are creating the test cases and the application.

Figure 15: Sketch of the registration application

7.4 Implementing Acceptance Tests and Application


After the acceptance tests are defined, it should be clear to all the stakeholders what is going to be implemented. If pure acceptance test-driven development is used, the test cases are implemented on a detailed level before the implementation of the application can be started. In this example the implementation of the test case User Can Add Registrations is described in detail.

CREATING THE TEST CASE ”USER CAN ADD REGISTRATIONS”

The User Can Add Registrations test case contains three sentence format keywords, as can be seen in Figure 16. The creation of the test case starts with defining those sentence format keywords. To keep the actual test case file as simple as possible, the sentence format keywords are defined in a separate resource file. The keywords defined in the resource file are taken into use by importing the resource file in the settings table. Because the test case starts with a sentence format keyword which launches the application, the application has to be closed at the end of the test case. This can be done in the test case or with a Test post condition setting. These two settings are shown in Figure 17.

Figure 16: Test case “User Can Add Registrations”

Figure 17: Settings for all test cases

Figure 18 shows the variables and user keywords defined in the atdd_keyword.html resource file. The list variables @{person1}, @{person2} and @{person3} are described in the variable table. The comments Name and Email are used to clarify the meaning of the different columns. These variables are used in the sentence format keywords created in the keyword table. The Application is started and there are no registrations in the database keyword contains two user keywords. The first keyword, Clear database, makes sure there are no users in the database when the application is started. The second keyword, User launches registration application, launches the registration application. The next two sentence format keywords, User adds three people and All three people should be shown in the application and should exist in the database, repeat the same user keyword with the different person variables described in the variable table. These user keywords do not use base keywords from the libraries, and therefore the test case does not access the system under test at this level. The user keywords used to create the sentence format keywords can be defined in the same resource file or in other resource files. The missing user keywords are defined in the resource file resource.html.

Figure 18: Variables and user keywords for test case “User Can Add Registrations”

Figure 19 describes the user keywords in the atdd_resource.html resource file. The base keywords needed by these user keywords are imported from the SwingLibrary and the OperatingSystem test libraries in the settings table. The SwingLibrary contains base keywords for handling the graphical user interface of applications made with Java Swing technology. The OperatingSystem library is a part of Robot, and it contains base keywords for example for handling files (like Get file) and environment variables, and for running system commands. If there are no existing libraries for the technologies the system under test is implemented with, or if some needed base keywords are missing from an existing library, the missing keywords must naturally be implemented.

Figure 19: User keywords using the base keywords

User launches registration application uses the Launch base keyword with two arguments: the main method of the application and the title of the application to be opened. Both of these arguments have been defined in the variable table as scalar variables. User Closes Registration Application uses the Close base keyword, which simply closes the launched application. Clear Database consists of the base keyword Remove file, which removes the database file from the file system. The ${DATABASE} variable contains the path to the database.txt file, which is used as a database by the registration application. The ${CURDIR} and ${/} variables are Robot’s built-in variables: ${CURDIR} is the directory where the resource file is located, and ${/} is a path separator character which is resolved based on the operating system.

The User adds registration keyword takes two arguments, ${name} and ${email}, and it consists of the Clear text field, Insert into text field and Push button base keywords. All these keywords take the identifier of the user interface element as their first argument. These identifiers were agreed upon in the discussion and can be seen in Figure 15. The ${name} and ${email} arguments are entered into the corresponding text fields with the Insert into text field keyword. In the Registration should be shown in the application and should exist in the database user keyword, the List value should exist base keyword is used to check that the name and email are in the list shown in the application. The Get file base keyword is used to read the data from the database into the ${data} variable, and the Contains base keyword is used to check that the database contains the name and email pair.
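To give an idea of what base keywords look like on the code level, the following is a hedged sketch of how file-related base keywords such as Remove file, Get file and Contains could be implemented as a Python keyword library. The actual OperatingSystem library and SwingLibrary used with Robot are not reproduced in this thesis, so the class and the usage lines below are illustrative only.

import os

class OperatingSystemKeywords:
    """Illustrative base keyword implementations, not Robot's actual library."""

    def remove_file(self, path):
        # Base keyword 'Remove file': delete the file if it exists.
        if os.path.exists(path):
            os.remove(path)

    def get_file(self, path):
        # Base keyword 'Get file': return the content of the file as text.
        with open(path, encoding="utf-8") as handle:
            return handle.read()

    def contains(self, container, item):
        # Base keyword 'Contains': fail unless `item` is found in `container`.
        if item not in container:
            raise AssertionError(f"'{item}' not found")


# Used roughly the way the user keywords in Figure 19 use the base keywords.
keywords = OperatingSystemKeywords()
keywords.remove_file("database.txt")                      # Clear Database
with open("database.txt", "w", encoding="utf-8") as db:   # the application would normally write this
    db.write("John Doe john.doe@example.com\n")
data = keywords.get_file("database.txt")                  # read the database file
keywords.contains(data, "John Doe")                       # check that the registration exists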

EXECUTING THE TESTS

The team has made an agreement that all test cases that should pass are tagged with a regression tag. When the first version of the application is available, the created test cases can be executed. At this stage none of the test cases is tagged with the regression tag. The result of this first test execution can be seen in Figure 20: four of the eleven acceptance test cases passed. The passing test cases can now be tagged with the regression tag. Figure 21 shows one of the passing test cases tagged with the tag regression. When the test cases are executed the next time, there will be four critical test cases. If any of those test cases fails, the test execution result will be a failure and the report will turn red.

Figure 20: First test execution

Figure 21: Acceptance test case tagged with tag regression

When the application is updated the next time, the test cases are executed again, and all passing test cases can be tagged with the regression tag. At some point all the test cases will pass, the features are ready, and the following items can be taken under development. New acceptance test cases are defined, and the development can start. If the old functionality is changed, the test cases have to be updated and the regression tags have to be removed.

8 ELABORATED GOALS OF THE THESIS
In this chapter the aim of this thesis is described on a more detailed level. First the scope is defined.
Then the actual research questions are presented.

8.1 Scope
As was seen in the previous chapters, the field of software testing is very wide. In this thesis the focus is on acceptance test-driven development. It is important to distinguish between the traditional acceptance test level and the agile acceptance test level; in the context of this thesis the term acceptance testing refers to the latter. Other testing areas are excluded from the scope of this master’s thesis: non-functional testing, static testing, unit testing, and integration testing. Manual acceptance testing as such is also out of the scope, although it may be mentioned in some cases.

The different aspects and generations of test automation were explained in Chapter 5. This thesis concentrates on the large-scale keyword-driven test automation framework called Robot. The following aspects of test automation are included in the scope of this thesis: creating the automated acceptance test cases, executing the automated acceptance test cases, and reporting the test execution results.

8.2 Research Questions


The main aim of this thesis is to study how the keyword-driven test automation technique can be used in acceptance test-driven development. The study is done in a real-life software development project, and therefore another aim is to give an example of how a keyword-driven test automation framework was used in this specific case and to describe all the noticed benefits and drawbacks. The research questions can be stated as:

1. Can the keyword-driven test automation framework be used in the acceptance test-
driven development?

2. How is the keyword-driven test automation framework used in the acceptance test-
driven development in the project under study?

3. Does the acceptance test-driven development with the keyword-driven test automa-
tion framework provide any benefits? What are the challenges and drawbacks?

The first question can be divided into the following more detailed questions:

1. Is it possible to write the acceptance tests before the implementation with the key-
word-driven test automation framework?

2. Is it possible to write the acceptance tests in a format that can be understood with-
out technical competence with the keyword-driven test automation framework?

The second question can be divided into the following parts:

1. How, when and by whom are the acceptance test cases planned?

2. How, when and by whom are the acceptance test cases implemented?

3. How, when and by whom are the acceptance test cases executed?

4. How and by whom are the acceptance test results reported?

The third research question can be evaluated against the promises and challenges of acceptance test-
driven development shown in Table 1 and Table 2 in Chapter 4.3.

9 RESEARCH SUBJECT AND METHOD
The purpose of this chapter is to explain where and how this research was done. First, the case project and the product developed in it are described at the level needed to understand the context in which the research took place. Then the research method and the used data collection methods are described.

9.1 Case Project


This research was conducted in a software project at Nokia Siemens Networks, referred to as the Project from now on. The Project was located in Espoo. The Project consisted of two scrum teams, each consisting of approximately ten persons. In addition to the teams, the Project had a product owner, a project manager, a software architect and half a dozen specialists working as feature owners. Feature owner meant the same as feature champion (see Chapter 3.3). There were also several supporting functions, like a test laboratory team. Several nationalities were represented in the Project.

The software product developed in the Project was a network optimization tool, referred to as the Product from now on. The Product and its predecessors had been developed for almost five years. The Product is bespoke software aimed at mobile network operators. The Project started in June 2006, and the planned end was December 2007. The Product was a desktop application used through a graphical user interface developed with Java Swing technology.

9.2 Research Method


The Project under study was decided on before the actual research method was chosen. When the role of the researcher became clear, there were two qualitative approaches to select from: case study and action research. It was clear from the beginning that the researcher would be highly involved with the Project under research, and this high involvement prevented choosing a case study as the research method. Action research was more suitable for this research. Unlike other research methods, where the researcher seeks to study organizational phenomena but not to change them, the action researcher is concerned with creating organizational changes and simultaneously studying the process (Babüroglu & Ravn 1992). This describes the situation in this research quite well: the researcher was participating, giving training and helping to define the actions that would change the existing process.

When the research method was chosen, it was also kept in mind that one purpose of the research was to try out acceptance test-driven development in practice. There was a demand for a method that would enable a practical approach to the problem. Avison et al. (1999) define that action research combines theory and practice (and researchers and practitioners) through change and reflection in an immediate problematic situation within a mutually acceptable ethical framework. This was another reason why action research was chosen as the method for this research.

According to Avison et al. (1999), action research is an iterative process involving researchers and practitioners acting together on a particular cycle of activities, including problem diagnosis, action intervention, and reflective learning. The iterative process of action research suited the iterative process of Scrum well. The research iteration length was chosen to be the same as the length of the Scrum iterations. Figure 22 shows how these two processes were synchronized. With this arrangement the research cycle was quite short, but it also helped to concentrate on small steps in changing the process and to prioritize the most important steps.

Figure 22: Action research activities and the Scrum process

A management decision to increase the amount of automated testing was made before the research project started, and this decision was also a trigger for starting this research. Stringer (1996) mentions that programs and projects begun on the basis of the decisions and definitions of authority figures have a high probability of failure. This was taken into account at the beginning of the research and led to a starting phase different from the one defined by Stringer (1996), in which the problems are defined first and the scope and actions are based on that problem definition. Because the goal was already defined, the research started by collecting data about the environment and implementing the new acceptance test-driven development process. Otherwise, the action research method defined in Stringer (1996) was used.

9.3 Data Collection


There were two purposes for the data collection. The first purpose was to collect data about problems and benefits that individual project members encountered and noticed during the Project. The other purpose was to record the agreed implementation of acceptance test-driven development and to observe how this agreement was actually implemented. The latter was even more important, as Avison et al. (1999) mention that, in action research, the emphasis is more on what practitioners do than on what they say they do.

The data was collected through observations, informal conversations, semi-formal interviews, and by collecting meaningful emails and documents. The data was collected during a four-month period from the beginning of January 2007 to the end of April 2007. The researcher worked in the Project as a test automation engineer, and the observations and the informal conversations were conducted while working in the Project. One continuous method of collecting relevant issues was recording the issues raised in the daily scrum meetings.

The initial information collection was based mainly on informal discussions, but a few informal interviews were also used. The main purpose of the initial information collection was to build an overall understanding of the Project and a deep understanding of the testing in the Project. This was done by asking questions about the used software processes, software development and testing practices, and problems encountered in these areas. Some interviews also contained questions about the Project’s history.

The final interviews were semi-formal interviews, meaning that the main questions were pre-defined but questions derived from the discussion were also asked. Nine persons were interviewed: two developers, two test engineers, two feature owner/usability specialists, one feature owner, one scrum master and one specification engineer. All these persons had participated more or less in developing the features with ATDD. The final interviews at the end of the research focused more on the influences of acceptance test-driven development on different software development aspects. Appendix B contains the questions asked in the final interviews. The interview questions were asked in the order presented in the appendix, and the objective was to lead the respondents’ answers as little as possible. Clarifying questions were asked to get the reasoning behind the answers. The interviews were both noted down and tape-recorded.

10 ACCEPTANCE TEST-DRIVEN DEVELOPMENT WITH
KEYWORD-DRIVEN TEST AUTOMATION FRAMEWORK IN
THE PROJECT UNDER STUDY
This chapter describes what was done in the Project in which acceptance test-driven development was tried out. The emphasis is on issues that are relevant from the acceptance test-driven development point of view. First, the development model and practices used in the Project are described; the case project itself was described in Chapter 9.1. Then it is illustrated how the keyword-driven test automation framework was used in the Project, with the emphasis on the four areas mentioned in the second research question in Chapter 8.2. At the end of this chapter the results of the final interviews are presented.

10.1 Development Model and Development Practices Used in the Project

The development process used in the Project was Scrum. Scrum was introduced and taken into use at the beginning of the Project, which meant that the adjustment to the process was still ongoing at the time of the research. There were also some differences compared to Scrum as presented in Chapter 3.3. The biggest difference was the format of the product backlog. The main requirement types in the Project were the requirements defined in the requirement specifications and the workflows. A workflow contained all the steps that a user could perform with the functionality; it was a high-level use case containing multiple related steps. These steps were divided into mandatory and optional steps, and every step in the workflow could be seen as a substitute for an item in the product backlog.

As was mentioned in Chapter 3.3, Scrum does not define development practices other than the daily build. In the Project, continuous integration was used. There were no rules defining which development practices should be used during the Project. Extreme programming practices like refactoring were used from time to time by the development team. The developers created unit tests, and there were targets for the unit testing coverage. However, the unit tests were not created using test-driven development. The main details of the features were written down in feature descriptions, which were short verbal descriptions of the features. During the research project, the division of testing into automated acceptance tests, automated unit tests and manual exploratory testing was taken into use (see Chapter 4.2).

The test automation with Robot was started in September 2006. At the beginning of the Project the automated test cases were created for the already existing functionality, and this automation task was done by a separate test automation team. At the time the research was started, the automated test cases covered most of the basic functionality. This meant that the library for accessing the graphical user interfaces of the Product had already been developed for some time and included base keywords for most of the Java Swing components. At this stage there was a desire to create the automated test cases for new features during the same sprint in which the features were implemented. To make this possible, acceptance test-driven development was taken into use.

10.2 January Sprint


In the first research sprint the goal was to start acceptance test-driven development with a few new features. At first it was problematic to find features to be developed with acceptance test-driven development. Part of the implementation was a continuation of the work of the previous sprints, and these features were seen as problematic starting points. Some of the new features needed internal models and, while being developed, could not be tested through the user interface. Finally, one new feature, the map layer handling, was chosen as the starting point. The map layer handling is used to load backgrounds into the map view of the Product, on which network elements and information about the network are shown.

As mentioned, there was a separate team for the test automation when the research started. To enable better cooperation within the scrum teams, the test automation team members started working as members of the scrum teams. This was done at the beginning of the sprint.

PLANNING

The test planning meeting for the map layer handling feature was arranged by a test engineer. It took place in the middle of the sprint, before the developer started implementing the feature. The participants of the meeting were a usability expert/feature owner, a developer, a test engineer and a test automation engineer.

The meeting started with a general discussion about the feature to be implemented, and the developer drew a sketch of the user interface he had in mind. After the initial sketch, the group started to think about the test cases: how the user could use the feature and what kinds of error situations should be handled. The sketch was updated based on the needs that were noticed. The test engineers wrote down test cases whenever they were agreed upon. During the discussions some important decisions were made about the supported file formats and graphic types. At the end of the meeting, the agreed test cases were gone through to make sure that all of them had been written down. At this phase the test cases were not written in any formal format.

IMPLEMENTATION

The test case implementation started by writing the test cases agreed in the planning meeting in the tabular format. At the same time, the developer started the development. Figure 23 contains some of the initial test cases. The highest level of abstraction was not used in these test cases, and therefore they consisted of lower level user keywords with short names and variables. These test cases resembled the test cases the test automation team had implemented earlier more than the example test cases shown in Chapter 7.2 and Figure 13.

Figure 23: Some of the initial acceptance test cases for map layer handling

The implementation of the user keywords started after the test cases were written. There was a need to implement multiple base keywords even though the library had been developed for some months. Fortunately, the test automation engineer had time to create the needed keywords. At this stage the identifiers needed to select the correct widgets from the user interface were replaced with variables. The variable values were set once the developer had written the identifiers into the code and emailed them to the test engineer.
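As an illustration of this style only, a lower level test case might have looked roughly like the following sketch. The keyword names, variable names and widget identifiers are invented for this example, and the notation follows the plain-text format of the later open-source Robot Framework rather than the exact tabular format used in the Project; the domain keywords are assumed to be provided by the Project's Swing test library and user keyword files.

*** Variables ***
# Widget identifiers received from the developer and stored as variables
${ADD LAYER BUTTON}      addMapLayerButton
${LAYER FILE FIELD}      mapLayerFileField

*** Test Cases ***
Add Raster Map Layer
    Open Map Layer Dialog
    Insert Into Text Field    ${LAYER FILE FIELD}    background.tiff
    Push Button    ${ADD LAYER BUTTON}
    Layer Should Be Listed    background.tiff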

From the beginning it was clear that verifying that the map layers are drawn and shown correctly on the map would be hard to automate. It was not seen as sensible to create a library for verifying the correctness of the map, so a substitute solution was created: screenshots were taken and combined with instructions defining what should be checked from the picture. This led to manual verification, but doing that from time to time was not seen as a problem.
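A substitute keyword of this kind could be sketched as follows. This is only a hypothetical illustration: Take Screenshot is assumed to come from a screenshot library similar to the one in the later open-source Robot Framework, Log is a built-in keyword, and the map-related keyword names are invented.

*** Keywords ***
Map Should Look Correct
    [Arguments]    ${instructions}
    # A screenshot is stored in the test log together with written
    # instructions telling the reviewer what to verify manually.
    Take Screenshot
    Log    MANUAL CHECK: ${instructions}

*** Test Cases ***
Raster Layer Is Drawn On The Map
    Load Map Layer    background.tiff
    Map Should Look Correct    The raster layer should cover the whole map view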

One of the tested features was changing the colors of the map layers. A base keyword for changing the color was created, but when it was tried out, it did not work. After the developer and the test automation engineer had investigated the problem, the base keyword implementation was found to be incorrect. However, the changes made to the base keyword did not correct the problem, and one more problem was noticed in the application. These were technical test automation problems. The investigations took some time, and the color changing functionality could not be tested by automation in this sprint. Some parts of the feature were also not fully implemented, and they were moved to the next sprint.

TEST EXECUTION

The test cases were executed on the test engineer's and the test automation engineer's workstations during the test case implementation phase. There were problems in getting a working build during the sprint, which slowed down verifying that the test cases, and especially the implemented base keywords, were working. During this phase the problems in the test cases were corrected and defects were reported to the developer.

REPORTING

The Project had one dedicated workstation for executing the automated acceptance tests after the continuous integration system had successfully built the application. The web page showing a report of the latest acceptance test run was visible on a monitor situated in the project area. The test cases created during the sprint were added to an automatic regression test set at the end of the sprint. Tests that were passing at the end of the sprint were marked as regression tests with Robot's tagging functionality (see Chapter 6.2). The ability to define the critical test cases based on tags made it possible to execute all the tests even when some test cases and features were not working.
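The tagging mechanism can be sketched as follows. The [Tags] setting, and the possibility to treat only certain tags as critical (for example with a --critical command line option), exist in the later open-source Robot Framework; the exact syntax of the version used in the Project may have differed, and the test case and keyword names below are invented.

*** Test Cases ***
Add Raster Map Layer
    [Tags]    regression
    Open Map Layer Dialog
    Add Layer From File    background.tiff

Change Map Layer Color
    # Not yet passing at the end of the sprint, so not tagged as a regression test
    Open Map Layer Dialog
    Change Layer Color    background.tiff    red

When only the tests tagged as regression are treated as critical, failures in still unfinished features do not mark the whole test run as failed, which matches the behaviour described above.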

10.3 February Sprint


In the second sprint the goal was to finalize the test cases for the map layer functionality and to start ATDD with further functionality. The functionality selected was the visualization of the Abis configuration. The purpose of the feature was to collect data from multiple network elements and to show the Abis configuration based on the collected data.

PLANNING

Immediately after the sprint planning, the people involved in developing the visualization of the Abis configuration feature held a meeting about the details of the feature. Present were a feature owner, a specification person, two developers, a test engineer, a test automation engineer, a usability expert and a scrum master. The usability specialist had developed a prototype showing how the functionality should look. Using this prototype as a starting point, the team discussed different aspects of the feature and asked clarifying questions. The test automation engineer made notes during the meeting.

IMPLEMENTATION

Based on the issues agreed in the meeting, the initial test cases were created and sent by email to all the participants. The test cases were created on a high level to make them more understandable; they can be seen in Figure 24.

Figure 24: Initial acceptance test cases for the Abis configuration

After the test cases were described, the needed keywords were implemented. Figure 25 contains the implementation of the sentence format keywords. The variables used in the keywords were defined in the same file as the keywords. As can be seen, the user keyword User opens and closes Abis dialog from navigator was used by multiple keywords, and its implementation can be seen in Figure 26. It was in turn implemented using other user keywords and base keywords.

Figure 25: The highest level user keywords used to map the sentences to user keywords and variables

Figure 26: Lower level user keywords “User opens and closes Abis dialog from navigator” implementation
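The mapping described above can be illustrated with the following hypothetical sketch. It does not reproduce the content of Figures 25 and 26: the lower level keyword names and the argument value are invented, and only the user keyword name User opens and closes Abis dialog from navigator is taken from the Project.

*** Test Cases ***
User Can Open The Abis Configuration Of A BCF
    User selects BCF 123 from the navigator
    User opens and closes Abis dialog from navigator

*** Keywords ***
User selects BCF 123 from the navigator
    # The sentence format keyword only maps the sentence to a user keyword and its argument
    Select Network Element From Navigator    BCF-123

User opens and closes Abis dialog from navigator
    # Implemented with other user keywords and base keywords
    Open Abis Dialog From Navigator
    Close Abis Dialog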

Again more base keywords were needed. This time, however, the base keywords were not implemented in the SwingLibrary; instead, a helper library was needed to handle the data that was checked from the configuration table. The configuration table contained 128 cells, and the content of every cell needed to be verified. The tabular test data format made it possible to describe the expected output almost in the same format as it was seen in the application. However, the expected outcome could not be defined beforehand. The input for the feature was configuration data from a mobile network, and in this context it was hard to create all the needed network data in such a way that the expected outcome would be known and the data would be correct. Therefore existing test data was used in the test cases, and the configuration view to be tested automatically was selected from the available alternatives in the existing test data.

Soon after the midpoint of the sprint there was a meeting where the status of the visualization of the Abis configuration feature was checked. The feature was used by a few specialists while the scrum team responded to the raised questions and wrote down observations. Based on this meeting and some other informal discussions, some more details were agreed to be implemented in the sprint. Figure 27 contains some of the test cases that were added or updated after the meeting. The changes are marked with bold text to highlight them.

Figure 27: Some of the added and updated test cases

As was mentioned earlier, there was no easy way to test the map component automatically. However, one of the acceptance test cases was supposed to test that the Abis view can be opened from the map, and it was not seen as possible to automate this test case with a reasonable effort. The test case was still written down and tagged with the tag manual, which made it possible to see all the acceptance test cases that had to be executed manually. Another challenge was keeping the test cases in synchronization with the implementation, because the details changed a few times.

The test case User Can See The Relation Between TRX And DAP visible in Figure 27 was one of the
test cases added in the middle of the sprint. The implementation of the test case could not be finished
during the sprint. The exact implementation of the feature was changed a few times, and the test case
was not implemented before the implementation details were final. The feature was ready just before
the sprint ended, and there was no time to finalize the test case. This was due to the final details were
decided so late, and different people were implementing the feature and the test case.

The problems in the map layer handling feature and base keywords were discussed during the sprint. Some changes to the map layer handling functionality were agreed upon; these were mainly functional changes to solve the problems with the feature itself. The acceptance test cases needed updates due to these changes. While the test cases were being updated, they were also changed to include only sentence format keywords. Some of the new test cases can be seen in Figure 28. The change was quite easy because most of the keywords already existed and the mapping from sentence format keywords to user keywords and variables was very straightforward.

Figure 28: Some of the updated acceptance test cases for the map layers handling functionality

The base keyword implementation problem from the previous sprint was solved soon after the test cases were updated. One more base keyword also had to be implemented, and again there were some small technical problems, this time as well with a custom component. However, this time the problem was solved quite quickly. Challenges in the implementation, together with implementing functionality of a higher priority, took so much time that the map layer handling functionality was not ready at the end of the sprint, and a few nasty defects remained open.

TEST EXECUTION

The test cases were executed by the test automation engineer while developing them, similarly to the previous sprint. During the sprint there were still problems with the builds, which made it harder to check whether the test cases were working and whether some of the features were ready. During this sprint the Abis configuration test cases found a defect in a feature that had previously worked.

REPORTING

The reporting was done in a similar way as in the previous sprint.

10.4 March Sprint


During the previous two sprints it was seen that the test automation team carried too much of the responsibility for the test automation. It was decided that the knowledge should be spread more widely across the whole team, which meant arranging training during the sprint. The purpose was to continue the ATDD research with other new functionality. However, some of the team had to participate in a maintenance project during the sprint, and the sprint content was heavily reduced.

PLANNING

The team had agreed that the details of new functionality should be settled on a more detailed level in the sprint planning. Therefore the team and the feature owner discussed in detail what should be implemented in the sprint. All the details could not be covered in the first planning meeting, and thus a second meeting was arranged, in which the feature owner, two developers, a usability expert/feature owner, a test engineer and a test automation engineer participated. The functionalities were gone through and their details discussed. Agreements about the implementation details were made and noted down.

IMPLEMENTATION

The test automation engineer was responsible for arranging the training, and therefore the test cases were not implemented at the beginning of the sprint. After the training, a developer and the test automation engineer implemented the test cases that had not been finished during the previous sprint. At this point the contents of the current sprint were reduced: all the functionality planned in the second planning meeting was moved to the following sprint. The initial test cases were nevertheless created before the sprint ended, and some of them can be seen in Figure 29.

Figure 29: Initial test cases for Abis rule

TEST EXECUTION

The test cases that were implemented by the developer and test automation engineer were added to the
automated test execution system immediately after they were ready. All test cases created during the
previous sprints were already there.

10.5 April Sprint


The goal in the April sprint was to continue ATDD with the Abis analysis functionality. There were some big changes at the beginning of the sprint. The Abis analysis workflow was required to be ready by the end of the sprint, which led to combining the two teams into one big sprint team. The team that had not worked with the Abis analysis earlier needed an introduction to the functionality. The size of the combined team made it impossible to go into enough detail for the acceptance tests to be updated during the sprint planning.

PLANNING

As was mentioned in the previous chapter, the initial acceptance test cases had been created during the earlier sprint. After the sprint planning, the feature owner, the specification engineer and the test automation engineer went through the initial test cases and updated them. Some of the details still remained open, as the feature owner found them out only later in the sprint. After the test cases were updated, they were sent to the whole team.

IMPLEMENTATION

The implementation started immediately after the acceptance test cases were updated, with the test automation engineer writing the test cases. After some of the sentence format keywords had been implemented, one step needed clarification. The test automation engineer invited two usability specialist/feature owners and a specification/test engineer to a meeting where the different options for solving the usability problem were discussed. After all the options had been evaluated, the test automation engineer discussed possible solutions with the developer and the software architect. It was agreed that the changes would be implemented, and three developers, the usability expert/feature owner and the test automation engineer planned and agreed on the details of the feature. Based on the agreed details, the test automation engineer created the acceptance tests for the new feature. Technically the test cases were created in a similar manner as in the previous sprints.

The acceptance test cases were dependent on each other because every test case was a step in the Abis analysis workflow. This caused some problems, as the first step, getting the needed data into the application, was ready only on the last day of the sprint. Part of the test cases could not be finalized before this data was available, and it was seen as too laborious to calculate all the needed inputs beforehand. In addition, one part of the feature could not be finished during the sprint. Therefore a few test cases were not ready when the sprint ended.

At the end of the sprint, the test engineer and the test automation engineer created some more detailed test cases to test the Abis rule. These test cases tested different variations and checked that the rule result was correct. However, the rule was not working as intended: the developer, the feature owner and the test automation engineer had understood the details differently. This led to a more detailed discussion between these parties, and it was also noticed that some of the special cases were not handled correctly. Based on the discussion, the developer and the test automation engineer wrote down all the different situations and mailed them to the feature owner. It was agreed that this kind of details would need acceptance test cases in the coming sprints.

TEST EXECUTION

Some of the test cases were verified in the developers' development environments. One test case was failing, and it was noticed that the feature implementation had to be improved to fulfill the requirements. The developers continued the implementation, and when they thought it was ready, the acceptance test cases were executed again and passed; the feature was then considered ready. Some other test cases were executed on the test automation engineer's workstation. Some problems and misunderstandings were found, and they were reported to the developers.

REPORTING

The test cases were added to the acceptance test execution environment after they had been updated at the beginning of the sprint. The idea was to make the development status visible to all via the acceptance test report. However, all the test cases were failing for most of the sprint, and only a few days before the sprint ended did some of them pass. Even at the end of the sprint not all of them were passing.

It was also planned to create a running tested features (RTF) diagram from the acceptance test results. However, this idea was discarded because it was seen that it would not give a correct picture of the project's status. Some of the test cases were not acceptance test cases in the sense that they were defined by the test engineers, not by the feature owners. This limitation could be avoided by using an acceptance tag and including only test cases with this tag in the RTF diagram. An even more important reason for dropping the idea was the fact that not all of the project's development was done in the ATDD manner.

10.6 Interviews
This chapter collects the experiences of the project members involved in the team that developed features with acceptance test-driven development. The interview methods are described in more detail in Chapter 9.3. Altogether nine persons were interviewed, and in this chapter the results are briefly described. The results of the interviews are analyzed in more detail in Chapter 11.

CHANGES IN THE SOFTWARE DEVELOPMENT

The interviewees thought that the biggest change due to the use of ATDD had been the increased understanding of the details and the workflow in the whole team. One developer thought that ATDD had forced the team to communicate and cooperate. Another developer mentioned that, due to ATDD, feedback about the features was obtained faster. The test engineers felt that they were able to influence the developed software more than before.

BENEFITS

The biggest benefit mentioned in the interviews was a better common understanding of the details due to the increased communication, cooperation and detailed planning. Four interviewees felt that the requirements and feature descriptions were more accurate than before. One feature owner had noticed missing details in the requirements while defining the acceptance tests. The developers thought that they knew better what was expected of them, and three other interviewees agreed. Four interviewees felt that the increased understanding of the details had led to doing the right things the first time. Two interviewees thought the acceptance test cases had increased the overall understanding of the workflow. One respondent had noticed improvements in teamwork.

The test engineers thought their early involvement was beneficial because they were able to influence the developed software, ask hard questions and create better test cases due to the increased understanding. One test engineer thought that being in the same cycle with the development is very efficient because people then remember what they have done, and therefore problems can be solved with a smaller effort. One feature owner was of the opinion that the test engineers and developers understand better what to test and how to test it. She also mentioned that the testing now covers a full use case. Three interviewees mentioned that feedback was obtained much faster than earlier; the early involvement of the test engineers and test automation helped to shorten the feedback loop. One developer felt that the automated user interface testing had improved. One interviewee thought the automated acceptance tests keep the quality at a certain level but do not increase it. Another interviewee was of the opinion that test automation helps to reduce the manual regression testing, so that test engineers can concentrate more on complex scenarios and use their domain knowledge more.

DRAWBACKS

There were not many drawbacks according to the interviewees. Two interviewees thought that the initial investment in test automation is the biggest disadvantage, and they wondered whether the costs will be covered in the long run. Two interviewees were of the opinion that the extra work needed to rewrite the test cases after possible changes is a problem. One feature owner thought that the time needed to write the initial test cases is also a kind of drawback. Two interviewees speculated that some developers may not like others coming into their territory. Four interviewees could not find any weaknesses of the same magnitude as the benefits.

CHALLENGES

Test data was seen as the biggest challenge, and five respondents mentioned it. Flexible creation of test data and its use in acceptance test cases were considered challenging. Reliable automated testing of algorithms was also seen as problematic. One developer mentioned that testing the map component and other visual issues with automated test cases would be troublesome. Three interviewees thought that there may be challenges with resistance to change. The test engineers found it difficult to find the right working methods. The increased cooperation increases the need to ask the right questions, and that can also be challenging.

INFLUENCE ON THE RISK OF BUILDING INCORRECT SOFTWARE

There were varying views on how ATDD influences the risk of building incorrect software. Some interviewees saw two risks. The first risk was building software that does not fulfill the end customer's expectations. The second risk was building software that does not fulfill the requirements or the feature owner's expectations. Two persons felt that ATDD does not affect the risk of building incorrect software from the end user's point of view. On the other hand, one test engineer thought that the early involvement of testing may even decrease this risk. Seven interviewees felt that the second risk, of not creating the software that has been specified and wanted by the internal customer, had decreased compared to earlier. Increased communication, discussion about the details and an increased common understanding before the implementation were seen as the main reasons. One interviewee thought that if the test cases are incorrect and are followed too narrowly, the risk may increase. Another response was that if the application is developed too much from the test automation's point of view, the actual application development could suffer.

VISIBILITY OF THE DEVELOPMENT STATUS

The visibility of the development status was not seen to have changed much with the use of ATDD. One individual view was that the automated tests will increase it in the future. Another comment was that breaking the tests into smaller parts and arranging a sprint-specific information radiator could help. The developers thought that merging the acceptance test reports into the build reports would improve the situation.

QUALITY AGREEMENT BETWEEN THE DEVELOPMENT AND FEATURE OWNERS

Seven interviewees saw the acceptance test cases as an agreement between the development team and the feature owners, because the test cases were created in cooperation. However, four of them saw the agreement as a functional agreement rather than a quality agreement; quality was seen as a bigger entity than correct functionality. Two interviewees felt that such an agreement had not yet formed.

CONFIDENCE IN THE APPLICATION

In general, confidence in the application had increased. One developer felt that ATDD had enhanced his confidence in the software because he knew that he was developing the right features. Three other persons also felt that confidence had grown because there was a common understanding of what should be done. Three other interviewees were of the opinion that test automation had built confidence, mainly because passing automated test cases indicated that the application was working at a certain level. One interviewee felt that the automated test cases increase confidence because she could trust that something was still working after it had been shown to be working in the demo. One test engineer felt that the possibility to affect the implementation details had enhanced his confidence in the software.

WHEN PROBLEMS ARE FOUND

Five interviewees thought that problems can be found earlier with ATDD than without it, and three of them had already experienced this. However, four of them were of the opinion that manual testing and the test engineers' early involvement were the key factors. Two of them also mentioned that cooperation in the early phase can prevent problems from occurring. Four interviewees had not experienced changes, although one of them hoped that problems could be found faster in the future.

REQUIREMENTS UP-TO-DATENESS

According to the interviewees, the requirements were more up-to-date than before. Seven of the interviewees had seen improvement in the way the requirement specifications and feature descriptions were updated. One feature owner and specification engineer mentioned that some missing requirements were noticed while creating the test cases. Increased communication between the different roles was also seen to have helped in updating the specifications. One developer and test engineer thought that if some of the agreed functionality has to be changed during the development, the specification may not get updated. Two interviewees had not seen any change compared to earlier.

CORRESPONDENCE BETWEEN TEST CASES AND REQUIREMENTS

Seven of the interviewees saw that the test cases and requirements are more in sync than before. Rea-
sons mentioned were cooperation in the test case creation, increased communication, better under-
standing of the feature, and agreement about the details. Two persons thought that the test cases corre-
spond better to the requirements at the beginning when the details are agreed. On the other hand, they
thought that changes during the implementation phase may lead to differences between the test cases
and requirements. One feature owner/usability expert saw that ATDD does not assure that the test
cases and requirements are in sync. He also thought that the test cases cannot replace other specifica-
tions. In his opinion, there is not even a need for that.

DEVELOPERS’ GOAL

Both developers thought that ATDD had made it easier to focus on the essential issues. One of them thought the acceptance test cases had also increased his understanding of where his code fits into the bigger context. Five persons other than the developers thought that the developers' focus is more on the right features. One interviewee hoped the developers' goal had shifted towards a feature being implemented, tested and documented, not only implemented.

DESIGN OF THE SYSTEM

One developer thought that ATDD had helped in finding the design faster than before. The other developer had not noticed any changes in the design.

REFACTORING CORRECTNESS

The developers found that ATDD had not yet affected the evaluation of refactoring correctness. However, they thought that automated acceptance tests could be used for that later on.

QUALITY OF THE TEST CASES

Most of the interviewees were of the opinion that the quality of the test cases had increased. The following justifications were presented: test cases are created in cooperation, test cases correspond better to the requirements, test cases cover the whole workflow, and test cases are more detailed and are executed more often. Some interviewees could not tell whether there had been any changes. One developer thought that the acceptance tests done through the graphical user interface had been a huge improvement to the user interface testing; he explained that it had been very troublesome to unit test the user interfaces extensively.

TEST ENGINEERS’ ROLE

In general, it was seen that the test engineers' role had broadened due to the use of ATDD. Most of the interviewees mentioned that being part of the detailed planning had been the biggest change. Other changes mentioned were an increased need to communicate and an increased role in information sharing. The test engineers thought the change had been huge: the ability to influence the details makes the work more rewarding, and the improved knowledge about the expected details makes it possible to test what should be done instead of testing what has been done. One feature owner thought that ATDD had eased the test engineers' tasks because the test cases were defined together.

Four interviewees had noticed that the old confrontation between the developers and the test engineers was starting to decrease due to the increased cooperation. One developer had come to understand the difficulties in testing better, which in turn had changed his view of the test engineers. One developer said he was happy that the communication no longer happened only through defect reports.

FORMAT OF THE TEST CASES

All the interviewees thought the test cases were currently in a format that is very easy to understand. The sentence format was seen as very descriptive. However, one developer had noticed some inconsistency between the terminology in the test cases and in the requirements specification. A few persons thought that some domain knowledge is still needed to understand the test cases. One test engineer thought the format is much more understandable than that of test cases created with traditional test automation tools.

LEVEL OF THE ACCEPTANCE TESTS

The interviewees found it difficult to define on which level the acceptance test cases should be. One test engineer thought that discussion at the beginning of the sprint may help in writing proper acceptance test cases and in avoiding duplicating the same tests at the unit testing and acceptance testing levels. Two persons thought that more detailed test cases would need better test data. One of them also mentioned that it will not be possible to test all the combinations, and he doubted the profitability of detailed automated test cases due to increasing maintenance costs. One specification engineer thought that the acceptance test cases had probably been detailed enough, but more experience is needed to become convinced. The other interviewees did not have any views on this issue.

EASINESS OF TEST AUTOMATION

Most of the interviewees did not know whether ATDD had affected the ease of test automation. One test engineer thought that ATDD helps in planning which test cases to automate and which not.

IMPROVEMENT IDEAS

The interviewees did not have any common opinion on areas for improvement. One interviewee thought that building routine is the most important thing to concentrate on, because the method had been used only for a short time. One feature owner felt that in some areas there is a need for more detailed acceptance tests. She also mentioned that there could be a checkpoint during the sprint where the acceptance test cases are reviewed.

Both developers thought that reporting could be improved to shorten the feedback loop even more; adding the acceptance test reports to the build reports was seen as a solution. One of the developers thought that the written acceptance test cases could be communicated better, so that everyone really knows those test cases exist. One feature owner/usability specialist was of the opinion that splitting the acceptance test cases into smaller parts would help in following the progress within the sprint. He felt that smaller acceptance tests with sprint-specific reporting could be used to improve visibility to all project members.

One test engineer saw room for improvement in defining and communicating what is tested with manual exploratory tests, automated acceptance tests and automated unit tests. Two respondents thought that a more specific process description should be created to ease process adoption if ATDD were taken into wider use. It was also seen that the support of the whole organization is needed for the change.

11 ANALYSES OF OBSERVATIONS
In this chapter the observations made during the study, including the interviews, are analyzed against
the research questions presented in Chapter 8.

11.1 Suitability of the Keyword-Driven Test Automation Framework with Acceptance Test-Driven Development
The first research question was: Can the keyword-driven test automation framework be used in the ac-
ceptance test-driven development? This question was divided into two more specific questions and
those are analyzed first. After the specific questions have been covered, the analysis of the actual re-
search question is presented.

IS IT POSSIBLE TO WRITE THE ACCEPTANCE TESTS BEFORE THE IMPLEMENTATION WITH THE
KEYWORD-DRIVEN TEST AUTOMATION FRAMEWORK?

In the Project the test cases were written in two phases. The initial test cases were written based on the information gathered from the planning meetings. Writing the initial test cases took place after the planning, and they were usually ready before the developers started implementing the features. Therefore, it can be said that the initial test cases were written before the implementation started. However, it has to be taken into account that the initial test cases were on a high level and that the number of test cases was only between 10 and 25 per sprint. Had there been more test cases, or had the test cases been on a more detailed level, the result might have been different.

The second phase, implementing the keywords that were needed to map the initial test cases to the system under test, was conducted in parallel with the application development. With some test cases, it was not possible to implement all the keywords before the actual implementation details were decided. There were also difficulties in implementing test cases whose inputs and outputs depended on the features under development, as well as problems in implementing the base keywords. These issues prevented finalizing some of the test cases during the sprint. Therefore, only some of the acceptance test cases were fully ready before the corresponding feature. It was noticed that not all the test cases could be implemented before the development, or even before the features were ready. However, the test cases were mainly ready soon after the features.

The reasons behind the test case implementation problems had to be analyzed. The first problem was that the interface between the test cases and the application kept changing. Obviously, it was not possible to implement the test cases before the interface was defined. However, the test cases were not implemented even immediately after the interface was clear. This was because different persons were implementing the test cases and the features. Had the same person implemented both, the test cases could have been created on time. This problem also has something to do with the tool and approach used to automate the test cases. If the interface had been a programmatic interface, the developers would have been forced to create the code needed to map the test cases to the application, and the changes in the interface would have been just one person's responsibility. Therefore, it can be said that the selected interface made this problem possible. To avoid the problem, it is possible to move the test case implementation to the developer or to improve the communication between the person implementing the test cases and the person developing the features.

The second problem was defining the inputs and outputs beforehand. The interviewed project members mentioned that test data is the biggest challenge in the domain. In the Project, some expected results were calculated for verification purposes, but in some test cases more data was needed, and it was not seen as sensible to calculate all this data only for the sake of a few test cases. These problems can obviously make it hard or even impossible to implement the test cases before developing the features. On the other hand, these problems were not tool specific. It is even possible that in some other context such problems do not exist or are at least easier to solve. However, when such problems exist, it has to be decided case by case whether implementing the test cases in a test-first manner is worth the extra effort.

The problems in creating the base keywords were technical, and these kinds of problems occur every now and then. It was also noticed that it might be hard to implement the system specific base keywords without trying them out. There was no single reason for the problems, and as the knowledge about the library increased, the number of problems decreased. More importantly, all of the problems were eventually solved.

IS IT POSSIBLE TO WRITE THE ACCEPTANCE TESTS IN A FORMAT THAT CAN BE
UNDERSTOOD WITHOUT TECHNICAL COMPETENCE WITH THE KEYWORD-DRIVEN
TEST AUTOMATION FRAMEWORK?

The acceptance tests were easy for all the project members to understand. The main reason for this was that the acceptance test cases were written using plain-text sentences, in other words sentence format keywords. However, using the sentence format keywords caused an extra cost: one additional abstraction layer was needed in the test cases. Whenever inputs were defined in the test cases, they were given as arguments to the keyword implementing the sentence format keyword, which in some cases led to duplicate data. The sentence format keyword was first converted to a user keyword and an argument or arguments, and then the user keyword was mapped to other keywords. Implementing the sentence format keywords usually took only seconds, so the cost was not significant. This was because the keyword-driven test automation framework supported a flexible way of defining user keywords in the test data. Without this functionality, it may be harder to use the sentence format keywords and the cost may be higher. Overall, the clarity gained with the sentence format keywords in the Project was worth the extra effort.

However, there are some doubts about the suitability of the sentence format keywords for lower level test cases, especially if the test cases are created in a data-driven manner and only the inputs and expected outputs vary. In such cases the overhead caused by the extra abstraction layer may become a burden, and it would probably be better to use descriptive keyword names and to add comments and column names to increase the readability of the test cases. This is something that needs further research, because the acceptance test cases created in the Project were mainly on a high level.
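As a hypothetical sketch of this alternative, a lower level, data-driven test could use a descriptive keyword name together with a comment row naming the columns; the keyword and the data values below are invented for illustration.

*** Test Cases ***
# Test case name                          input data             expected rule result
Abis Rule Accepts Complete Configuration
    Check Abis Rule Result    complete_config       OK

Abis Rule Reports Missing TRX
    Check Abis Rule Result    config_without_trx    ERROR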

CAN THE KEYWORD-DRIVEN TEST AUTOMATION FRAMEWORK BE USED IN THE
ACCEPTANCE TEST-DRIVEN DEVELOPMENT?

The answer to this question is ambiguous; it depends on how strictly acceptance test-driven development is defined. It is clear that the acceptance test cases were not implemented before the development. In the Project it would have been very unprofitable, and probably also impossible, to implement all the test cases in a test-first manner. The strict test-first approach with acceptance test cases may be hard in any environment, and Crispin (2005) has also noticed more problems than benefits with it. On the other hand, the initial test cases were mainly ready before the development, as was mentioned earlier. Therefore, the acceptance test cases were driving the development by giving a direction and a goal for the sprints. One developer's comment, "The acceptance test cases really drove the development!", supports this statement.

However, the test cases created with the keyword-driven test automation framework can be on a very high level due to the ability to create abstraction layers in the test cases. This may lead to a situation where a high level use case is converted to high level test cases, the details are therefore not agreed upon, and the benefits of ATDD are lost. In the Project, some of the test cases were created on such a high level that problems were noticed only when the test cases were implemented. At least one usability problem was noticed while implementing the test cases; it could have been noticed already in the planning phase with more detailed test cases. On the other hand, the usability problem was solved during the sprint, whereas without ATDD it would have been noticed and corrected much later. Some misunderstandings noticed at the end of the April sprint could also have been avoided with more detailed test cases.

It was also observed that some of the agreed acceptance test cases were not driving the developers' work as well as they could have. With some features the test automation engineer found problems that could have been avoided if the developers had followed the test cases more strictly. These problems were not considerable, but some extra implementation was needed to fix them. These situations were possible because the test automation engineer implemented the test cases instead of the developers. There were two reasons why the test automation engineer was implementing the test cases. First, the keyword-driven test automation framework made it possible to implement the test cases with keywords, without programming. The other reason was the interface used to access the Product from the acceptance test cases: because there was a test library for accessing the graphical user interface of the Product, it was possible to write the test cases without the developers' continuous involvement. With tools like FIT (Framework for Integrated Test) there is usually a need to implement some feature specific code between the test cases and the application, and therefore developers are forced to work closely with the test cases. With the keyword-driven test automation framework, however, this involvement is not forced by the tool.

Overall, it seems that the keyword-driven test automation framework can be used in acceptance test-driven development if the strict test-first approach is not required. However, there are a few things to keep in mind when the keyword-driven test automation framework is used with ATDD. Creating only high level test cases should be avoided, because they will not drive the discussion to the details, which was mentioned as the biggest benefit of ATDD. If different persons are creating the test cases and implementing the application, communication between these two parties has to be ensured.

11.2 Use of the Keyword-Driven Test Automation Framework with
Acceptance Test-Driven Development
The second research question was: How is the keyword-driven test automation framework used in the
acceptance test-driven development in the project under study? This question was divided into accep-
tance test case planning, implementation, execution, and reporting. Chapter 10 already answers these questions, but in this chapter the sprints are summarized and analyzed.

HOW, WHEN AND BY WHOM WERE THE ACCEPTANCE TEST CASES PLANNED?

There was no formal procedure for defining the acceptance test cases; rather, the test cases were defined situationally. However, in all cases the implementation details were discussed in a group containing at least a developer, a feature owner, a usability specialist and a test engineer, and the discussion was noted down in various sketches and notepads. These discussions usually took place soon after the sprint planning and always before the implementation. After the meetings it was mainly the test automation engineer's task to convert the acceptance test cases into the tabular format used with Robot. In the April sprint the acceptance test cases were updated by a group including a feature owner, a specification engineer and a test automation engineer.

Quickly writing down the test cases and details in the planning meetings was noticed to be a good choice. The discussion was not hindered by someone writing the test cases; instead, all the participants really took part in the conversation. However, there was one drawback with this approach. In a few meetings, some of the details needed to implement the test cases were not discussed, because the issues were not handled systematically. Because these details were later clarified with individual persons, they were not fully understood by the whole team. It was noticed that emailing the test cases and having them in the version control system was not enough. Therefore, it would have been beneficial to have some kind of meeting after the test cases were written to check and clarify all the details for all the team members. This was also mentioned by two team members in the final interviews. A similar problem was noticed in the April sprint, when the details were updated without the developers.

HOW, WHEN AND BY WHOM WERE THE ACCEPTANCE TEST CASES IMPLEMENTED?

From the February sprint onwards the acceptance test cases were implemented using the sentence format keywords, in a similar manner as in the example in Chapter 7. The test case implementation took place in parallel with the feature implementation. The test cases were implemented mainly by the test automation engineer, but a test engineer and a developer also implemented some of the test cases.

In addition to the challenges presented earlier in this chapter, there were challenges in keeping the test cases up to date in the February sprint. This problem could have been avoided if the details had been agreed on a more detailed level in the planning meeting. On the other hand, some of the changes were made based on the feedback gained from the meeting arranged with the specialists, and these changes would have been very hard to foresee. However, updating the test cases was quite easy because the test cases were created with keywords.

The biggest challenge compared to the simple example presented in Chapter 7 was the increase in test execution time. Starting the application and importing the network data took a considerably long time, and executing those actions in every test case was not desirable, as the total test execution time would have been multiplied by the number of test cases. It was important to keep the execution time short, as it affected both the duration of the test case implementation and the feedback time in the acceptance test execution system.
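One way to keep the execution time manageable is to perform the expensive steps only once per test suite. The following sketch uses the Suite Setup and Suite Teardown settings of the later open-source Robot Framework; the keyword names and the data file path are invented, and the version of Robot used in the Project may have offered a similar but not identical mechanism.

*** Settings ***
# Start the application and import the network data once for the whole suite
Suite Setup       Start Application And Import Network Data
Suite Teardown    Close Application

*** Keywords ***
Start Application And Import Network Data
    Start Product
    Import Network Data    ${CURDIR}${/}testdata${/}abis_network.xml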

HOW, WHEN AND BY WHOM WERE THE ACCEPTANCE TEST CASES EXECUTED?

The acceptance test cases were executed in three ways. During the test case implementation the test automation engineer executed the test cases on his workstation; the purpose was to verify that the test cases were implemented correctly, and with some test cases this meant that the features had already been implemented at this stage. Some of the test cases were executed on the developers' workstations during the development by the test automation engineer and the developers. All the test cases were added to the acceptance test execution environment. At first the test cases were added to the environment at the end of each sprint, but in the last sprint they were added immediately after the initial versions were created. In the acceptance test execution environment the test cases were automatically executed whenever new builds were available.

As was already mentioned, the problems in the acceptance test implementation prevented the developers from evaluating, by running the acceptance test cases, whether their work was ready. There were also two other reasons that made it hard for the developers to evaluate the readiness of their work with the automated acceptance test cases. First of all, some of the test cases tested the workflow and were therefore dependent on each other, which is why the test cases in a late phase of the workflow could not be executed before the features preceding them were working. Another reason was that single test cases tested multiple developers' work, and therefore the test cases did not pass until all the parts the test case was testing were ready.

Many of the mentioned problems derive from the level of the test cases. When the acceptance test cases are on a high level, it is inevitable that they test multiple features, which in turn leads to the problems mentioned earlier. Avoiding the dependency between steps is hard in the workflow test cases. Even though these problems exist, it is obvious that end-to-end acceptance test cases are needed. One possible solution is to divide the acceptance test cases more strictly into two categories. The higher level test cases could be traditional system level, end-to-end test cases. The feature specific test cases could then be integration and system level test cases concentrating on one feature only. The feature specific test cases could be executed by the developers to evaluate the features' readiness, and this division would also make it easier for the developers to implement the acceptance test cases. Of course, it would not remove the problem that some features cannot be tested before their preconditional features are ready. The higher level test cases could still be the testers' responsibility, as was the case in the Project.

HOW AND BY WHOM WERE THE ACCEPTANCE TEST RESULTS REPORTED?

The problems found during the test case implementation were reported to the developers. The results of the test case execution in the acceptance test execution environment were visible to all the project members through an information radiator. The problems found in the automated test execution were passed on to the developers by the test automation team members after they had investigated the problems. However, this investigation lengthened the feedback loop, as the testers were not always available. Had the test cases been implemented by the developers, the feedback loop could have been shortened. The developers thought that the feedback loop should be shortened further, even though they felt that it had already been cut radically.

11.3 Benefits, Challenges and Drawbacks of Acceptance Test-Driven Development with Keyword-Driven Test Automation Framework
The third research question was: Does the acceptance test-driven development with keyword-driven
test automation framework provide any benefits? What are the challenges and drawbacks? Based on
the experiences presented in Chapter 10.6 and the expected benefits and challenges presented in Chap-
ter 4.3, the answers to these questions are analyzed.

BENEFITS

The project members noticed many benefits in the use of ATDD, which was notable because the research period lasted only four months. The people who worked closely with the acceptance test cases had noticed many more benefits than those who were less involved in the use of ATDD. The role a person represented had much less influence on the experienced benefits than the degree of involvement. Of course, there were role-based differences in viewpoint on some of the issues, but the main benefits were perceived similarly across the roles. The same benefits were also noticed by the researcher while working in the Project.

While the research was being conducted, there were some changes in the Project, as was mentioned in Chapter 10.1. Not all of them were related to taking ATDD into use. The changes can be categorized into three main changes: taking test automation into use, a change towards agile testing and, of course, taking ATDD into use. The relations and effects of these changes on the experienced benefits had to be analyzed, and the analysis is presented next.

The main relations between the different benefits and the reasons behind them are presented in Figure 30. As can be seen in the figure, quite a few relations between the benefits can be found. The figure is only a simplified view of the benefits and their relations, but it is used as the basis of this analysis.

Figure 30: The relations between the changes and benefits
<BLANKLINE>
One of the observed changes was increased communication. As was mentioned in Chapter 4.1, agile
testing emphasizes face-to-face communication. When ATDD is in use, the work needed to create the
test cases forces communication. The perceived increase in communication can also depend on the
tester, as some people communicate more actively than others. Therefore, it is impossible to say how
much of the increased communication was due to the use of ATDD and how much due to the other
changes. The test engineers' early involvement can be seen as a consequence of taking agile testing into
use. On the other hand, the use of ATDD forced the testers to take part in an earlier phase of the
development, as the testers participated in the detailed planning. Therefore, most of the benefits gained
from the testers' earlier participation were obtained because of the use of ATDD (see Figure 30).
Cooperation in acceptance test case creation is also a part of agile testing. However, in the Project it was
due to the use of ATDD that the acceptance test cases were created with the feature owners. Therefore, it
is hard to say whether these benefits could have been gained without the use of ATDD. In any case, the
use of ATDD ensures that acceptance test cases are created in cooperation, and therefore the benefits
related to it are gained.

The only practice that was taken into use purely due to ATDD was the detailed planning done by the
feature owners, developers, and testers (bolded in Figure 30). This was one of the biggest reasons leading
to an improved common understanding about the details, which was seen in the Project as the biggest
benefit of the use of ATDD. Crispin (2005) also stated that the cooperation between the groups before
development was the biggest benefit of ATDD. The need to create the test cases forces discussion. Of
course, the detailed planning could be done without ATDD, and some of the mentioned benefits could
still be gained. However, as can be seen in Figure 30, the benefits are sums of multiple factors, and it is
hard to say which benefits would be gained if only the detailed planning were used. As mentioned
earlier, an increased common understanding and the benefits following from it can be missed if the test
cases are on too high a level and the planning is not detailed enough.

The test automation affected only a few of the observed benefits, as can be seen in Figure 30. This
suggests that the tool used in ATDD is not crucial, as most of the benefits were gained from well-timed
planning done by people working in the different roles. However, the role of the test automation in
providing feedback and helping the regression testing should not be undervalued. The benefits of
automated regression testing were probably not broadly highlighted in the research because of the short
research period. With a longer follow-up period, this benefit could have been greater.

The increased common understanding, the biggest benefit of the use of ATDD, does not provide
additional value as such. However, the increased understanding leads to the "real" benefits. The most
valuable benefits of the use of ATDD are therefore the decreased risk of building incorrect software and
the increased development efficiency, as problems can be solved with less effort and the features are
done right the first time. The change in the tester's role is also quite remarkable.

The use of ATDD also affects software quality. As the risk of building incorrect software is decreased, it
is more likely that the created features will satisfy the end user's needs. A better understanding, improved
test cases, and the fact that problems are found earlier should also improve the chances of finding the
defects with a significant impact. However, this remains to be seen. Test automation as a part of ATDD
provides a certain level of quality. As the regression testing is done automatically, the testers hopefully
have more time to explore the system and find defects. In the Project, non-functional testing was not
taken into account when the acceptance test cases were created, although it was discussed as one area to
which the use of ATDD could be expanded. Therefore, the non-functional qualities were not improved
by the use of ATDD.

The benefits mentioned are at least partially gained because of the use of ATDD. If agile testing, test
automation, and increased communication are removed from the relations, none of the real benefits
disappear. Of course, removing them may influence the magnitude of the benefits.

BENEFITS NOT PERCEIVED

There were also areas where benefits were not noticed, even though those areas were mentioned as
possible benefit areas in the literature (see Chapter 4.3). Possible reasons why the benefits were not
gained are analyzed here.

Development Status Was Not More Visible

There were no changes in the development status visibility, even though the acceptance test report was
available to everyone through the information radiator and the web page. At the beginning of the
research, the test cases were added to the acceptance test execution environment at the end of each
sprint. Therefore, it was clear that the development status could not be followed inside the sprints. In the
last sprint of the research period, the acceptance test cases were added to the acceptance test execution
environment at the beginning of the sprint. However, this did not help, as the test cases were failing for
most of the sprint. There were three reasons for that. First, the test cases were high level test cases testing
multiple parts of the Product in one test case. Therefore, even though the development team was able to
finish some single features, the test cases were still failing. Another reason was that the features were
ready at a very late phase of the sprint, if even then. Therefore, the test cases were actually describing the
development status, even though the people did not see failing tests as progress indicators. The third
reason was that not all of the acceptance test cases were ready at the same time as the features. The
reasons behind this problem were analyzed in Chapter 11.1.

The development status visibility could be improved by dividing the development status follow-up into
project level and sprint level progress. The division into higher level and feature level test cases
presented in Chapter 11.2 could be exploited. The higher level test cases could be used to indicate which
workflows are working, and therefore they could provide the project level status. The feature level test
cases could be used to follow up the progress inside the sprints.

Requirements Were Not Defined More Cost-effectively

The test cases did not substitute for the requirement specifications in the Project. Therefore, the
requirements and test cases were not created more cost-effectively. One clear reason was that the Project
had been started before ATDD was tried out, and a requirement specification had already been created.
Even if ATDD had been started at the beginning of the Project, the requirement specification would
probably still have been created. One interviewee also mentioned that there is no need to replace the
requirements with the test cases. On the other hand, keeping duplicate data up-to-date can be seen as a
burden.

No Remarkable Changes to System Design

ATDD did not cause remarkable changes to the system design, even though one developer thought that
he had found the design faster in some cases. The relatively short research period may be one reason why
changes were not noticed. However, there might be other reasons as well. Reppert (2004) reported that
remarkable improvements in system design were seen when ATDD was used in one project. It may be
that this improvement could not be noticed because the interface used to access the system from the test
cases was different. As was mentioned in Chapter 4.3, acceptance test cases usually bypass the graphical
user interface and access the internal structures directly. This was not the case in the Project, as the test
cases used the graphical user interface to access the system under test. Therefore, there was no need to
create test code that would interact directly with the internal structures. This may be the reason why the
developers did not notice a significant change. So it seems that the interface used to access the system
under test affects whether the system design is improved or not.

Acceptance Tests Were Not Used To Verify Refactoring Correctness

Developers in the Project thought that the acceptance test cases created with ATDD could be used to
evaluate refactoring correctness, even though they had not done so yet. A longer research period is
needed to properly assess the acceptance test cases' usefulness in evaluating refactoring correctness.
However, it is hard to see any reason why the acceptance test cases created with the keyword-driven test
automation framework could not be used to verify refactoring correctness. Probably the coverage and
level of the acceptance test cases have a bigger influence than the tool used to create them.

CHALLENGES

As was mentioned in Chapter 10.6, the main challenge in the Project's environment was obtaining proper
test data. This, however, was a domain specific testing problem, although it was seen to affect the
creation of automated tests more than manual testing. There were also other challenges in automating the
test cases. The base keyword creation problems were described in Chapter 11.1. There were also
components in the application which could not be accessed from the automated test cases, as was
mentioned in Chapters 10.2 and 10.3. As was already mentioned in Chapter 5.1, it is not an easy task to
automate testing. Test automation was also seen as one of the biggest challenges in the use of ATDD by
Crispin (2005) (Chapter 4.3). The test automation challenges presented here were mainly general test
automation challenges. Some of them are related to the interface selected for accessing the application.
However, none of them were specific to keyword-driven test automation. The use of ATDD and agile
testing helped to solve some of the problems more easily than would have been possible in a more
traditional environment. For example, it was easier to add the needed testability hooks to the Product
because the test and feature implementations were done in parallel.

As was mentioned, test automation is a part of ATDD, but the biggest benefits can be achieved even if
not all of the test cases can be automated. However, this leads to a need to handle manual regression
testing. Therefore, it is not advisable to settle immediately for manual tests. The importance of
automated regression tests in iterative software development should not be forgotten. Of course, the
scale of test automation has to be decided based on the context.

The second challenge mentioned in Chapter 4.3 was writing the tests before development. That was also
noticed in the Project, as was presented in Chapter 11.1. Crispin (2005) mentioned that the problem was
the lack of time to write the test cases before development. In the Project, however, the problems were
more related to test data and the context. Time could have become a problem had the number of detailed
level test cases been higher.

The third challenge was finding the right level of test cases. Crispin (2005) noticed that when many test
cases are written beforehand, the test cases can cause more confusion than help in understanding the
requirements. It was noticed in the Project that there would have been a need for test cases on multiple
levels, as was mentioned in Chapter 11.2. Including non-functional testing as a part of the acceptance
test cases in the future was seen as beneficial by two interviewees. This would widen the goal of the
acceptance test cases even further. The challenge with the right level of test cases probably derives from
the wide definition of acceptance testing and the possibility to create test cases on multiple test levels
simultaneously.

One more challenge was noticed in the use of the keyword-driven test automation framework. As there
was no intelligent development environment for editing the test case files and resource files, test data
management took some time. Some developers also found it difficult to locate all the keywords used in
the test cases and user keywords, because they were defined in multiple files. These test data
management problems can be even bigger if more people are implementing the test cases.
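<BLANKLINE>
As a hypothetical illustration of this problem, consider a test case file that uses a user keyword defined in a separate resource file. Without tool support, a reader has to search the resource files by hand to see what the keyword actually does. The file names, keywords, and syntax below are invented for illustration and do not represent the Project's actual test data.

    Test case file (login_tests):

        Valid Login
            Log In As User              tester    secret
            Main Page Should Be Open

    Resource file (login_keywords), where the user keyword is defined:

        Log In As User    [arguments: username, password]
            Input Username      username
            Input Password      password
            Submit Login Form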

DRAWBACKS

Interviewees mentioned only a few drawbacks. One interviewee mentioned that writing the test cases
took time, which was a drawback: as more people were defining the test cases, more resources were
needed. On the other hand, the first versions of the test cases were written by a test automation engineer,
and therefore only the definitions were done with a bigger group. Two interviewees thought that
updating the test cases can be seen as rework and therefore as a drawback. This drawback was also
noticed in the February sprint; the reason was mainly that the details had not been agreed upon well
enough. However, the time used to make the changes was not remarkable. All in all, it seems that the
benefits gained from the use of ATDD clearly exceed the drawbacks.

11.4 Good Practices
Good practices, summarized based on the literature, the observations, and the analysis of the
observations, are shown in Table 3. These practices can be applied when acceptance test-driven
development is used.

Practice: Acceptance test cases are created also on a detailed level.
Explanation: If the acceptance test cases are created on a too high level, there is no need to clarify the
details, and those remain unclear. However, creating too many detailed test cases at the beginning of the
sprint may be confusing.

Practice: Use case/workflow test cases are discussed with the whole team at the beginning of the sprint.
Explanation: It is important that all team members understand the big picture, and high level test cases
can be used to clarify that.

Practice: Detailed level test cases are discussed in small groups.
Explanation: It is obviously not productive to plan all the details with the whole team. Therefore, the
detailed test cases are created in small groups, where different roles are represented.

Practice: Test cases are written to the formal format after the planning meetings.
Explanation: During the planning meetings, the test cases can be quickly noted down. The purpose of the
meetings is to find the needed details and create a common understanding about those details. The test
cases can be written to a proper format after the meeting.

Practice: Test cases are checked by the team.
Explanation: Because the test cases are created based on the notes, it is good to check the test cases with
the people who planned those. This helps to find ambiguities and to verify that all the people have
understood the details similarly.

Practice: The test-first approach is not mandatory.
Explanation: There can be situations where it is not profitable to implement the test cases in the test-first
manner. However, plan and implement the test cases on some level before implementing the feature.
Even the test case planning can help to understand the wanted features.

Practice: Initial test cases are added to the test execution environment.
Explanation: When the test cases are executed often and there are detailed level test cases, the
development progress can be followed during the sprints. With the high level test cases the development
progress can be followed on the project level.

Practice: Different kinds of acceptance test cases are created.
Explanation: The acceptance test cases should cover the functional and non-functional requirements.
Therefore, there is a need to create different types of test cases. Functional test cases can even be on
different testing levels.

Table 3: Good practices
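<BLANKLINE>
As a hypothetical illustration of the practice of writing test cases into a formal format only after the planning meetings, a rough note taken during a meeting, such as "log in, add two products, total must update", could later be turned by a test automation engineer into a keyword-driven test case along the following lines. The keyword names and data values are invented for illustration; the exact syntax depends on the framework used.

    Adding Products Updates The Cart Total
        Log In As User           customer1
        Add Product To Cart      book-123
        Add Product To Cart      cd-456
        Cart Total Should Be     45.80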

12 DISCUSSION AND CONCLUSIONS
This research was conducted through a comprehensive literature review, action research based
observations of the use of acceptance test-driven development with the keyword-driven test automation
framework in one software development project, and interviews with members of the project in question.
The results of the research were analyzed by reflecting them against the relevant literature and earlier
studies. Conclusions based on the analysis are covered in this chapter.

12.1 Researcher’s Experience


The researcher's background and experience in the field of software testing are described briefly, so that
the reader can make some assumptions about the researcher's competence. The researcher had four years
of experience in software testing and test automation when the research was started. The researcher was
a part of the team that had developed the keyword-driven test automation framework, called Robot, used
in the Project. The development of Robot had lasted over a year when the research started. The
researcher had gained a lot of experience with Robot by using it for testing Robot itself.

12.2 Main Conclusions


It can be said that ATDD can provide many benefits, and it is a radical change compared to traditional
acceptance testing. ATDD together with agile testing brings testing to the core of the development, as
opposed to the traditional way where the main part of the testing takes place, insufficiently, at the end of
the software development. This is a positive feature which also improves the meaningfulness of the
work, as all team members can take part in planning quality software.

According to the results gained from the study, ATDD also helps the team to develop software that
corresponds better to the requirements, and to do so more efficiently. This is mainly due to the improved
common understanding within the team about the details of the software's features. So it seems that the
use of ATDD is genuinely profitable.

It can be seen that the tool used to automate the test cases in ATDD does not play a crucial role, as the
biggest benefits noticed in the interviews were gained from the process. However, the level on which the
acceptance test cases are created has an influence on the gained benefits, and if the test cases are written
on too high a level, the noticed benefits disappear.

Of course, ATDD is not a silver bullet, and challenges exist. As acceptance testing should cover both
non-functional and functional testing, excluding only unit testing, there is a wide area to test. Finding the
right level of tests is unquestionably hard. However, the cooperation between the team and the customer
can ease that journey.

It was acknowledged that the benefits were gained even though the acceptance test cases were not
created before the development, as pure ATDD requires. This leads to the question of whether ATDD
should be defined so that there is no strict requirement for the test cases to be created in a test-first
manner. The discussion about the test cases drives the development in any case, as the goal of the team is
to get the acceptance test cases passing.

Based on this work, ATDD can be seen as providing a clear process for arranging the testing inside the
iterations of iterative development, and consequently as establishing a prerequisite for successful testing.
This can be seen as very beneficial because clear guidance on the agile testing process, especially in
Scrum, is missing. The importance of such a process is emphasized in environments where a transition
from traditional software development to agile software development is taking place.

12.3 Validity
There is no single clear definition of what validity means in qualitative research (Golafshani 2003,
Trochim 2006, Flick 2006). However, Flick (2006) summarizes that validity answers the question of
whether the researchers see what they think they see. Flick (2006) also suggests using triangulation as a
method for evaluating qualitative research. Based on that suggestion, the validity of this research is
evaluated using data and investigator triangulation. Theory and methodological triangulation are not
used because of the practical nature and predefined scope of the research. Other matters affecting the
results of this thesis are also considered.

The validity of the data was ensured by collecting data with the different data collection methods listed
in Chapter 9.3. Data was also collected throughout the research, increasing its validity. To prevent an
unbalanced view, the researcher interviewed and observed people in different roles. Investigator
triangulation means using more than one researcher to detect and prevent biases resulting from the
researcher as a person. It was not possible to use any other interviewer or observer in this research. From
this point of view, the validity of the research is questionable.

The researcher's high involvement in the Project, and especially the help the Project gained from the
researcher during the study, affects the validity of this research. Kock (2003) mentions that in action
research the researcher's actions may strongly bias the results. The researcher became aware of this
possibility at the beginning of the research, and it was kept in mind during the Project and especially
during the analysis phase.

In addition, the background, know-how, and opinions of the researcher are possible sources of error.
This is mainly due to the fact that this was qualitative research and, for example, interviews were used as
a research method. Therefore, the content and form of the interview questions can reflect the researcher's
own background, knowledge, and views. As a part of the project team, the researcher cannot be
completely objective. However, it can be debated whether this subjectivity has a negative impact on the
research or not.

Interpreting results is not a completely objective activity. Another researcher with a different background
might therefore have interpreted the results in a slightly different way. It must be kept in mind that, for
example, the conclusions are always a somewhat subjective view of reality. However, it can be argued
that the results gained from the research would have been similar even if the research had been carried
out by another researcher.

The fact that there were other changes in the Project, such as a change towards agile testing, may also
have caused problems in understanding what actually caused the perceived benefits. However, as was
noticed in the analysis of the research results, some of the changes and benefits originate directly from
the use of ATDD. To be sure about the benefits, the subject should be studied for a longer period of time
than was done in this research. However, the main conclusions could be drawn also based on the period
of time used in the research. The results of earlier studies and the relevant literature confirm the observed
research results, as they were mainly in line with each other.

It must be kept in mind that the results presented in this thesis are based on only one software
development project and, more specifically, on one team's work. Every project has its own context
specific features. These facts, and of course the structure of the team, have an influence on how ATDD is
used and how it is adapted as a part of the development process. Therefore, the results can vary to some
extent according to the project in question, but it should be possible to gain the main benefits noticed
here also in other projects.

The test automation framework Robot used in this research has not been open sourced, which makes it
harder to introduce the test automation concept used in this research to other projects. However, there is
a possibility that the keyword-driven test automation framework used in the study will be open sourced
in the future.

12.4 Evaluation of the Thesis


The first goal of this thesis was to investigate whether the keyword-driven test automation framework
could be used with acceptance test-driven development. It can be said that the goal was achieved. The
suitability of keyword-driven test automation was analyzed extensively, and based on the analysis the
outcome was that it is possible to use the keyword-driven test automation framework with ATDD. It was
also noticed that some limitations exist which may prevent finalizing the test cases prior to feature
implementation.

One aim was to describe the use of the keyword-driven test automation framework with ATDD in a way
that enables other projects to experiment with the approach using similar tools. How well this goal is met
remains to be seen when the results of this thesis are possibly used in other real-world software
development projects. However, the aim was to describe both the fictive example (Chapter 7) and the
case study (Chapter 10) in such a way that they would be widely understood.

The last goal was to study the pros and cons of acceptance test-driven development when it is used with
the keyword-driven test automation framework. Even though the research lasted only four months,
plenty of results were collected. Based on these results it was possible to see clear benefits, some
challenges, and a few drawbacks. In this sense, the study was successful.

12.5 Further Research Areas


Because this thesis is one of the first studies focusing on acceptance test-driven development with the
keyword-driven test automation framework, there is a need for more extensive studies of this kind of
approach in other projects, including projects that use different kinds of iterative processes. A longer
research period would also be beneficial, as the changes due to the use of ATDD are wide-ranging, and
adapting and adjusting the process takes time. Full-scale use of ATDD would make it possible to study
better the effects of the test automation framework and the suitability of the running tested features
metric with ATDD.

As was noticed, the level of the acceptance tests affects the benefits of ATDD. It was also noticed that
there is a need for acceptance test cases on different levels and that it is difficult to create test cases on
the right level. At least the following areas need more study to understand which kinds of acceptance
tests would be beneficial to create:

• How do the different levels of test cases affect the different aspects of ATDD?

• How do the different levels of acceptance tests affect measuring the project, and how do
they affect the use of the running tested features metric?

• How could the lower level acceptance tests created with the keyword-driven test automation
framework be defined in a format that can be easily understood?

• What is the relationship between the unit testing and the lower level acceptance testing?

Further research is also needed to clarify which of the benefits mentioned in this research are actually
direct results of ATDD. Therefore, the relationships between the benefits and the source of each benefit
should be studied.

One issue that was not studied in this research was the ability to substitute the requirement specifications
with the acceptance test cases. As one interviewee mentioned, there is no need to replace the
requirements with the acceptance test cases. However, some of the details in the requirement
specifications could be defined with test cases to avoid maintaining duplicate data. This could lead to
linking the high level requirements to the acceptance test cases. This would be an interesting area for
further study.

Altogether, it can be said that this thesis is a good opening for discussion in this field of software testing.

BIBLIOGRAPHY
Abrahamsson, Pekka, Outi Salo, Jussi Ronkainen & Juhani Warsta (2002). Agile Software
Development Methods: Review and Analysis. VTT Publications 478, VTT, Finland.
<http://virtual.vtt.fi/inf/pdf/publications/2002/P478.pdf>

Agile Advice (2005). Information Radiators, May 10, 2005.
<http://www.agileadvice.com/archives/2005/05/information_rad.html> May 14th, 2007

Andersson, Johan, Geoff Bache & Peter Sutton (2003). XP with Acceptance Test-Driven
Development: A Rewrite Project for a Resource Optimization System. Lecture Notes in Computer
Science, Volume 2675/2003, Extreme Programming and Agile Processes in Software Engineering,
180-188, Springer Berlin/Heidelberg.
<http://www.carmen.se/research_development/articles/ctrt0302.pdf>

Astels, David (2003). Test-Driven Development: A Practical Guide. 562, Prentice Hall PTR, United
States of America.

Avison, David, Francis Lau, Michael Myers & Peter Axel Nielsen (1999). Action Research: To make
academic research relevant, researchers should try out their theories with practitioners in real situations
and real organizations. COMMUNICATIONS OF THE ACM, January 1999/Vol. 42, No. 1, 94-97.

Babüroglu, Oguz N. & Ib Ravn (1992). Normative Action Research. Organization Studies Vol. 13, No.
1, 1992, 19-34.

Bach, James (2003a). Agile test automation. <http://www.satisfice.com/articles/agileauto-paper.pdf>
March 31st, 2007

Bach, James (2003b). Exploratory Testing Explained v.1.3 4/16/03.
<http://www.satisfice.com/articles/et-article.pdf> March 31st, 2007

Beck, Kent (2000). Extreme Programming Explained: Embrace Change. Third Print, 190, Addison-
Wesley, Reading (MA).

Beck, Kent, Mike Beedle, Arie van Bennekum, Alistair Cockburn, Ward Cunningham, Martin Fowler,
James Grenning, Jim Highsmith, Andrew Hunt, Ron Jeffries, Jon Kern, Brian Marick, Robert C.
Martin, Steve Mellor, Ken Schwaber, Jeff Sutherland & Dave Thomas (2001a). Manifesto for Agile
Software Development. <http://agilemanifesto.org> December 5th, 2006

Beck, Kent, Mike Beedle, Arie van Bennekum, Alistair Cockburn, Ward Cunningham, Martin Fowler,
James Grenning, Jim Highsmith, Andrew Hunt, Ron Jeffries, Jon Kern, Brian Marick, Robert C.
Martin, Steve Mellor, Ken Schwaber, Jeff Sutherland & Dave Thomas (2001b). Principles behind the
Agile Manifesto. <http://agilemanifesto.org/principles.html> March 31st, 2007

Beck, Kent (2003). Test-Driven Development By Example. 240, Addison-Wesley.

Beizer, Boris (1990). Software testing techniques. Second Edition, 550, Van Nostrand Reinhold, New
York.

Burnstein, Ilene (2003). Practical Software Testing: a process-oriented approach. 709, Springer, New
York.

Buwalda, Hans, Dennis Janssen & Iris Pinkster (2002). Integrated Test Design and Automation: Using
the TestFrame Method. 242, Addison Wesley, Bibbles Ltd, Guildford and King’s Lynn, Great Britain.

Cohn, Mike (2004). User Stories Applied: For Agile Software Development. 268, Addison-Wesley.

Cohn, Mike (2007). User Stories, Agile Planning and Estimating. Internal Seminar, March 24th, 2007.

Control Chaos (2006a). What is Scrum? <http://www.controlchaos.com/about/> September 26th, 2006

Control Chaos (2006b). XP@Scrum. <http://www.controlchaos.com/about/xp.php> September 26th,
2006

Craig, Rick D. & Stefan P. Jaskiel (2002). Systematic Software Testing. 536, Artech House Publishers,
Boston.

Crispin, Lisa, Tip House & Carol Wade (2002). The Need for Speed: Automating Acceptance Testing
in an eXtreme Programming Environment. Upgrade, The European Online Magazine for the IT
Professional Vol III, No. 2, April 2002, 11-17.
<http://www.upgrade-cepis.org/issues/2002/2/up3-2Crispin.pdf>

Crispin, Lisa & Tip House (2005). Testing Extreme Programming. Second Print, 306, Addison-
Wesley.

Crispin, Lisa (2005). Using Customer Tests to Drive Development. METHODS & TOOLS, Global
knowledge source for software development professionals, Summer 2005, Volume 13, number 2, 12-
17. <http://www.martinig.ch/PDF/mt200502.pdf >

Cruise Control (2006). Cruise Control, Continuous Integration Toolkit.
<http://cruisecontrol.sourceforge.net/> September 23rd, 2006

Dustin, Elfriede, Jeff Rashka & John Paul (1999). Automated Software Testing: introduction,
management, and performance. 575, Addison-Wesley.

Fenton, Norman E. (1996). Software metrics: a rigorous and practical approach. Second Edition, 638,
International Thomson Computer Press, London.

Fewster, Mark & Dorothy Graham (1999). Software Test Automation, Effective use of test execution
tools. 574, Addison-Wesley.

Flick, Uwe (2006). An Introduction to Qualitative Research. Third Edition, 443, SAGE, London.

Golafshani, Nahid (2003). Understanding Reliability and Validity in Qualitative Research. The
Qualitative Report Vol. 8, Number 4, December 2003, 597-607. <http://www.nova.edu/ssss/QR/QR8-
4/golafshani.pdf>

Hendrickson, Elisabeth (2006). Agile QA/Testing.
<http://testobsessed.com/wordpress/wp-content/uploads/2006/11/agiletesting-talk-nov2006.pdf>
April 10th, 2007

IEEE Std 829-1983. IEEE Standard for Software Test Documentation. Institute of Electrical and
Electronics Engineers, Inc., 1983.

IEEE Std 1008-1987. IEEE standard for Software Unit Testing. Institute of Electrical and Electronics
Engineers, Inc., 1987.

IEEE Std 610.12-1990. IEEE standard glossary of software engineering terminology. Institute of
Electrical and Electronics Engineers, Inc., 1990.

ISO Std 9000-2005. Quality management systems - Fundamentals and vocabulary. ISO Properties,
Inc., 2005

ISO/IEC Std 9126-1:2001. Software engineering -- Product quality -- Part 1: Quality model. ISO
Properties, Inc., 2001

ISTQB (2006). Standard glossary of terms used in Software Testing Version 1.2 (dd. June, 4th 2006).
<http://www.istqb.org/fileadmin/media/glossary-current.pdf> April 9th, 2007

Itkonen, Juha, Kristian Rautiainen and Casper Lassenius (2005). Toward an Understanding of Quality
Assurance in Agile Software Development. International Journal of Agile Manufacturing, Vol. 8, No.
2, 39-49.

Jeffries, Ronald E. (1999). Extreme Testing, Why aggressive software development calls for radical
testing efforts. Software Testing & Quality Engineering, March/April 1999, 23-26.
<http://www.xprogramming.com/publications/SP99%20Extreme%20for%20Web.pdf>

Jeffries, Ron, Ann Andersson & Chet Hendrickson (2001). Extreme Programming Installed. 265,
Addison-Wesley, Boston.

Jeffries, Ron (2004). A Metric Leading to Agility 06/14/2004.
<http://www.xprogramming.com/xpmag/jatRtsMetric.htm> November 18th, 2006

Jeffries, Ron (2006). Automating “All” Tests 05/25/2006.
<http://www.xprogramming.com/xpmag/AutomatedTesting.htm> April 14th, 2007

Kaner, Cem, Jack Falk & Quoc Nguyen (1999). Testing Computer Software. Second Edition, 480,
Wiley, New York.

Kaner, Cem, James Bach, Bret Pettichord, Brian Marick, Alan Myrvold, Ross Collard, Johanna
Rothman, Christopher Denardis, Marge Farrell, Noel Nyman, Karen Johnson, Jane Stepak, Erick
Griffin, Patricia A. McQuaid, Stale Amland, Sam Guckenheimer, Paul Szymkowiak, Andy Tinkham,
Pat McGee & Alan A. Jorgensen (2001a). The Seven Basic Principles of the Context-Driven School.
<http://www.context-driven-testing.com/> December 19th, 2006

Kaner, Cem, James Bach & Bret Pettichord (2001b). Lessons Learned in Software Testing: A Context-
Driven Approach. 286, John Wiley & Sons, Inc., New York.

Kaner, Cem (2003). The Role of Testers in XP.
<http://www.kaner.com/pdfs/role_of_testers_in_XP.pdf> November 18th, 2006

Kit, Edward (1999). Integrated, effective test design and automation. Software Development, February
1999, 27–41.

Kock, Ned (2003). Action Research: Lessons Learned From a Multi-Iteration Study of Computer-
Mediated Communication in Groups. IEEE Transactions on Professional Communication, Vol. 46, No.
2, June 2003, 105-128.

Larman, Craig (2004). Agile & Iterative Development: A Manager’s Guide. 342, Addison-Wesley.

Larman, Craig (2006). Introduction to Agile & Iterative Development. Internal Seminar, December
14th, 2006.

Laukkanen, Pekka (2006). Data-Driven and Keyword-Driven Test Automation Frameworks. 98,
Master’s Thesis, Software Business and Engineering Institute, Department of Computer Science and
Engineering, Helsinki University of Technology.

Mar, Kane & Ken Schwaber (2002). Scrum with XP.
<http://www.informit.com/articles/article.asp?p=26057> October 4th, 2006

Marick, Brian (2001). Agile Methods and Agile Testing.
<http://www.testing.com/agile/agile-testing-essay.html> November 15th, 2006

Marick, Brian (2004). Agile Testing Directions.
<http://www.testing.com/cgi-bin/blog/2004/05/26#directions-toc> November 15th, 2006

Meszaros, Gerard (2003). Agile regression testing using record & playback. Conference on Object
Oriented Programming Systems Languages and Applications, Companion of the 18th annual ACM
SIGPLAN conference on Object-oriented programming, systems, languages, and applications, 353–
360, ACM Press, New York. <http://delivery.acm.org/10.1145/950000/949442/p353-
meszaros.pdf?key1=949442&key2=5537216711&coll=&dl=ACM&CFID=15151515&CFTOKEN=61
84618>

Miller, Roy W. & Christopher T. Collins (2001). Acceptance testing. XP Universe, 2001.
<http://www.xpuniverse.com/2001/pdfs/Testing05.pdf> April 10th, 2007

Mosley, Daniel J. & Bruce A. Posey (2002). Just Enough Software Test Automation. 260, Prentice Hall
PTR, Upper Saddle River, New Jersey, USA.

Mugridge, Rick & Ward Cunningham (2005). Fit for Developing Software: Framework for Integrated
Tests. 355, Prentice Hall PTR, Westford, Massachusetts.

Nagle, Carl J. (2007). Test Automation Frameworks.
<http://safsdev.sourceforge.net/DataDrivenTestAutomationFrameworks.htm> April 14th, 2007

Patton, Ron (2000). Software Testing. 389, SAMS, United States of America.

Pol, Martin (2002). Software testing: a guide to the TMap approach. 564, Addison-Wesley, Harlow.

Reppert, Tracy (2004). Don’t Just Break Software, Make Software: How storytest-driven development
is changing the way QA, customers, and developers work. Better Software, July/August, 2004, 18-23.
<http://www.industriallogic.com/papers/storytest.pdf>

Sauvé, Jacques Philippe, Osório Lopes Abath Neto & Walfredo Cirne (2006). EasyAccept: a tool to
easily create, run and drive development with automated acceptance tests. International Conference on
Software Engineering, Proceedings of the 2006 international workshop on Automation of software
test, 111-117, ACM Press, New York. <http://delivery.acm.org/10.1145/1140000/1138951/p111-
sauve.pdf?key1=1138951&key2=9897216711&coll=&dl=ACM&CFID=15151515&CFTOKEN=618
4618>

Schwaber, Ken & Mike Beedle (2002). Agile software development with Scrum. 158, Prentice-Hall,
Upper Saddle River (NJ).

Schwaber, Ken (2004). Agile Project Management with Scrum. 163, Microsoft Press, Redmond,
Washington.

Stringer, Ernest T. (1996). Action Research: A Handbook for Practitioners. 169, SAGE, United States
of America.

Trochim, William M.K. (2006). Qualitative Validity.
<http://www.socialresearchmethods.net/kb/qualval.htm> October 4th, 2006

Watt, Richard J. & David Leigh-Fellows (2004). Acceptance Test-Driven Planning. Lecture Notes in
Computer Science, Volume 3134/2004, Extreme Programming and Agile Methods - XP/Agile Universe
2004, 43-49, Springer, Berlin/Heidelberg.

Wideman, Max R. (2002). Wideman Comparative Glossary of Project Management Terms, March
2002. <http://maxwideman.com/pmglossary/PMG_S01.htm> May 14th, 2007

Zallar, Kerry (2001). Are you ready for the test automation game? Software Testing & Quality
Engineering, November/December 2001, 22–26.
<http://www.scionlabs.com/areureadyforautomation.pdf>

APPENDIX A PRINCIPLES BEHIND THE AGILE MANIFESTO
We follow these principles:
Our highest priority is to satisfy the customer
through early and continuous delivery
of valuable software.
Welcome changing requirements, even late in
development. Agile processes harness change for
the customer's competitive advantage.
Deliver working software frequently, from a
couple of weeks to a couple of months, with a
preference to the shorter timescale.
Business people and developers must work
together daily throughout the project.
Build projects around motivated individuals.
Give them the environment and support they need,
and trust them to get the job done.
The most efficient and effective method of
conveying information to and within a development
team is face-to-face conversation.
Working software is the primary measure of progress.
Agile processes promote sustainable development.
The sponsors, developers, and users should be able
to maintain a constant pace indefinitely.
Continuous attention to technical excellence
and good design enhances agility.
Simplicity--the art of maximizing the amount
of work not done--is essential.
The best architectures, requirements, and designs
emerge from self-organizing teams.
At regular intervals, the team reflects on how
to become more effective, then tunes and adjusts
its behavior accordingly. (Beck et al. 2001b)

APPENDIX B INTERVIEW QUESTIONS
Interview questions asked in the final interviews.

1. How has ATDD affected the software development? Why?

2. What have been the benefits in ATDD? Why?

3. What have been the drawbacks in ATDD? Why?

4. What have been the challenges in ATDD? Why?

5. Has ATDD affected the risk of building incorrect software? How? Why?

6. Has ATDD affected the visibility of the development status? How? Why?

7. Has ATDD established a quality agreement between the development and feature owners?
How? Why?

8. Has ATDD changed your confidence in the software? How? Why?

9. Has ATDD affected when problems are found? How? Why?

10. Has ATDD affected the way requirements are kept up to date? How? Why?

11. Has ATDD affected the way requirements and tests are kept in sync? How? Why?

12. Are the acceptance tests in a format that is easy to understand? Why or why not?

13. Is it easy to write the acceptance tests on the right level? Why or why not?

14. Has ATDD affected the developers’ goal? How? Why?

15. Has ATDD affected the design of the developed system? How? Why?

16. Has ATDD affected the verification of refactoring correctness? How? Why?

17. Has ATDD affected the quality of the test cases? How? Why?

18. Has ATDD influenced the way people see test engineers? How? Why?

19. Has ATDD influenced the test engineer's role? How? Why?

20. Has ATDD affected how hard or easy the tests are to automate? How? Why?

21. What could be improved in the current way of doing ATDD? Which changes could give the
biggest benefits?

22. Sum up the biggest benefit and the biggest drawback based on the issues asked in this interview
and state the reasons.

