
Third International Conference on Software Testing, Verification, and Validation Workshops

A Framework for GUI Testing based on Use Case Design

Cristiano Bertolini

Alexandre Mota

Center of Informatics, Federal University of Pernambuco P.O. Box 7851, 50732-970, Recife-PE, Brazil {cbertolini,acm}@cin.ufpe.br

Abstract
Today, GUIs are not exclusive to desktop and web applications. They can be found in a wide variety of embedded systems such as cellular phones, TVs, cars, etc. GUI testing is an emerging approach to assure software quality. In this paper, we show how to evaluate some GUI testing techniques and the importance of controlled experiments in order to obtain statistical confidence. Furthermore, as the GUI design often changes during the development process, test cases need to be updated as well. Therefore, we also propose a general framework for GUI test case design and generation based on model-based testing and GUI capture-replay tools. The framework is easily extended to support different test case generation algorithms and script languages. It also allows one to explore functional as well as non-functional requirements, such as usability, accessibility, and reliability.

1 Introduction
GUI testing is becoming very popular and several tools are being developed with the goal of identifying bugs in the internal system, exercised through the GUI, as well as in the GUI itself [2, 17, 30]. Usually, tools work with model-based testing [24, 29] and model checking [7, 25] approaches. GUIs can change a lot during the development and production stages. Small changes are frequently requested by users: for example, changing the position of components, changing a component type, adding and removing components, and so on. When the GUI changes, test cases must be updated as well to guarantee the correct execution of the tests. Another GUI testing problem is the large number of test cases that must be generated to cover a GUI application.

Automation is key for the maintainability and efficacy of GUI testing, but it depends on the kind of requirements the testing is addressing (Section 3). Defect testing, which aims at finding bugs not directly associated with requirements, is a promising solution towards automation. However, requirements are not directly addressed. In order to extend the test case generation algorithms to support user-defined requirements, we investigate testing automation techniques by means of a proposed GUI framework. A popular approach in industry to automate test execution is capture-and-replay [22]. The idea is to record in a test script the actions of the user while interacting with the GUI and then to (re-)execute this script when necessary. A well-known issue of this approach is the cost of script maintenance: changes in the GUI require changes in the replay script. One alternative is to implement techniques that automate the construction (and execution) of new sequences [9]. However, this kind of technique is usually domain dependent, and it needs specific libraries to automate the construction of the test cases and oracles to monitor the executions. Model-based testing is a more general approach, which assumes a model (created manually or automatically) from which test cases are extracted. Model-based testing is effective for GUI testing [14] because it can generate a lot of test cases automatically. Its main drawback is the difficulty of obtaining a model of the system. One alternative is to automatically extract models from requirements. We perform this in a hidden way by using a controlled natural language (CNL) [11]. CNL is a precise subset of English that originated in the research partnership between Motorola and the Informatics Center-UFPE (called CIn-BTC). Its purpose is to allow users to write precise use case descriptions in an almost natural way.

Furthermore, these descriptions are used as input to the test case generator TaRGeT [10, 19]. In this paper we present a framework that includes an extensible and generalized way to design GUI requirements and generate test cases (Section 4), and a guideline to evaluate black-box testing techniques statistically (Section 5). Our proposed framework deals with test automation and maintenance for GUI testing. It is able to work with different kinds of requirements. The idea is to create use cases in CNL that capture screen characteristics and behaviors. Afterwards, these use cases are mapped into GUI components in order to generate automated test scripts. The main contributions of this paper are: (i) to propose a framework for GUI test case generation based on GUI use case design; and (ii) to describe a guideline to evaluate GUI testing techniques statistically.

2 Motivation

Black-box testing is the activity of testing without knowledge of the program organization [4]. It consists of exercising the interface of a component (typically the entire system) to find errors. Any part of the system providing a public interface is amenable to black-box testing. White-box testing, in contrast, requires knowledge of the program structure. Black-box and white-box testing are recognized as complementary techniques [18]. It is worth noting that the testing team of a company can be forbidden to apply white-box testing by the application owner, say for confidentiality reasons, simply by having no access to the source code. In [9] we propose and evaluate several techniques to automate the generation of GUI tests without access to the source code of the applications. We investigate techniques that automatically generate and execute test cases on GUI-based systems with the goal of crashing the system. Automated test generation is important for two main reasons: (a) manual tests can become obsolete with the evolution of an application, and (b) in many cases the quality of manual tests depends on the completeness of the requirements. In this work we are particularly interested in addressing the second problem. For example, manually written (system) tests (derived from requirements) may succeed in covering common user interactions but fail to cover corner-case scenarios that can crash the system [3, 15]. In this paper we also address a solution for the first problem (Section 4).

3 GUI Testing Automation

In our previous efforts we focused on techniques that were specifically designed to crash GUI-based systems. We also applied these techniques to cellular phone applications. However, there are several interesting non-functional requirements, such as reliability, usability, and accessibility, that can be investigated as well. To be able to investigate such aspects of a GUI-based system, we need to define specifically what is the subject (non-functional requirement) under analysis and the respective oracle. In the next section we propose a framework that reuses a model-based tool called TaRGeT [10] to generate automated test scripts. This framework incorporates our techniques and a guideline for the design of experiments (Figure 6). It can be extended to different kinds of applications and to evaluate different test case generation algorithms.

4 GUI Framework
In this section we present a GUI framework to aid the creation of descriptions and test scripts based on GUI test case generation goals. Figure 1 shows an outline of such a framework. In Step 1, we write a use case description stating the test case generation goals based on a controlled natural language (a precise subset of English). In Step 2 we perform two main tasks: use the TaRGeT tool [10] to select the appropriate test case generation algorithm and use the selected algorithm to generate the test cases. In Step 3 the test cases generated in CNL are translated into test scripts based on a parameterized script language, resulting in partial GUI test scripts, that is, test scripts containing only the steps corresponding to the CNL steps. Finally, Step 4 is responsible for completing the generated test scripts, in the sense of importing the needed libraries and incorporating code to start and close the test script.
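
As an illustration of Steps 3 and 4, the following sketch shows how CNL steps could be turned into a partial script and then completed. The step-to-command dictionary, the concrete command arguments, and the header/footer lines are illustrative assumptions, not the actual output format of TaRGeT or BadBoy:

```python
# Illustrative sketch of Steps 3 and 4 of the framework. The step-to-command
# dictionary and the header/footer lines are assumptions for illustration,
# not the actual TaRGeT/BadBoy output.

STEP_TO_COMMAND = {
    "Start message screen": 'URL.set("http://<application-url>")',
    "Fill your name":       'FormValue.name("your name").set("John")',
    "Click Send":           'ClickItem.click("Send")',
}

def translate_steps(cnl_steps):
    """Step 3: translate CNL steps into a partial (incomplete) test script."""
    return [STEP_TO_COMMAND[step] for step in cnl_steps]

def complete_script(partial_script):
    """Step 4: add the code needed to import libraries and start/close the script."""
    header = ["# import script-language libraries", "# start the application"]
    footer = ["# close the application"]
    return header + partial_script + footer

if __name__ == "__main__":
    test_case = ["Start message screen", "Fill your name", "Click Send"]
    print("\n".join(complete_script(translate_steps(test_case))))
```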


Figure 1. GUI Framework.

Figure 2 shows the main screen of TaRGeT. TaRGeT implements a systematic approach for dealing with requirements and test artifacts in an integrated way. This allows TaRGeT to automatically generate test cases from use case scenarios written in CNL (a controlled natural language). The use cases are written following an XML schema designed to contain the necessary information for generating the test procedure, description, and related requirements. Moreover, the tool can generate a traceability matrix between test cases, use cases, and requirements. Three major aspects distinguish TaRGeT from other behavioral model-based testing tools: 1) the use of test purposes, provided by test engineers, to restrict the number of generated test cases and to focus on test cases that are more critical or relevant to a given task; 2) algorithms for eliminating similar test cases, reducing the test suite size without significant impact on effectiveness; and 3) use cases written in CNL as system input, which is more natural for engineers compared to formal behavior specification languages.
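
As an illustration of the second aspect, the sketch below shows one simple way similar test cases could be eliminated. The Jaccard similarity over step sets and the greedy threshold filter are simplifying assumptions made for this illustration, not the actual selection algorithm used by TaRGeT [13]:

```python
# Illustrative sketch of similarity-based test suite reduction.
# The Jaccard similarity over step sets and the greedy threshold are
# simplifying assumptions, not TaRGeT's actual algorithm [13].

def similarity(tc_a, tc_b):
    """Jaccard similarity between the step sets of two test cases."""
    a, b = set(tc_a), set(tc_b)
    return len(a & b) / len(a | b) if a | b else 1.0

def reduce_suite(test_cases, threshold=0.7):
    """Keep a test case only if it is not too similar to one already kept."""
    kept = []
    for tc in test_cases:
        if all(similarity(tc, other) < threshold for other in kept):
            kept.append(tc)
    return kept

suite = [
    ["Start message screen", "Fill your name", "Click Send"],
    ["Start message screen", "Fill your name", "Fill your message", "Click Send"],
    ["Start message screen", "Click Reset"],
]
print(reduce_suite(suite))  # the second, near-duplicate test case is dropped
```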

sentence    ::= verb nounPhrase
nounPhrase  ::= preposition article modifier noun modifier
              | article modifier noun modifier
              | modifier noun modifier
              | modifier noun
              | noun modifier
              | noun
verb        ::= send | call | select ...
article     ::= the | a | an
preposition ::= from | to ...
modifier    ::= at least | phonebook ...
noun        ::= phone | application ...

Figure 3. View of the CNL syntax.

Figure 2. TaRGeT main screen.

The main purpose of the proposed framework is to provide a benchmark to evaluate new GUI testing techniques and to apply GUI testing techniques in different domains and applications. In what follows, we show a simple example of how test cases can be generated using our proposed framework. Figure 3 shows an exemplary (simplified) structure of the CNL language in BNF format. For example, a simple CNL sentence is "Select phonebook application"; a requirements document is seen as a set of CNL sentences. In our case, the requirements document is the description of screens. Within the TaRGeT tool, the CNL engine checks for some inconsistencies. When a grammar error is reported, the user must adjust the corresponding CNL sentence in the use case. Another issue is when a word is not found in the CNL database, in which case the user can change the word for a synonym already available or store the new word in the database. It is important to observe that if a new GUI component is included in the use case, this component must be supported by the GUI script language. Now we consider how a CNL-based use case results in test case scripts. For instance, Figure 4 shows a screen of a web application. The screen shows a web form with some fields, buttons, and menu items. The screen is simple, but the user has a lot of possibilities: fill out all fields or leave some fields empty; also, different buttons can be pressed. The combination of fields and actions represents several test cases. The exact number of test cases depends on the algorithm used in the test case generation.
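
Returning to the CNL grammar of Figure 3, the sketch below checks a sentence against a tiny sample of that grammar. The word lists are only a small excerpt (the verbs Start, Fill, and Click come from Table 1), and the check is much simpler than the real CNL engine, which consults the full lexicon database mentioned above:

```python
# Minimal sketch of checking a sentence against the CNL grammar of Figure 3.
# The word lists are tiny samples; a real CNL engine would use the full
# lexicon database and the complete nounPhrase structure.

VERBS        = {"send", "call", "select", "start", "fill", "click"}
ARTICLES     = {"the", "a", "an"}
PREPOSITIONS = {"from", "to"}
MODIFIERS    = {"phonebook"}        # multi-word modifiers omitted for brevity
NOUNS        = {"phone", "application", "message"}

def is_sentence(words):
    """sentence ::= verb nounPhrase (simplified: first token must be a verb,
    the rest must be known words including at least one noun)."""
    if not words or words[0].lower() not in VERBS:
        return False
    rest = [w.lower() for w in words[1:]]
    known = ARTICLES | PREPOSITIONS | MODIFIERS | NOUNS
    return any(w in NOUNS for w in rest) and all(w in known for w in rest)

print(is_sentence("Select phonebook application".split()))  # True
print(is_sentence("Select foo".split()))                    # False (unknown word)
```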

Figure 4. GUI Screen Example.

Figure 5 (a) describes part of the presented screen as a CNL-based use case. We have two flows: the main flow and the alternative flow; for each flow we have a user action (step description) and the system response. The Step ID is used to make each step unique in the use case. For instance, UC 01 4M is the fourth step of the main flow. From UC 01 4M, we can follow the rest of the main flow. The alternative flow may be related to some step of the main flow (From Step:). Note that in the alternative flow the From Step: field indicates the step UC 01 4M. Thus, the flow starting at step UC 01 1A is an alternative for UC 01 4M.
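
A minimal sketch of how the main flow, the From Step reference, and the two test cases of Figure 5 (b) relate is shown below. The step identifiers are written with underscores for readability, and the number of main-flow steps is assumed, since the full use case is only shown in the figure:

```python
# Sketch of deriving test cases from a use case with a main flow and an
# alternative flow (as in Figure 5). Step IDs follow the UC_01_xM / UC_01_xA
# convention from the text; the number of main-flow steps is illustrative.

main_flow = ["UC_01_1M", "UC_01_2M", "UC_01_3M", "UC_01_4M", "UC_01_5M"]
alternative_flow = {"from_step": "UC_01_4M", "steps": ["UC_01_1A"]}

def derive_test_cases(main, alt):
    """Test Case 1: the whole main flow.
    Test Case 2: the main flow up to the branching step, then the alternative."""
    tc1 = list(main)
    branch = main.index(alt["from_step"]) + 1
    tc2 = main[:branch] + alt["steps"]
    return [tc1, tc2]

for i, tc in enumerate(derive_test_cases(main_flow, alternative_flow), 1):
    print(f"Test Case {i}: {tc}")
```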



Figure 5. Use Case and two corresponding Test Cases.

As one can observe in Figure 4, there are other buttons beyond (Email address). The idea is to describe these options as alternative flows in the use case (Figure 5 (a)). As we use a CNL-based language, mapping CNL sentences into a script language is performed using a (kind of) dictionary, exemplified in Table 1. Basically, there are four columns: the verbs (actions), the nouns (usually the component name), the components (representing all possible GUI components), and the item code used to generate the test scripts. If the CNL engine reports an error (for instance, a verb was not found in the CNL database) and the CNL database is then updated, or the user decides to update the CNL database for another reason, the dictionary may have to be updated as well, and this is enforced by our framework. The mapping is made according to an automation tool. For instance, Table 1 is based on the BadBoy tool, a capture-replay tool [1] that is designed to aid in testing and developing complex dynamic applications. Basically, it can read and write GUI events in web applications. For other script languages, the column Item Code must be updated with respect to the corresponding script language. The mapping task is only responsible for translating CNL into the respective script language. However, it is well known that code for importing libraries, starting and closing a script, etc., is necessary as well in order for the script to execute correctly. In our case this is performed automatically and implicitly, following the script language standards.

Figure 5 (b) shows some possible test cases generated from the use case of Figure 5 (a). Test Case 1 corresponds exactly to all steps of the main flow. Test Case 2 represents a sequence of four steps of the main flow and the step of the alternative flow. Many other test cases could be generated in order to guarantee better test coverage, for example, test cases that do not fill out all fields. The oracle can be generated from the System Response, or provided by the user if a more elaborate oracle is needed. We will discuss the oracle problem further in Section 6. We can also introduce data in the use cases by marking the GUI component and adding a table with all possible data. For example, the text box in Step 2 can have an associated table with many different names (valid and invalid data). If no data is associated with the GUI component, random data is generated.
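
The data-table mechanism described above can be sketched as follows; the sample values and the random-string fallback are illustrative assumptions:

```python
# Sketch of associating a data table with a GUI component (the text box of
# Step 2) and falling back to random data when no table is given.
import random
import string

data_tables = {
    "your name": ["John", "Maria", ""],   # valid and invalid sample values
    # "your message" has no table, so random data will be generated for it
}

def value_for(component):
    if component in data_tables:
        return random.choice(data_tables[component])
    return "".join(random.choices(string.ascii_letters, k=8))  # random fallback

for field in ["your name", "your message"]:
    print(field, "->", value_for(field))
```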

5 Design of Experiments
As we can observe, a flexible framework enables one to perform many experiments. However, to validate the results of these experiments we argue that statistical methods should be applied. In this section we describe some guidelines to create a controlled experiment and briefly present an example of the analysis of this kind of evaluation. Basically, in the design of experiments [21] we have three activities: planning, evaluation, and analysis. Figure 6 shows the guideline that we use in our controlled experiment. This guideline is composed of the following tasks:

Define the main goal and hypothesis: it is important to state precisely what the goal of the experiment is and the hypothesis the experiment must confirm or reject.

GQM: to reach the goal of the experiment we define questions and metrics for the main goal based on the goal-question-metric (GQM) approach [23].


Verbs   Nouns                                          Component                    Item Code
Start   message screen, url                            URL, Application, ...        URL.set()
Fill    your name, your email address, your message    TextBox, ...                 FormValue.name().set()
Select                                                 ComboBox, RadioButton, ...   FormValue.name().set()
Click                                                  Button, link, ...            ClickItem.click()

Table 1. GUI Mapping example.

Experimental material: we characterize the subjects of the experiment by defining the material used in the experiment, that is, the objects (hardware, application, etc.) to which the treatments will be applied.

Treatment and experiment design: we also need to define factors (variables of interest that we want to observe) and the structure of the experiment (factorial, blocking, replication, etc.). For example, in our controlled experiment the structure is a 2^4 full factorial design (2 levels with 4 factors). This means 16 treatments in a randomized complete block design (RCBD) without replication (a small sketch of this design is shown after this list).

Execute the experiment and collect the data: after the experiment planning we have to execute the experiment. This execution can be manual, if the experiment depends on human interaction, or automated, if the experiment uses automated techniques.

Analysis of variance: it observes the variance of two (or more) observed variables and is useful for comparison purposes. In our study we use it to compare the GUI testing techniques being analysed.

Effect analysis of variables: it observes the effect of the variables used in the study. For example, our GUI techniques have some variables that determine the size of the test cases and the probability associated with events.

Our experimental studies [8, 9] investigate the influence of certain factors on the efficiency and efficacy of our testing techniques. For this reason, we believe that this proposed framework will be very useful.
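
A small sketch of the 2^4 full factorial design described above, using the factor names of Table 2 and assumed block identifiers, is given below:

```python
# Sketch of the 2^4 full factorial design used in the controlled experiment:
# 4 factors at 2 levels each give 16 treatments, run once per block (phone)
# in a randomized complete block design. Factor names follow Table 2; the
# block identifiers are illustrative assumptions.
from itertools import product
import random

factors = {
    "KeyProb":   ["with", "without"],
    "Driven":    ["initial", "random"],
    "SizeTC":    [50, 500],
    "Technique": ["BxT", "DH"],
}

treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]
assert len(treatments) == 16  # 2^4 treatments

blocks = ["PhoneB", "PhoneC", "PhoneD", "PhoneH"]  # assumed block names
for block in blocks:                               # RCBD: randomize within block
    order = random.sample(treatments, len(treatments))
    print(block, [t["Technique"] for t in order][:4], "...")
```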

Figure 6. Guideline in Experiment Design.

5.1 Example

We performed a planned and controlled experiment to complement the information we learned from our previous work [9]. We chose to do a deeper analysis of the GUI testing techniques with the highest level of automation and to determine which technique is better. However, we only evaluated the GUI techniques in the context of cellular phone applications. This means that a more flexible framework would be very useful to introduce new GUI testing techniques and to apply these techniques in other contexts, such as web and desktop applications.

Table 2 presents the data collected for our controlled experiment. The column Pattern is the treatment identification, and the other columns under Treatments were already introduced. The columns under Blocks show the results (time to find a crash) for each phone configuration. We also show the mean time of each block (columns) and of each treatment (lines). It is worth pointing out that these means consider only the executions that found a crash (timeouts were not considered). To test our null hypothesis we developed an initial ANOVA table (see Table 3). As the p-value of this table (the probability under the column Prob > F) is less than 0.05 (we assume the default significance level), we reject the null hypothesis and must assume the alternative hypothesis. That is, we need to further investigate which factors can cause some effect on the time to find a crash.

Source     DF   Sum Sq        Mean Sq      F value    Prob > F
Model      13   8799.15328    676.85794    8.21228    1.72909e-8
Error      50   4121.00656    82.42013
C. Total   63   12920.1598

Table 3. ANOVA table.
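
For readers who want to reproduce the Prob > F value of Table 3, the following sketch (assuming SciPy is available) shows how it follows from the F statistic and the degrees of freedom:

```python
# Sketch of how the Prob > F value in Table 3 follows from the F statistic
# and the degrees of freedom, via the F distribution's survival function.
# Numbers are taken from Table 3; requires SciPy.
from scipy.stats import f

f_value, df_model, df_error = 8.21228, 13, 50
p_value = f.sf(f_value, df_model, df_error)   # Prob > F
print(f"p = {p_value:.3e}")                   # on the order of 1e-8, well below 0.05
```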


         -------------- Treatments --------------    ------------------- Blocks -------------------
Pattern  KeyProb  Driven   SizeTC  Technique         TimeCrashB  TimeCrashC  TimeCrashD  TimeCrashH    Mean
1        with     initial  50      BxT               40.0        40.0        1.4         1.2           1.3
2        without  initial  50      BxT               36.9        40.0        1.3         2.8           13.7
3        with     initial  500     BxT               40.0        33.4        1.1         9.1           14.5
4        with     random   50      BxT               20.1        16.6        6.8         27.5          17.8
5        without  random   500     BxT               12.8        32.0        10.2        29.9          21.2
6        without  random   50      BxT               15.1        21.9        9.2         15.1          15.3
7        with     random   500     BxT               23.5        28.7        23.2        38.9          28.6
8        without  initial  500     BxT               27.1        38.9        1.9         1.8           17.4
Mean                                                 22.6        27.9        6.9         15.8
9        with     random   50      DH                38.9        22.1        3.2         21.8          23.0
10       with     initial  500     DH                40.0        40.0        8.5         40.0          8.5
11       with     initial  50      DH                40.0        40.0        5.6         28.2          16.9
12       without  initial  50      DH                40.0        40.0        7.7         40.0          7.7
13       with     random   500     DH                40.0        34.5        14.1        33.7          27.4
14       without  random   500     DH                40.0        40.0        10.9        40.0          10.9
15       without  initial  500     DH                40.0        40.0        2.1         40.0          11.2
16       without  random   50      DH                30.7        25.3        7.1         10.9          18.5
Mean                                                 34.8        29.3        7.4         23.0          -

Table 2. Full Factorial Design and Results.

Also, it is important to observe the effect of each factor. Table 4 shows the effect of the factors on our response variable (time to find a crash) by considering the following multiple (indexed) null and alternative hypotheses:

H_0^X: Factor X has no effect on our response variable, where X is each one of our parameters: KeyProb, Driven, SizeTC, and Technique, as well as some of their combinations.

H_a^X: Factor X has an effect on our response variable (the X is the same as in the null hypothesis).

Source                             DF   Sum Sq     F Ratio   Prob > F
KeyProb                            1    80.775     0.923     0.341
Driven                             1    45.731     0.522     0.473
SizeTC                             1    349.222    3.991     0.052
Technique                          1    614.420    7.022     0.011
KeyProb*Driven                     1    23.643     0.270     0.605
KeyProb*SizeTC                     1    41.763     0.477     0.493
KeyProb*Technique                  1    2.213      0.025     0.874
Driven*SizeTC                      1    452.094    5.167     0.028
Driven*Technique                   1    17.956     0.205     0.653
SizeTC*Technique                   1    2.681      0.031     0.862
KeyProb*Driven*SizeTC              1    14.535     0.167     0.685
KeyProb*Driven*Technique           1    19.031     0.217     0.643
KeyProb*SizeTC*Technique           1    3.195      0.036     0.849
Driven*SizeTC*Technique            1    68.682     0.785     0.380
KeyProb*Driven*SizeTC*Technique    1    78.101     0.892     0.349

Table 4. ANOVA Table for all effects.

From Table 4 we can observe that, for almost all factors and their interactions (indicated by an * between factors), the null hypothesis cannot be rejected (the p-value is greater than 5%). Thus, we can conclude that in such cases these factors, as well as their interactions, have no effect on our response variable. That is, we do not need to worry about the factors Driven and KeyProb and the combinations KeyProb*Driven, KeyProb*SizeTC, KeyProb*Technique, Driven*Technique, and SizeTC*Technique. However, some factors are identified as having some influence on our response variable. The factors that have statistical significance are SizeTC, Technique, and the combination Driven*SizeTC. As we can see in Table 4, similarly to the conclusions of our previous work [8], we show that BxT is better than DH and that the SizeTC and Technique parameters and the combination Driven*SizeTC have significant effects on the time to find a bug. The complete results can be found in [8].
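
The significant effects can be read off Table 4 by comparing each Prob > F value with the 0.05 significance level, as in the sketch below (only a few rows of the table are reproduced; note that SizeTC, at 0.052, is borderline at the strict 5% level):

```python
# Sketch of flagging the significant effects of Table 4: an effect is kept
# when its Prob > F value is below the 0.05 significance level. Only a few
# rows of the table are reproduced here.
effects = {
    "SizeTC":        0.052,
    "Technique":     0.011,
    "Driven*SizeTC": 0.028,
    "KeyProb":       0.341,
}

significant = [name for name, p in effects.items() if p < 0.05]
print(significant)   # ['Technique', 'Driven*SizeTC'] at the strict 5% level
```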

6 Discussion and Future Work


Our framework is a general solution for GUI design and test case generation and can be extended to any kind of application. The models are easily written in a subset of English, which frees the user from knowing any specification formalism. The TaRGeT tool provides plug-in support for new algorithms, that is, new test case generation algorithms can be implemented. Also, the mapping of use cases into the script language can be mechanized by comparing the script language API with the CNL syntax. The main advantages of our framework are an easy way to describe screens and the generation of automated test scripts. We can also extend the mappings in order to support different script languages. The direct advantage is the support of different kinds of applications, such as web, desktop, mobile, etc. However, for each kind of application different algorithms may be evaluated, and the framework seems to be a good way to investigate new test case generation techniques.

One weakness of our framework concerns the oracle. Currently, the user must implement it or use a provided oracle associated with the execution of the scripts. However, in the use case we have a column called System Response, which contains the expected result of each step. Therefore, we intend to investigate how to automate the generation of oracles related to GUI requirements (functional, accessibility, usability, etc.). In our work [9] we use a domain-based oracle to detect crashes, which is much simpler to implement and can be included in our framework. As future work, we will investigate the generation of oracles and of generic oracles to detect crashes.

Different test case generation algorithms impact the time to find bugs and the number of bugs found. Another possible future work is to investigate the selection of test cases and how to reduce test suites without losing bug detection efficiency. The TaRGeT tool has some selection algorithms implemented [13] and we can use them to reduce the test suites. To improve the techniques proposed in [9], we can use models to guide the exploration towards detecting crashes. With models we can check sophisticated requirements that are not possible to check with the system alone. Also as future work, we expect to integrate the approaches to perform GUI testing, covering different aspects of GUIs and providing an extensible framework to evaluate new GUI testing techniques. Another future work we consider is how to automate the extraction of use cases from systems and then generate and execute these tests on the system. In [6] a tool was implemented to extract behaviors from mobile phone applications, and the resulting models were compared with the original requirement models to perform conformance testing. However, we believe that a more efficient approach is to use model extraction to automate tests that improve reliability and to detect crashes in the applications.
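
A minimal sketch of the kind of domain-based crash oracle used in [9] is shown below; the FakeApp class and its methods are assumptions for illustration only:

```python
# Sketch of a domain-based crash oracle: after each executed step it only
# checks that the application is still alive and showing a screen.
# The FakeApp class and its methods are illustrative assumptions.
class FakeApp:
    def is_running(self):      return True
    def current_screen(self):  return "ContactForm"

def crash_oracle(app):
    """Return None if the application looks healthy, or an error message."""
    if not app.is_running():
        return "application process died"
    if app.current_screen() is None:
        return "no screen is displayed"
    return None   # a richer oracle could compare the screen with System Response

print(crash_oracle(FakeApp()))   # None -> no crash detected
```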

7 Related Work

Weyuker et al. [12, 26-28] have been working on the comparison of testing techniques using different approaches. The work reported in [27] shows two ways to compare testing techniques: a formal one and an empirical one. Its main message is that people in industry find it difficult to model systems and that a large case study is more effective for comparing testing techniques. Our work assumes an empirical evaluation based on statistical techniques, and we also intend to provide an easy framework for test case generation. Several different approaches have been proposed to keep test cases updated. In [16] a general solution for test case maintenance is proposed, based on some heuristics. In our work, we do not use heuristics but a use case design combined with a model-based testing approach. However, as future work, heuristics could be added to our framework to improve the maintainability of the use cases.

Automated test script generation has been used with success, but it still has many challenges. In [5] the authors show an evaluation of large test suites generated automatically. The main problem is that the models are written in a formalism based on Markov chains, and it is difficult to construct models for large applications or even to compose these models due to the limitations of the formalism. In our work, we introduce a CNL that represents a subset of English and is thus simpler for maintaining and creating large models. Steven et al. [22] propose JRapture, a capture-and-replay tool that operates on Java byte code. The authors argue that JRapture can capture many more graphical elements (graphical widgets) than commercial tools. Several capture-replay tools provide test script APIs [1, 2]. In this sense, we expect to generate scripts and execute them with this kind of tool. Paiva et al. [20] propose a GUI test case generation tool based on specifications written in Spec#. The main difference from our framework is that we write specifications in CNL and our proposed framework is flexible enough to be extended to many other script languages and applied to different kinds of applications.

Acknowledgments. This work was partially supported by the National Institute of Science and Technology for Software Engineering (INES, http://www.ines.org.br), funded by CNPq and FACEPE, grants 573964/2008-4 and APQ-1037-1.03/08. Also, the first author was partially supported by the CNPq fellowship 142905/2006-2.

References

[1] Bad Boy: Capture Replay Tool. http://www.badboy.com/.
[2] GUITAR: a GUI Testing frAmewoRk. http://guitar.sourceforge.net/.
[3] L. Apfelbaum and J. Doyle. Model Based Testing. In Software Quality Week Conference, pages 296-300, 1997.
[4] B. Beizer. Software Testing Techniques. International Thomson Computer Press, 1990.
[5] C. Bertolini, A. G. Farina, P. Fernandes, and F. M. Oliveira. Test case generation using stochastic automata networks: Quantitative analysis. In Second IEEE International Conference on Software Engineering and Formal Methods, pages 251-260, Beijing, China, 2004. IEEE Computer Society.
[6] C. Bertolini and A. Mota. Using Refinement Checking as System Testing. In 11th Iberoamerican Workshop on Requirements Engineering and Software Environments (IDEAS 2008), pages 17-30, Recife, Brazil, February 2008.
[7] C. Bertolini and A. Mota. Using Probabilistic Model Checking to Evaluate GUI Testing Techniques. In SEFM 2009. IEEE, Nov. 2009.
[8] C. Bertolini, A. Mota, E. Aranha, and C. Ferraz. GUI Testing Techniques Evaluation by Designed Experiments. In ICST 2010. IEEE, Apr. 2010.


[9] C. Bertolini, G. Peres, M. d'Amorim, and A. Mota. An Empirical Evaluation of Automated Black Box Testing Techniques for Crashing GUIs. In ICST 2009, pages 21-30. IEEE, Apr. 2009.
[10] P. Borba, D. Torres, R. Marques, and L. Wetzel. TaRGeT: Test and Requirements Generation Tool. In Motorola's Innovation Conference (IC2007), Software Expo Session, Lombard, Illinois, USA, 2007.
[11] G. Cabral and A. Sampaio. Formal specification generation from requirement documents. Electron. Notes Theor. Comput. Sci., 195:171-188, 2008.
[12] P. G. Frankl and E. J. Weyuker. An Analytical Comparison of the Fault-Detecting Ability of Data Flow Testing Techniques. In ICSE, pages 415-424, 1993.
[13] E. Gadelha, P. Machado, and F. Neto. On the Use of a Similarity Function for Test Case Selection in the Context of Model-Based Testing. Software Testing, Verification and Reliability Journal, 2009.
[14] A. Kervinen, M. Maunumaa, T. Pääkkönen, and M. Katara. Model-Based Testing Through a GUI. In FATES, pages 16-31, 2005.
[15] D. Lee and M. Yannakakis. Principles and Methods of Testing Finite State Machines - A Survey. In Proceedings of the IEEE, volume 84, pages 1090-1123, Aug. 1996.
[16] S. McMaster and A. M. Memon. An extensible heuristic-based framework for GUI test case maintenance. In Software Testing, Verification and Validation Workshops, IEEE International Conference on, pages 251-254, 2009.
[17] A. M. Memon. A Comprehensive Framework for Testing Graphical User Interfaces. Ph.D. thesis, University of Pittsburgh, Pittsburgh, PA, July 2001.
[18] G. J. Myers. Art of Software Testing. John Wiley & Sons, Inc., 1979.
[19] S. Nogueira, E. G. Cartaxo, D. T. Torres, E. H. S. Aranha, and R. Marques. Model Based Test Generation: An Industrial Experience. In 1st Brazilian Workshop on Systematic and Automated Software Testing, Joao Pessoa, Brazil, 2007.
[20] A. Paiva, J. C. P. Faria, N. Tillmann, and R. F. A. M. Vidal. A model-to-implementation mapping tool for automated model-based GUI testing. In ICFEM, pages 450-464, 2005.
[21] D. I. Sjoberg, J. E. Hannay, O. Hansen, V. B. Kampenes, A. Karahasanovic, N.-K. Liborg, and A. C. Rekdal. A Survey of Controlled Experiments in Software Engineering. IEEE Transactions on Software Engineering, 31(9):733-753, 2005.
[22] J. Steven, P. Chandra, B. Fleck, and A. Podgurski. jRapture: A capture/replay tool for observation-based testing. SIGSOFT Softw. Eng. Notes, 25(5):158-167, 2000.
[23] R. van Solingen and E. Berghout. Integrating Goal-Oriented Measurement in Industrial Software Engineering: Industrial Experiences with and Additions to the Goal/Question/Metric Method (GQM). In IEEE METRICS, pages 246-258, 2001.
[24] M. Vieira, J. Leduc, B. Hasling, R. Subramanyan, and J. Kazmeier. Automation of GUI Testing Using a Model-Driven Approach. In AST, pages 9-14, New York, NY, USA, 2006. ACM.

[25] W. Visser, C. S. Pasareanu, and R. Pelanek. Test Input Generation for Java Containers Using State Matching. In ISSTA, pages 37-48, 2006.
[26] E. J. Weyuker. Evaluation Techniques for Improving the Quality of Very Large Software Systems in a Cost-Effective Way. Journal of Systems and Software, 47(2-3):97-103, 1999.
[27] E. J. Weyuker. Comparing the Effectiveness of Testing Techniques. In Formal Methods and Testing, pages 271-291. Lecture Notes in Computer Science, 2008.
[28] E. J. Weyuker, S. N. Weiss, and R. G. Hamlet. Comparison of Program Testing Strategies. In Symposium on Testing, Analysis, and Verification, pages 1-10, 1991.
[29] Q. Xie. Developing Cost-Effective Model-Based Techniques for GUI Testing. In ICSE, pages 997-1000, New York, NY, USA, 2006. ACM.
[30] Q. Xie and A. M. Memon. Rapid Crash Testing for Continuously Evolving GUI-Based Software Applications. In ICSM, pages 473-482, 2005.

