Dynamic symbolic execution has attracted considerable attention in recent years as an effective technique for generating high-coverage test suites and finding deep errors in complex software applications. In this tutorial-style presentation, I will introduce the main concepts of dynamic symbolic execution and illustrate them in the context of our KLEE symbolic execution infrastructure. The talk is primarily aimed at those with no direct experience of dynamic symbolic execution or KLEE, but it will also include several parts useful to a more specialist audience.
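To make the core idea concrete, here is a toy sketch of concolic-style dynamic symbolic execution (not of KLEE itself): the program under test is hand-instrumented to log each branch predicate and its outcome, and each logged branch is negated in turn to derive inputs for unexplored paths. All names, including the brute-force `solve` routine, are hypothetical; a real engine such as KLEE works on LLVM bitcode and hands path constraints to a constraint solver instead of enumerating a small input domain.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple

Input = Tuple[int, int]
Pred = Callable[[Input], bool]  # a branch condition over the input vector


def program(inp: Input, trace: List[Tuple[Pred, bool]]) -> str:
    """Hypothetical program under test, hand-instrumented to log each branch."""
    def branch(pred: Pred) -> bool:
        taken = pred(inp)
        trace.append((pred, taken))  # record (condition, outcome) on this path
        return taken

    if branch(lambda v: v[0] > 10):
        if branch(lambda v: v[0] + v[1] == 25):
            return "bug"  # the "deep" error we want to reach
        return "a"
    return "b"


def solve(constraints: List[Tuple[Pred, bool]],
          domain=range(-30, 31)) -> Optional[Input]:
    """Brute-force stand-in for a constraint solver over a small domain."""
    for cand in itertools.product(domain, repeat=2):
        if all(pred(cand) == want for pred, want in constraints):
            return cand
    return None


def explore(seed: Input = (0, 0)) -> Dict[Input, str]:
    worklist, seen_paths, results = [seed], set(), {}
    while worklist:
        inp = worklist.pop()
        trace: List[Tuple[Pred, bool]] = []
        out = program(inp, trace)  # concrete run, collecting the path condition
        path = tuple(taken for _, taken in trace)
        if path in seen_paths:
            continue
        seen_paths.add(path)
        results[inp] = out
        # Keep the prefix of the path, negate branch i, and ask for an input.
        for i in range(len(trace)):
            pred, taken = trace[i]
            new_inp = solve(trace[:i] + [(pred, not taken)])
            if new_inp is not None:
                worklist.append(new_inp)
    return results
```

Starting from the seed `(0, 0)`, the exploration reaches all three paths, including the `"bug"` path that needs `x > 10` and `x + y == 25` at once, a combination random inputs would rarely hit.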
Compilers are among the infrastructure on which we most critically depend, yet production compilers are beyond the scope of formal verification. Testing compilers is hindered by the lack of a reliable “test oracle”: for well-defined yet complex languages it is possible in principle to have a golden reference compiler against which other compilers could be compared, but such a reference compiler rarely exists in practice. Furthermore, many programming languages are under-specified, permitting a high degree of implementation flexibility; this means that a reference compiler to be used as a test oracle cannot exist even in theory.
We have been investigating methods for testing compilers for the OpenGL shading language, GLSL, a language that exhibits a high degree of under-specification with regard to floating-point operations, making the oracle problem particularly hard. Our approach to testing GLSL compilers is to use “metamorphic testing”, whereby a compiler is cross-checked against itself using families of semantically equivalent graphics shader programs. For each shader in a family, an OpenGL implementation should render a very similar image, such that outlier images (as judged by an appropriate image differencing metric) highlight compiler bugs.
I will give an overview of this approach, with examples of some of the functional and security-critical bugs that our approach has shed light on, and I will discuss our ongoing efforts to commercialise the technology via GraphicsFuzz, a spinout company from Imperial College London. I will also briefly speculate on the potential for metamorphic testing to aid in analysing other systems that are fundamentally hard to test, arising in machine learning and computer vision, for example.
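The following is a minimal sketch of the metamorphic idea under heavy simplifying assumptions: a "shader" is a tiny arithmetic expression over pixel coordinates `u` and `v`, `compile_expr` is a hypothetical compiler with a deliberately injected optimiser bug, and the image-differencing metric is simply the mean absolute pixel difference. It is not the GraphicsFuzz tool; it only shows how semantics-preserving transformations let a compiler act as its own oracle.

```python
from typing import Callable, List

# Expr: ("var", name) | ("const", c) | ("add", a, b) | ("mul", a, b)
Expr = tuple


def compile_expr(e: Expr) -> Callable[[float, float], float]:
    """Hypothetical compiler under test, with an injected optimiser bug."""
    op = e[0]
    if op == "var":
        return (lambda u, v: u) if e[1] == "u" else (lambda u, v: v)
    if op == "const":
        return lambda u, v, c=e[1]: c
    # Injected bug: 'x * 1.0' is folded to the constant 1.0 instead of x.
    if op == "mul" and e[2] == ("const", 1.0):
        return lambda u, v: 1.0
    a, b = compile_expr(e[1]), compile_expr(e[2])
    if op == "add":
        return lambda u, v: a(u, v) + b(u, v)
    return lambda u, v: a(u, v) * b(u, v)


def render(e: Expr, size: int = 8) -> List[float]:
    """'Run' the compiled shader on every pixel of a size x size image."""
    f = compile_expr(e)
    return [f(i / size, j / size) for i in range(size) for j in range(size)]


def variants(e: Expr) -> List[Expr]:
    """A family of semantically equivalent shaders."""
    return [
        e,
        ("add", e, ("const", 0.0)),  # e + 0.0
        ("mul", e, ("const", 1.0)),  # e * 1.0
        ("add", ("const", 0.0), e),  # 0.0 + e
    ]


def image_diff(a: List[float], b: List[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def metamorphic_test(shader: Expr, tol: float = 1e-6) -> List[int]:
    """Indices of variants whose image is an outlier w.r.t. the original."""
    reference = render(shader)
    return [i for i, var in enumerate(variants(shader))
            if image_diff(reference, render(var)) > tol]
```

Only the `e * 1.0` variant yields an outlier image, pointing at the faulty folding rule without consulting any reference compiler. The tolerance `tol` stands in for the floating-point latitude that an under-specified language such as GLSL permits.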
Christophe Gaston – Model Based Testing for Timed Distributed Systems: a Symbolic Framework for the Oracle Problem
Many systems interact with their environments at physically distributed interfaces, and the distributed nature of the observations made there is known to complicate testing. This talk concerns distributed testing, where a separate tester is placed at each localized interface and can observe only what happens at that interface. The focus is on the oracle problem, which consists of analyzing observations of a system's executions with respect to its model in order to decide whether the system conforms to that model. In particular, we focus on how reasoning about time, both at the level of executions of the system under test (SUT) and at the level of reference models, may help to solve the oracle problem. A framework based on symbolic execution will be presented.
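A minimal sketch of why time helps, deliberately much simpler than the symbolic framework the talk presents: the model is a hypothetical finite list of global traces of `(port, action)` pairs, each local tester records `(action, timestamp)` pairs, and `timed_oracle` additionally assumes that the local clocks are synchronised, so timestamps can rebuild a global order.

```python
from typing import Dict, List, Tuple

Trace = List[Tuple[str, str]]        # global model trace: (port, action) pairs
LocalObs = List[Tuple[str, float]]   # one tester's view: (action, timestamp)


def project(trace: Trace, port: str) -> List[str]:
    """The part of a global trace visible at a single port."""
    return [act for p, act in trace if p == port]


def untimed_oracle(model: List[Trace], obs: Dict[str, LocalObs]) -> bool:
    """PASS iff some model trace has the observed projection at every port."""
    return any(
        all(project(t, p) == [a for a, _ in o] for p, o in obs.items())
        for t in model
    )


def timed_oracle(model: List[Trace], obs: Dict[str, LocalObs]) -> bool:
    """Assumes synchronised clocks: timestamps fix a global order, which must
    itself be a model trace (ties in timestamps are not handled here)."""
    if not untimed_oracle(model, obs):
        return False
    merged = sorted(
        ((ts, p, a) for p, o in obs.items() for a, ts in o),
        key=lambda x: x[0],
    )
    return [(p, a) for _, p, a in merged] in model
```

With a model that only allows `!a` at `p1` before `!b` at `p2`, an observation where `!b` is stamped at time 1.0 and `!a` at 2.0 passes the untimed oracle, since each port in isolation behaved as allowed, but the timed oracle rejects it because no model trace permits that ordering.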
Security testing is a pivotal activity in engineering secure software. It consists of two phases: generating attack inputs to test the system, and assessing whether test executions expose any vulnerabilities. This lecture aims to provide the foundations of security testing, with particular attention to web applications. The lecture covers the attack models of prominent vulnerabilities and presents foundational automated techniques for black-box and white-box security testing.
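As an illustration of the two phases on a web-style target, here is a hedged black-box sketch for reflected cross-site scripting. The handlers `vulnerable_search` and `safe_search` are hypothetical stand-ins for HTTP endpoints, the three payloads are a tiny illustrative sample rather than a real attack grammar, and the oracle is the simple heuristic "the payload comes back unescaped".

```python
import html
from typing import Callable, List


# Hypothetical request handlers standing in for a web application.
def vulnerable_search(query: str) -> str:
    return f"<p>Results for {query}</p>"


def safe_search(query: str) -> str:
    return f"<p>Results for {html.escape(query)}</p>"


# Phase 1: attack inputs (a tiny illustrative set of XSS payloads).
PAYLOADS: List[str] = [
    "<script>alert(1)</script>",
    "\"><img src=x onerror=alert(1)>",
    "<svg onload=alert(1)>",
]


# Phase 2: the oracle. A payload reflected verbatim means the injected markup
# would reach the browser unescaped, i.e. a likely reflected XSS.
def xss_findings(handler: Callable[[str], str]) -> List[str]:
    return [p for p in PAYLOADS if p in handler(p)]
```

A white-box variant would instead trace whether tainted input can reach the HTML sink without passing through an escaping function.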
In this tutorial I will look at a variety of techniques that can be brought to bear directly on the security of systems. In particular, I will look at the role that generalised fault injection approaches can play in security analysis, but I will also seek to indicate why (dynamic) security testing can be tricky. We’ll start with what might be described as “the Flaw Hypothesis” (from early evaluation criteria thinking), move through requirements elicitation, and end up asking whether dynamic testing really has much to offer for gaining confidence in certain security properties. I’ll end by looking at possible roles for machine learning, e.g. for the development of hyper-efficient DoS strategies and flaw discovery.
Safety and security are two very closely related non-functional properties of utmost importance for highly dependable systems. Both of them are emergent properties in the sense that they cannot be verified by analysing single components. Instead, the complete system, its environment, and the modes of interaction between their constituents need to be considered when assessing safety or security properties.
There are, however, crucial differences, which this presentation highlights from the testing perspective. We start by reviewing the methodology available for testing safety properties with guaranteed fault coverage. We then explain which parts can be adopted for security-driven testing in a straightforward way, and where critical differences lead to new testing paradigms for the security domain.
What is a good test case? The textbook definition as one that detects a defect is sensible from a management perspective only – but otherwise not very useful, as the thought experiment of test cases for a perfectly correct system illustrates. I will argue why coverage-based tests in general can’t be “good” and why only tests based on defect hypotheses can be expected to be “good”. I will present a framework for defect hypotheses and their operationalization that I will illustrate using several examples from the domains of security, legacy business IT systems, and continuous controllers.
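A toy illustration of the argument, using a hypothetical function `is_adult` with a seeded off-by-one defect: a two-test suite reaches full branch coverage yet passes, while tests derived from the defect hypothesis "boundary comparisons may be off by one" expose the fault.

```python
from typing import List


def is_adult(age: int) -> bool:
    """Hypothetical unit with a seeded defect: the spec says age >= 18."""
    if age > 18:  # defect: should be >=
        return True
    return False


def spec(age: int) -> bool:
    return age >= 18


# Coverage-driven suite: both branches are exercised (100% branch coverage)...
coverage_suite = [30, 5]
# ...yet every test passes, because no input sits on the boundary.
coverage_failures = [a for a in coverage_suite if is_adult(a) != spec(a)]


# Defect-hypothesis-driven suite: "comparisons may be off by one" is
# operationalised as inputs just below, at, and just above each boundary.
def boundary_inputs(boundary: int) -> List[int]:
    return [boundary - 1, boundary, boundary + 1]


hypothesis_failures = [a for a in boundary_inputs(18) if is_adult(a) != spec(a)]
```

The coverage suite reports nothing, whereas the hypothesis-derived input 18 fails; had the unit been correct, both suites would pass, which is exactly why "detects a defect" cannot by itself define a good test.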
We take a brief tour of applications of information theory in software testing. Our aim is to appreciate what is known and understood already, look at some existing applications in more detail, then survey some Rumsfeldian “known unknowns”.
The diversity of outputs produced by executing test cases may provide a useful surrogate for coverage when whitebox techniques are inapplicable, and an effective complement where they are. Our investigations found that output diversity exhibits average correlation coefficients of 0.85, 0.83 and 0.97 with statement, branch and path coverage, respectively. More interestingly, output diversity finds 92% of the real faults found by branch coverage (and a further 47% that remained undetected by such whitebox techniques).
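The abstract does not say how output diversity is measured; one common information-theoretic choice, assumed here, is the normalised compression distance (NCD), which approximates the uncomputable information distance with a real compressor. The sketch scores a suite by the mean pairwise NCD of its outputs, using `zlib` as the compressor; the outputs are invented for illustration.

```python
import itertools
import zlib
from typing import List


def c(s: str) -> int:
    """Compressed length: a computable proxy for information content."""
    return len(zlib.compress(s.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalised compression distance between two outputs."""
    cx, cy = c(x), c(y)
    return (c(x + y) - min(cx, cy)) / max(cx, cy)


def output_diversity(outputs: List[str]) -> float:
    """Mean pairwise NCD of the outputs produced by a test suite."""
    pairs = list(itertools.combinations(outputs, 2))
    return sum(ncd(a, b) for a, b in pairs) / len(pairs)


repetitive = ["error: invalid input"] * 3
diverse = ["error: invalid input", "42", "[1, 2, 3, 5, 8, 13]"]
```

A suite whose tests all hit the same error message scores close to zero, while one that drives the program into visibly different behaviours scores much higher, which is the property a coverage surrogate needs.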
Software testing is one of the most challenging research fields in software engineering. Traditional strategies, such as coverage, have predominated for decades; however, these techniques barely outperform random approaches. Researchers are looking for new methodologies to improve the quality of automated testing, focusing especially on test prioritisation and test suite generation. Over the years, the community has started to hypothesise that every test has the same probability of triggering a bug. This hypothesis is the basis for diversification: testing by diversity aims to extend testing strategies by exposing rare and different paths that are hard to activate using heuristics. In this talk we will explore the meaning of diversity from several information-theoretic perspectives, and we will discuss its applications to automatic test suite generation and test prioritisation, based on input, behaviour and output diversity.
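One simple way to operationalise diversity for prioritisation, assumed here rather than taken from the talk, is greedy farthest-first ordering: after an arbitrary first test, repeatedly schedule the test whose minimum distance to the already-scheduled ones is largest. The sketch uses Jaccard distance over whitespace-separated tokens as a stand-in for input diversity; a behaviour- or output-based distance could be plugged in through the `dist` parameter. The test strings are hypothetical.

```python
from typing import Callable, List


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A & B| / |A | B| over the token sets of two test inputs."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def prioritise(tests: List[str],
               dist: Callable[[str, str], float] = jaccard_distance) -> List[str]:
    """Greedy farthest-first ordering: each next test maximises its minimum
    distance to the tests already scheduled (first maximum wins on ties)."""
    if not tests:
        return []
    remaining = list(tests)
    order = [remaining.pop(0)]  # arbitrary start: the first test
    while remaining:
        best = max(remaining, key=lambda t: min(dist(t, s) for s in order))
        order.append(best)
        remaining.remove(best)
    return order


suite = ["login user pass", "login user wrongpass", "login user pass extra",
         "upload file big", "delete file"]
```

For `suite`, the near-duplicate `login user pass extra` drops to the end, while the unrelated `upload` and `delete` tests are scheduled early.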
Search-based Software Engineering:
Paolo Tonella – Automated generation of unit tests for object-oriented code using genetic algorithms
In this tutorial, I will first provide an overall introduction to the use of genetic algorithms for the generation of unit test cases, under the assumption that the unit under test is a class, hence requiring its stateful instances to be created, manipulated and tested. I will describe a chromosome representation commonly adopted for an object-oriented test case and present the genetic operators that can be used to manipulate such a chromosome. I will also provide a definition of the fitness function commonly used in this context.
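A minimal sketch of these ingredients, assuming a hypothetical class `Account` whose target branch is `balance == 100` in `audit()`. The chromosome is a sequence of method calls on an implicitly constructed instance; the operators are call insertion, deletion, argument perturbation and single-point crossover; and the fitness is approach level plus normalised branch distance, a widely used formulation that may differ in detail from the one presented in the tutorial.

```python
import random
from typing import List, Tuple


class Account:
    """Hypothetical class under test; the target is the True branch of audit()."""

    def __init__(self) -> None:
        self.balance = 0

    def deposit(self, x: int) -> None:
        self.balance += x

    def withdraw(self, x: int) -> None:
        if x <= self.balance:
            self.balance -= x

    def audit(self) -> bool:
        return self.balance == 100  # target branch: balance == 100


# Chromosome: a fresh Account() is implicit; genes are (method, int argument).
Gene = Tuple[str, int]
METHODS = ["deposit", "withdraw", "audit"]


def random_gene(rng: random.Random) -> Gene:
    # The argument is ignored for audit(), which takes none.
    return (rng.choice(METHODS), rng.randint(-50, 150))


def fitness(chrom: List[Gene]) -> float:
    """Approach level + normalised branch distance; 0.0 means covered.
    A chromosome that never calls audit() scores 1.0 (approach level 1)."""
    obj, best = Account(), 1.0
    for name, arg in chrom:
        if name == "audit":
            d = abs(obj.balance - 100)       # branch distance for '=='
            best = min(best, d / (d + 1))    # normalised into [0, 1)
            obj.audit()
        else:
            getattr(obj, name)(arg)
    return best


def mutate(chrom: List[Gene], rng: random.Random) -> List[Gene]:
    """Insert, delete, or perturb one call (always insert if the test is empty)."""
    child = list(chrom)
    op = rng.choice(["insert", "delete", "change"])
    if op == "insert" or not child:
        child.insert(rng.randint(0, len(child)), random_gene(rng))
    elif op == "delete":
        child.pop(rng.randrange(len(child)))
    else:
        i = rng.randrange(len(child))
        name, arg = child[i]
        child[i] = (name, arg + rng.randint(-10, 10))
    return child


def crossover(a: List[Gene], b: List[Gene], rng: random.Random) -> List[Gene]:
    """Single-point crossover: a prefix of one parent plus a suffix of the other."""
    return a[: rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]


def evolve(generations: int = 200, pop_size: int = 20, seed: int = 1) -> List[Gene]:
    rng = random.Random(seed)
    pop = [[random_gene(rng) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)  # lower is better
        if fitness(pop[0]) == 0.0:
            break
        elite = pop[: pop_size // 2]  # truncation selection, elitist
        pop = elite + [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
    return min(pop, key=fitness)
```

The normalised branch distance gives the search a gradient: a test that reaches `audit()` with `balance == 99` scores 0.5, far better than one that never calls it.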
In the second part of the tutorial, I will focus on alternative problem formulations that try to address the problem of infeasible and difficult test targets. In particular, I will describe in depth a novel many-objective sorting algorithm that addresses this problem by means of many-objective optimisation based on a custom ranking function. I will conclude with a comparison of the most recent proposals for search-based testing of object-oriented programs.
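The custom ranking can be sketched as a preference criterion layered on top of non-dominated sorting. The code below is a simplified reading of that idea, not a reference implementation: for every target still uncovered, the test with the smallest objective value (the shorter test on ties) goes into front 0, and the remaining tests are ranked by ordinary Pareto dominance over the uncovered targets only. Test names, objective values and lengths are hypothetical.

```python
from typing import Dict, List

# One normalised distance per target; 0.0 means the target is covered.
Objectives = Dict[str, float]


def dominates(a: Objectives, b: Objectives, targets: List[str]) -> bool:
    return (all(a[t] <= b[t] for t in targets)
            and any(a[t] < b[t] for t in targets))


def preference_rank(tests: Dict[str, Objectives],
                    length: Dict[str, int]) -> List[List[str]]:
    """Front 0: the closest test for each uncovered target (shorter on ties).
    Remaining tests: non-dominated sorting over the uncovered targets."""
    targets = sorted({t for objs in tests.values() for t in objs})
    # Targets already covered by some test are assumed archived and dropped.
    uncovered = [t for t in targets if all(o[t] > 0.0 for o in tests.values())]
    front0: List[str] = []
    for t in uncovered:
        best = min(tests, key=lambda n: (tests[n][t], length[n]))
        if best not in front0:
            front0.append(best)
    fronts, rest = [front0], [n for n in tests if n not in front0]
    while rest:
        front = [n for n in rest
                 if not any(dominates(tests[m], tests[n], uncovered)
                            for m in rest if m != n)]
        fronts.append(front)
        rest = [n for n in rest if n not in front]
    return fronts


population = {
    "T1": {"b1": 0.0, "b2": 0.6, "b3": 0.9},
    "T2": {"b1": 0.5, "b2": 0.2, "b3": 0.9},
    "T3": {"b1": 0.5, "b2": 0.2, "b3": 0.4},
    "T4": {"b1": 0.7, "b2": 0.8, "b3": 0.95},
}
lengths = {"T1": 5, "T2": 3, "T3": 8, "T4": 2}
```

Here `T2` is dominated by `T3` on the uncovered targets `b2` and `b3`, yet it survives in front 0 because it is the shortest test closest to `b2`; this keeps the search focused on each individual target rather than on an aggregate trade-off.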
I will explain what Genetic Improvement (GI) is and how GI applies evolutionary computing and search-based optimisation, particularly genetic programming (GP), to existing programs. I will give a brief summary of existing applications, current research and future applications.
Existing GI research can be divided into improving functional and non-functional properties of existing code. Initially, GI tended to concentrate on automatic bug repair, in which functionality is improved by removing errors. More recently, we have looked at non-functional properties such as execution time, memory footprint and energy consumption, while work on functional properties now also includes automatic enhancement by transplanting open source code, improved performance (e.g. better predictions), and automatic porting to new computer hardware.
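A deliberately tiny sketch of GI-style bug repair, with several simplifications stated up front: the program is a hypothetical list of source lines forming the body of `f(xs)`, the only edit is deleting a line, and the search is greedy hill climbing rather than genetic programming. Fitness counts passing tests first, with program length as a secondary, non-functional objective. The code uses `exec` on fixed, trusted strings only.

```python
from typing import Callable, List, Optional, Tuple


def build(lines: List[str]) -> Optional[Callable]:
    """Turn body lines into a function f(xs); None if it does not compile."""
    src = "def f(xs):\n" + "\n".join("    " + line for line in lines) + "\n"
    env: dict = {}
    try:
        exec(src, env)
    except SyntaxError:  # includes IndentationError
        return None
    return env["f"]


# Intended behaviour: sum of the positive elements (hypothetical test suite).
TESTS = [([1, 2, 3], 6), ([-1, 5], 5), ([], 0), ([0, 4, -2], 4)]


def fitness(lines: List[str]) -> Tuple[int, int]:
    """(tests passed, -length): maximise correctness, then brevity."""
    f = build(lines)
    if f is None:
        return (0, -len(lines))
    passed = 0
    for arg, want in TESTS:
        try:
            passed += f(list(arg)) == want
        except Exception:
            pass  # a crashing variant simply fails that test
    return (passed, -len(lines))


def deletions(lines: List[str]) -> List[List[str]]:
    return [lines[:i] + lines[i + 1:] for i in range(len(lines))]


def improve(lines: List[str]) -> List[str]:
    """Greedy hill climbing over single-line deletions (a stand-in for GP)."""
    current = lines
    while True:
        best = max(deletions(current), key=fitness, default=None)
        if best is None or fitness(best) <= fitness(current):
            return current
        current = best


BUGGY = [
    "total = 0",
    "for x in xs:",
    "    if x > 0:",
    "        total += x",
    "        total -= 1",  # the seeded bug
    "return total",
]
```

Calling `improve(BUGGY)` deletes the spurious `total -= 1` line, after which every further deletion breaks at least one test, so the search stops with a program that passes all four tests.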
Society has become addicted to software, yet the production and maintenance of software remain highly labour intensive, resembling a handcraft, almost a cottage industry, rather than a smooth product line. Although software engineering tools continue to assist programmers, programming remains an entirely human activity, and automatic programming is still a long way off. Instead, an intermediate goal of genetic improvement is not to assist people with their current programming tasks but to lift the level of abstraction: people say what needs to be done, and Genetic Improvement offers a range of potential implementations, each Pareto optimal with respect to different user priorities. Indeed, GI could already be part of personalised, per-user, bag-of-parts software automatically configured by evolution.
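"Pareto optimal with respect to different user priorities" can be made concrete with a small filter. The candidate implementations and their (runtime in ms, energy in J, error rate) scores below are invented for illustration; every objective is minimised, and a candidate survives unless some other candidate is at least as good on every objective and strictly better on one.

```python
from typing import Dict, List, Tuple


def pareto_front(cands: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Names of candidates not dominated by any other (all objectives minimised)."""
    def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    return [n for n, v in cands.items()
            if not any(dominates(w, v) for m, w in cands.items() if m != n)]


# Hypothetical GI variants: (runtime_ms, energy_J, error_rate).
variants = {
    "original": (100, 5.0, 0.00),
    "fast": (40, 3.0, 0.02),
    "frugal": (60, 1.5, 0.02),
    "worse": (70, 3.5, 0.03),
}
```

`worse` is dropped because `fast` beats it on every objective, while `original`, `fast` and `frugal` each win on some priority (exactness, speed, energy), so a user would pick among those three.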
Open source program repositories are widely used in empirical software engineering investigations. There are now literally billions of lines of human-written code available for analysis, and some analyses already use advanced machine learning techniques. There is scope for learning not only how people write code but also how they test software. Although great progress has been made in automatic test case generation, knowing the answers the software under test should give (the oracle problem) remains a major stumbling block. As some code repositories also contain open source test suites, machine learning might potentially infer test oracles automatically.
Although artificial evolution has been widely used in non-conventional systems, I will concentrate on the software controlling emergent and self-adaptive systems based on digital computers.