Provenance and Pseudo-Provenance for Seeded Learning-Based Automated Test Generation
Authors

Abstract
Many methods for automated software test generation, including some that explicitly use machine learning (and some that use ML more broadly conceived), derive new tests from existing tests (often referred to as seeds). Often, the seed tests from which new tests are derived are manually constructed, or at least simpler than the tests that are produced as the final outputs of such test generators. We propose annotation of generated tests with a provenance (trail) showing how individual generated tests of interest (especially failing tests) derive from seed tests, and how the population of generated tests relates to the original seed tests. In some cases, post-processing of generated tests can invalidate provenance information, in which case we also propose a method for attempting to construct “pseudo-provenance” describing how the tests could have been (partly) generated from seeds.

1 Seeded Automated Test Generation

Automatic generation of software tests, including (security) fuzzing [33, 8], random testing [31, 14, 26], search-based/evolutionary testing [5], and symbolic or concolic execution [7, 6, 2, 27, 22, 35], is essential for improving software security and reliability. Many of these techniques rely on some form of learning, sometimes directly using standard algorithms [20, 28, 3, 15] such as reinforcement learning [12, 11, 30], and sometimes in a more broadly conceived way. In fact, using Mitchell’s classic definition of machine learning as concerning any computer program that improves its performance at some task through experience [24], almost all non-trivial automated test generation algorithms are machine-learning systems, with the following approximate description:

1. Based on results of running all past tests (T), produce a new test t = f(T) to execute.
2. Execute t and collect data d on code coverage, fault detection, and other information of interest for the execution of t.
3. T = update(T, t, d)
4. Go to step 1.

Performance here (in Mitchell’s sense) is usually measured by the collective code coverage or fault detection of the tests in T, or may be defined over only a subset of the tests (those deemed most useful, output as a test suite). The function f varies widely: f may represent random testing with probabilities of actions determined by past executions [1], a genetic-algorithms approach where tests in T are mutated and/or combined with each other based on their individual performances [23, 5, 33], or an approach using symbolic execution to discover new tests satisfying certain constraints on the execution path [7, 6, 2]. A similar framework uses reinforcement learning, but constructs each test on-the-fly and performs update calls after every step of testing [12].
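The following minimal Python sketch is one way to realize this generic loop. It is illustrative only: the names (generate_tests, choose_next_test, run_test) and the coverage-based update policy are assumptions for the sketch, not any particular tool's API.

```python
def generate_tests(seeds, budget, choose_next_test, run_test):
    """Generic seeded, feedback-directed test generation loop (steps 1-4 above).

    seeds            -- initial population T of seed tests (not an empty "blank slate")
    budget           -- how many new tests to attempt
    choose_next_test -- the function f: derives a new test t from the population T
    run_test         -- executes a test, returning data d (coverage, failures, ...)
    """
    T = list(seeds)
    for _ in range(budget):
        t = choose_next_test(T)   # step 1: t = f(T)
        d = run_test(t)           # step 2: execute t, collect data d
        T = update(T, t, d)       # step 3: T = update(T, t, d)
    return T                      # step 4: repeat until the budget is exhausted


def update(T, t, d):
    # One possible policy (an assumption, not prescribed by the text): keep t
    # only when it achieved something new, e.g., covered previously unseen code.
    return T + [t] if d.get("new_coverage") else T
```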
A common feature, however, is that many methods do not begin with the “blank slate” of an empty T. Instead, they take as initial input a population of tests that are thought to be high-quality (and, most importantly, to provide some guidance as to the structure of valid tests), and proceed to generate new tests from these seed tests [33, 27, 22, 35, 29]. Seed tests are usually manually generated, or tests selected from a previously generated suite for their high coverage or fault detection [32, 10]. It is generally the case that seed tests are more easily understood by users than newly generated tests. For example, seed tests often include only valid inputs and “reasonable” sequences of test actions, while generated tests, to improve coverage and fault detection, often include invalid inputs or bizarre method call sequences.

For example, consider the extremely popular and successful American Fuzzy Lop (AFL) tool for security fuzzing [33]. It usually begins fuzzing (trying to generate inputs that cause crashes indicating potential security vulnerabilities) from a corpus of “good” inputs to a program, e.g., actual audio or graphics files. When a corpus input is mutated and the result is “interesting” by a code-coverage-based heuristic, the new input is added to the corpus of tests used to create future tests. Many tools, considered at a high level, operate in the same fashion, with the critical differences arising from engineering aspects (how tests are executed and instrumented), varied heuristics for selecting tests to mutate, and the choice of mutation methods.

AFL records the origin of each test in its queue in test filenames, which suffices in AFL’s case because each test produced is, in most cases, the result of a change to a single pre-existing test or, more rarely, the merger of two tests. This kind of trace back from a generated test to some seed test (possibly through a long trail of also-generated tests) is essentially a provenance, which we argue is the most easily understood explanation of a learning result for humans in those cases (such as testing) where the algorithm’s purpose is to produce novel, interesting objects from existing objects.

This simple approach used in AFL works for cases where the provenance of a test is always mediated by mutation, making for a clear, simple “audit trail.” However, a more complex or fine-grained approach is required when the influence of seeds is probabilistic, or when a test is composed of (partial) fragments of many tests. Moreover, AFL provides no tools to guide users in making use of what amounts to an internal book-keeping mechanism, and does not produce provenance output designed for human examination. Finally, tests, once generated, are frequently manipulated in ways that may make provenance information no longer valid: a test produced from two seed tests (or seed test-derived tests) may be reduced [34] so that one of the tests is no longer present at all, for example.
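As a rough illustration of this simple kind of audit trail (not AFL's actual bookkeeping: the class and field names below are hypothetical, and AFL encodes similar origin information in queue filenames rather than in a data structure), a provenance trail can be kept by having each test record its immediate parent(s), so the chain back to the seeds is recovered by walking those links:

```python
class Test:
    """A generated test plus minimal bookkeeping for a mutation-mediated provenance trail."""
    def __init__(self, data, parents=(), operation="seed"):
        self.data = data               # the test input itself
        self.parents = list(parents)   # no parents (seed), one (mutation), or two (splice)
        self.operation = operation     # how this test was derived from its parents


def provenance_trail(test):
    """Walk parent links to recover every ancestor of `test`, back to the seed tests."""
    trail, stack, seen = [], [test], set()
    while stack:
        t = stack.pop()
        if id(t) in seen:
            continue
        seen.add(id(t))
        trail.append(t)
        stack.extend(t.parents)
    return trail


# Example: a seed is mutated, and the result is spliced with a second seed.
seed_a = Test(b"RIFF....WAVEdata", operation="seed")
seed_b = Test(b"fmt ....", operation="seed")
mutant = Test(b"RIFF!!..WAVEdata", parents=[seed_a], operation="mutate")
spliced = Test(b"RIFF!!..fmt ....", parents=[mutant, seed_b], operation="splice")

for ancestor in provenance_trail(spliced):
    print(ancestor.operation)   # splice, mutate, seed, seed (order may vary)
```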
In this paper, we propose to go beyond the kind of simple mechanisms found in AFL, and offer the following contributions:

• We present an implementation of provenance for an algorithm that involves generating new tests from partial sequences of many seed tests.

• We discuss ways to present information about not just the provenance of a single test, but the impact of the initial seed tests on future tests. While single-test provenance is useful to developers when debugging, information on the general impact of seeds is more important for the design and analysis of test generation configurations and algorithms.

• We identify test manipulations that partially or completely destroy/invalidate provenance information, propose an algorithm for producing a pseudo-provenance, showing how the generated tests could have been generated from seeds even if they were not actually thus generated, and discuss abstractions that enable pseudo-provenances.

2 A Simple Seeded Generation Algorithm with Provenance

We implemented a novel test generation technique for the TSTL [19, 16, 17, 18] test generation language and tool for Python. In this approach, the seed tests are split into (usually short) subsequences of length k. In place of the usual algorithm for random testing, where a new test action is randomly chosen at each step during testing, our approach always attempts to follow some subsequence, in a best-effort fashion (if the next step in the current subsequence is not enabled, it is skipped). When a test generated in this fashion covers new code (the usual criterion for deciding when to learn from a test in such methods), it too is broken into subsequences, which are added to the subsequence pool and used in the generation of future tests. In TSTL, a test is a sequence of components (test actions), and the provenance of a test generated using this algorithm involves numerous tests, and varying parts of those tests. We extended TSTL
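The fragment below is a rough, self-contained sketch of the subsequence-based generation just described. It is not TSTL's actual API or our implementation; all names (split_into_subsequences, seeded_generate, enabled) are hypothetical, and for brevity it omits the feedback step in which tests covering new code are themselves split and added to the pool.

```python
import random


def split_into_subsequences(actions, k):
    """Break a test (a sequence of test actions) into subsequences of length at most k."""
    return [actions[i:i + k] for i in range(0, len(actions), k)]


def seeded_generate(seeds, k, length, enabled, rng=random):
    """Build one new test by following randomly chosen seed subsequences, best effort.

    seeds   -- list of (seed_id, action sequence) pairs
    enabled -- predicate saying whether an action can be taken given the test so far;
               actions that are not currently enabled are simply skipped
    Returns the new test and its provenance: for each action, the seed it came from.
    """
    pool = [(seed_id, sub)
            for seed_id, actions in seeds
            for sub in split_into_subsequences(actions, k)]
    test, provenance, attempts = [], [], 0
    while len(test) < length and attempts < 10 * length:   # guard: no enabled actions
        attempts += 1
        seed_id, sub = rng.choice(pool)          # pick a subsequence to follow
        for action in sub:
            if not enabled(action, test):        # best effort: skip disabled steps
                continue
            test.append(action)
            provenance.append(seed_id)           # record which seed this action derives from
            if len(test) == length:
                break
    return test, provenance
```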
Similar resources
Towards Next Generation Provenance Systems for E-science
e-Science helps scientists to automate scientific discovery processes and experiments, and promote collaboration across organizational boundaries and disciplines. These experiments involve data discovery, knowledge discovery, integration, linking, and analysis through different software tools and activities. Scientific workflow is one technique through which such activities and processes can be...
Machine Learning Techniques for Establishing the Provenance of Biological Interactions
A substantial amount of knowledge about biomolecular interactions resides in the biomedical literature and has been increasingly made available as structured, curated database entries. Although recent research in biological assertion extraction has mostly focused on automatically detecting interacting entities, automated generation of provenance for interacting pairs remains of equal importance...
Linked provenance data: A semantic Web-based approach to interoperable workflow traces
The Third Provenance Challenge (PC3) offered an opportunity for provenance researchers to evaluate the interoperability of leading provenance models with special emphasis on importing and querying workflow traces generated by others. We investigated interoperability issues related to reusing Open Provenance Model (OPM)-based workflow traces. We compiled data about interoperability issues that w...
Automated Provenance Collection for CCA Component Assemblies
The problem of capturing provenance for computational tasks has recently received significant attention, due to the new set of beneficial uses (for optimization, debugging, etc.) of the recorded data. We develop a provenance collection system aimed at scientific applications that are based on the Common Component Architecture (CCA) that alleviates scientists from the responsibility to manually ...
QualityTrails: Data Quality Provenance as a Basis for Sensemaking
Visual Analytics prototypes increasingly support human sensemaking through providing Provenance information. For data analysts the challenge of knowledge generation starts with assessing the quality of a data set, but Provenance is not yet utilized to aid this task. This position paper aims at characterizing the complexity of Visual Analytics methods introducing Provenance in Data Quality by hi...
Journal: CoRR
Volume: abs/1711.01661
Pages: -
Published: 2017