Josie Holmes, Iftekhar Ahmed, Caius Brindescu, Rahul Gopinath, He Zhang, and Alex Groce. "Using Relative Lines of Code to Guide Automated Test Generation for Python." ACM Transactions on Software Engineering and Methodology, 29(4):28:1-28:38, 2020.
This paper shows that simply estimating the lines of code in each function (counting, once each, the functions it calls as well), and adjusting probabilities in random test generation according to relative LOC, so that if f has 2x the LOC of g, f is also called twice as often, can (sometimes dramatically) improve the effectiveness of random testing for Python programs. We didn't scientifically prove it, but I would personally speculate that this is likely true for any API-call-based test generation, not just Python. In Python the need is particularly acute, due to the high cost of code coverage instrumentation, but in our results LOC was sometimes better than a coverage-driven GA even after paying that overhead, and the two together were usually the best approach. Side note: when useful, unsurprisingly, swarm testing was extremely effective, but it often was not useful.
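To make the weighting concrete, here is a minimal Python sketch of the idea (the API names and LOC figures are invented; the real implementation lives inside the test generator, not a loop like this):

    import random

    # Hypothetical (transitive, counted-once) LOC estimates for the SUT's API calls.
    loc_estimate = {"push": 12, "pop": 10, "sort_items": 24, "clear": 3}

    def pick_action(estimates=loc_estimate):
        # Choose an action with probability proportional to relative LOC, so a
        # function with 2x the LOC of another is called twice as often.
        names = list(estimates)
        return random.choices(names, weights=[estimates[n] for n in names], k=1)[0]

    # Toy generation loop: a call sequence biased toward "bigger" functions.
    print([pick_action() for _ in range(20)])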
Josie Holmes and Alex Groce. "Using Mutants to Help Developers Distinguish and Debug (Compiler) Faults." Journal of Software Testing, Verification, and Reliability, 30(2):e1727, January 2020.
Extension of our ISSRE 18 paper; much more readable (that is a dense paper, and the extra space allows us to add an extended example), plus "one weird trick" that makes the technique much less expensive and better-performing, plus a comparison with Semantic Crash Bucketing, the best competitor, and a glimpse of a future where SCB and this approach possibly merge.
Alex Groce, Iftekhar Ahmed, Carlos Jensen, Paul E. McKenney, and Josie Holmes. "How Verified (or Tested) is My Code? Falsification-Driven Verification and Testing." Automated Software Engineering Journal, 25(4):917-960, December 2018.
Extension of our ASE 15 paper on using mutation analysis to improve verification efforts; extends the same concepts to improving automated testing efforts, and adds some new insights into the general approach, and how it connects to Popper's ideas on scientific method.
Josie Holmes, Alex Groce, Jervis Pinto, Pranjal Mittal, Pooria Azimi, Kevin Kellar, and James O'Brien. "TSTL: the Template Scripting Testing Language." International Journal on Software Tools for Technology Transfer, 20(1):57-78, February 2018.
This paper is a solid overview of the TSTL domain-specific language (and toolset) for software testing. TSTL uses a nondeterministic choice-of-action with guards model, which is useful and easy to use; perhaps more important, TSTL offers the idea of using the DSL to build a general first-class library for test generation/manipulation for any SUT (software under test), hiding the details of the SUT itself.
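The guarded-choice model is easy to picture in plain Python; this is not TSTL syntax, just a sketch of the underlying idea, with a Python list standing in for the SUT:

    import random

    sut = []  # the "software under test" for this toy harness

    actions = [
        # (guard, action) pairs: an action is enabled only when its guard holds.
        (lambda: True,         lambda: sut.append(random.randint(0, 10))),
        (lambda: len(sut) > 0, lambda: sut.pop()),
        (lambda: len(sut) > 1, lambda: sut.sort()),
    ]

    for _ in range(100):
        enabled = [act for guard, act in actions if guard()]
        random.choice(enabled)()   # nondeterministic choice among enabled actions
        assert all(isinstance(x, int) for x in sut)  # property checks go here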
Rahul Gopinath, Iftekhar Ahmed, Mohammad Amin Alipour, Carlos Jensen, and Alex Groce. "Mutation Reduction Strategies Considered Harmful." IEEE Transactions on Reliability, 66(3): 854-874, September 2017.
Many mutation reduction strategies intended to make mutation testing cheaper offer only a limited theoretical advantage over simple random sampling of mutants, and in practice may cause more harm than good.
Rahul Gopinath, Iftekhar Ahmed, Mohammad Amin Alipour, Carlos Jensen, and Alex Groce. "Does Choice of Mutation Tool Matter?" Software Quality Journal, 25(3):871-920, September 2017.
We suggest that the answer is "no" (in terms of average score) but "yes" in terms of per-project variation in score, and propose some standardization.
Alex Groce, Mohammad Amin Alipour, Chaoqiang Zhang, Yang Chen, and John Regehr. "Cause Reduction: Delta Debugging, Even Without Bugs." Software Testing, Verification and Reliability, Volume 26, Number 1, pages 40-68, January 2016.
Classic Hildebrandt-Zeller delta debugging can usefully be applied not just to preserve the property "this test fails," but to reduce the size of a test while holding constant any property of interest. The obvious application we demonstrate is improving regression testing by minimizing not just suites but individual tests; less obviously, it can also boost seeded symbolic execution very nicely.
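As a sketch of the generalization (simplified to one-at-a-time removal; real ddmin removes large chunks first), the reducer simply takes the property to preserve as a parameter; coverage_of in the usage comment is a hypothetical helper:

    def reduce_preserving(test, holds):
        # Greedily drop test components while the property of interest holds;
        # a one-at-a-time simplification of Hildebrandt-Zeller delta debugging.
        changed = True
        while changed:
            changed = False
            for i in range(len(test)):
                candidate = test[:i] + test[i + 1:]
                if holds(candidate):
                    test, changed = candidate, True
                    break
        return test

    # Cause reduction for quick testing: preserve coverage instead of failure.
    # target = coverage_of(big_test)
    # quick = reduce_preserving(big_test, lambda t: coverage_of(t) >= target)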
Milos Gligoric, Alex Groce, Chaoqiang Zhang, Rohan Sharma, Mohammad Amin Alipour, and Darko Marinov. "Guidelines for Coverage-Based Comparisons of Non-Adequate Test Suites." ACM Transactions on Software Engineering and Methodology, Volume 24, Number 4, pages 4-37, August 2015.
A mass of data about how coverage and mutation score relate, over a large number of projects, with some support for branch coverage and intra-procedural acyclic path coverage as particularly effective for comparing test suites.
Alex Groce, Klaus Havelund, Gerard Holzmann, Rajeev Joshi, and Ru-Gang Xu. "Establishing Flight Software Reliability: Testing, Model Checking, Constraint-Solving, Monitoring and Learning." Annals of Mathematics and Artificial Intelligence, Volume 70, Number 4, pages 315-348, April 2014.
Overview of a variety of software verification and testing attempts (on file systems) at JPL, related to the Curiosity Rover flight software and other actual mission code. No major technical advances, but interesting for the "when you actually try it" insights.
Alex Groce, Todd Kulesza, Chaoqiang Zhang, Shalini Shamasunder, Margaret Burnett, Weng-Keen Wong, Simone Stumpf, Shubhomoy Das, Amber Shinsel, Forrest Bice, and Kevin McIntosh. "You Are the Only Possible Oracle: Effective Test Selection for End Users of Interactive Machine Learning Systems." IEEE Transactions on Software Engineering, Volume 40, Number 3, pages 307-323, March 2014.
Machine learning systems' bugs are often predictable enough that, even though tests are costly, users can quickly find faults in a system based on relatively straightforward heuristic methods for selecting a few tests.
Gerard Holzmann, Rajeev Joshi, and Alex Groce. "Swarm Verification Techniques." IEEE Transactions on Software Engineering, Volume 37, Number 6, pages 845-857, November 2011.
Diversity in model checker configuration is a good way to find bugs fast (and cover a model better) when complete verification is not practical.
Alex Groce and Rajeev Joshi. "Exploiting Traces in Static Program Analysis: Better Model Checking through printfs." International Journal on Software Tools for Technology Transfer, online first, January 2008.
You can improve the efficiency of SAT-based model checking if you know that the counterexample of interest produced a particular output trace. This approach also allows some novel specification methods using CBMC.
Alex Groce, Doron Peled, and Mihalis Yannakakis. "Adaptive Model Checking." Logic Journal of the IGPL, Volume 14, Number 5, pages 729-744, October 2006.
Finite-state-machine learning algorithms can be used to model check a system based on a partial or inaccurate model; the process terminates with either a real counterexample, a conformance-checked model of the system, or (if there is no counterexample and your upper bound on the number of states is not fairly small) your running out of patience.
Alex Groce, Sagar Chaki, Daniel Kroening, and Ofer Strichman. "Error Explanation with Distance Metrics." International Journal on Software Tools for Technology Transfer, Volume 8, Number 3, pages 229-247, June 2006.
SAT-based model checking with Pseudo-Boolean queries can be used to search for the successful execution most similar to a counterexample, and this can be useful for debugging. Also describes a "slicing" algorithm based on relaxed transition relations that has generally confused and dismayed even seasoned researchers; I can't say that, offhand, I recall exactly why it makes sense now.
Alex Groce and Willem Visser. "Heuristics for Model Checking Java Programs." International Journal on Software Tools for Technology Transfer, Volume 6, Number 4, pages 260-276, August 2004.
This is a good overview of not just our work, but basic AI-inspired heuristic search as it relates to (Java) explicit-state model checking.
Sagar Chaki, Edmund Clarke, Alex Groce, Joel Ouaknine, Ofer Strichman, and Karen Yorav. "Efficient Verification of Sequential and Concurrent C Programs." Formal Methods in System Design, Special Issue on Software Model Checking, Volume 25, Numbers 2-3, pages 129-166, September-November 2004.
An omnibus of some important ideas in C verification, ten or more years ago --- alternatively, the MAGIC + concurrency journal paper.
Sagar Chaki, Edmund Clarke, Alex Groce, Somesh Jha, and Helmut Veith. "Modular Verification of Software Components in C." IEEE Transactions on Software Engineering, Volume 30, Number 6, pages 388-402, June 2004.
Fuller version of the ICSE paper presenting the MAGIC approach to model checking C code, a Counterexample-Guided Abstraction Refinement method with particular emphasis on 1) modularity and 2) "two Jeffs" (Magee and Kramer) style FSP specification.
Alex Groce. "(Programs), Proofs and Refutations (and Tests and Mutants)." In ACM Symposium on New Ideas in Programming and Reflections on Software, Onward! Essays, part of SPLASH (ACM SIGPLAN Conference on Systems, Programming, Languages and Applications: Software for Humanity), accepted for publication, Pasadena, California, October 2024.
This is a dialogue, in tribute to Lakatos' Proofs and Refutations; it touches on much of what I consider most fundamental and interesting in the world of program correctness.
Sourav Deb, Kush Jain, Rijnard van Tonder, Claire Le Goues, and Alex Groce. "Syntax Is All You Need: A Universal-Language Approach to Mutant Generation." In ACM International Conference on the Foundations of Software Engineering, accepted for publication, Porto de Galinhas, Brazil, July 2024.
This paper presents a new approach to generating mutants for mutation testing (and other uses of program mutants); we show that, using a rule-based hierarchical approach, we can match the mutant generation power of a large number of tools targeting specific languages, and even handle languages of which our tool has no knowledge, while requiring nearly an order of magnitude fewer lines of code to implement (and thus being much easier to maintain and extend).
Kush Jain, Uri Alon, Alex Groce, and Claire Le Goues. "Contextual Predictive Mutation Testing." In ACM International Conference on the Foundations of Software Engineering, pages 250-261, San Francisco, California, December 2023.
Predictive mutation testing offered the idea that you could estimate whether a particular mutant was killed (and by which tests) using machine learning, thus saving much of the time spent running tests on mutants. This paper matures that idea by bringing more modern machine learning methods and critical contextual information into the PMT paradigm.
Kush Jain, Goutamkumar Tulajappa Kalburgi, Claire Le Goues, and Alex Groce. "Mind the Gap: The Difference Between Coverage and Mutation Score Can Guide Testing Efforts." In IEEE International Symposium on Software Reliability Engineering, pages 102-113, Florence, Italy, October 2023.
This paper argues that the difference between structural code coverage and mutation score (e.g., percent lines covered minus percent mutants killed) is a powerful tool for understanding testing efforts. E.g., if a particular file has a high coverage score and low mutation score, it may be important code whose oracle is either highly inadequate or perhaps faulty. There's quite a bit of insight here on the general "structure" of testing efforts, I think, and the paper is worth reading for that aspect as much as for the data on the novel metric itself.
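As a back-of-the-envelope illustration (file names and numbers below are invented), the metric is just a per-file subtraction, with large positive gaps flagging code that is executed but weakly checked:

    # Hypothetical per-file measurements: (statement coverage %, mutation score %).
    results = {
        "wallet.c":  (92.0, 41.0),   # well covered but weakly checked: big gap
        "parser.c":  (88.0, 80.0),
        "logging.c": (35.0, 30.0),
    }

    for path, (cov, kill) in sorted(results.items(),
                                    key=lambda kv: kv[1][1] - kv[1][0]):
        gap = cov - kill
        flag = "  <-- check the oracle" if gap > 30 else ""
        print(f"{path:10s} coverage={cov:5.1f}% killed={kill:5.1f}% gap={gap:5.1f}{flag}")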
Alex Groce, Kush Jain, Rijnard van Tonder, Goutamkumar Tulajappa Kalburgi, and Claire Le Goues, "Looking for Lacunae in Bitcoin Core’s Fuzzing Efforts." In IEEE/ACM International Conference on Software Engineering, pages 185-186, Pittsburgh, Pennsylvania, May 2022.
This is a short summary of work evaluating (and trying to improve) the fuzzing for Bitcoin Core, partly a paid effort over the summer of 2021, and partly extended to a more academic investigation in the fall of 2021 with colleagues at CMU and Sourcegraph. The real scoop on this work is in the full report prepared for Chaincode/Bitcoin Core, which I think is a very interesting look in particular at the challenge of oracles in fuzzing.
Alex Groce, Rijnard van Tonder, Goutamkumar Tulajappa Kalburgi, and Claire Le Goues, "Making No-Fuss Compiler Fuzzing Effective." In ACM SIGPLAN 2022 International Conference on Compiler Construction, pages 194-204, Seoul, South Korea, April 2022.
This paper reports on a new approach to fuzzing compilers using AFL-like "off the shelf" fuzzers, by adding input mutation strategies based on source-code mutation tools to the fuzzer's toolbox. The tool is publicly available. To date, we've reported well over 100 serious compiler bugs that were fixed using this tool, and it is easy for developers to apply themselves. Our experiments show a significant (up to more than 2x) improvement over AFL and AFL++ in terms of unique bugs found for multiple real production compilers.
Alex Groce, Kush Jain, Goutamkumar Tulajappa Kalburgi, Claire Le Goues, and Rahul Gopinath, "Registered Report: First, Fuzz the Mutants." In 1st International Fuzzing Workshop (FUZZING), San Diego, California, April 2022.
This proposes a new approach to fuzzing: split the budget for fuzzing between time spent fuzzing mutants of the program you want to fuzz and the program itself. While this sounds a bit crazy, the paper explains how mutants can help get around blocks to fuzzing progress, and offers some preliminary results that are quite promising. The idea can be thought of as a generalization of T-Fuzz, really. We look forward to performing a full evaluation of this approach!
Alex Groce, Iftekhar Ahmed, Josselin Feist, Gustavo Grieco, Jiri Gesi, Mehran Meidani and Qi Hong Chen. "Evaluating and Improving Static Analysis Tools Via Differential Mutation Analysis." In IEEE International Conference on Software Quality, Reliability, and Security, pages 207-218, Hainan Island, China, December 2021.
Comparing static analysis tools is hard, because combining a large benchmark based on real code with human confirmation of ground truth is expensive. This paper proposes that tools can, to some extent, be compared by generating program mutants, and then performing a differential comparison. This also makes it possible to identify weak spots in a tool based on the results from another tool, without human effort to establish ground truth. The key insight is that mutants that are detected are likely to be true positives, given the tool did not flag the un-mutated code as having any problems. While most mutants can't be detected by static analysis at all, those that can be detected by at least one tool are usually of interest to users of the static analysis tools.
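A minimal sketch of the differential comparison (not the paper's exact protocol; run_tool is a hypothetical function returning a tool's set of findings for a given program, and mutants are, e.g., paths to mutated source files):

    def differential_detection(tools, original, mutants, run_tool):
        # For each tool, collect the mutants on which it reports something it
        # did not already report on the unmutated program (likely true positives).
        baseline = {t: run_tool(t, original) for t in tools}
        detected = {t: set() for t in tools}
        for m in mutants:
            for t in tools:
                if run_tool(t, m) - baseline[t]:   # a new finding on the mutant
                    detected[t].add(m)
        return detected

    # Mutants caught by tool A but missed by tool B point at weak spots in B:
    #   weak_spots_in_B = detected["A"] - detected["B"]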
Jaeseung Choi, Doyeon Kim, Soomin Kim, Gustavo Grieco, Alex Groce, and Sang Kil Cha. "SMARTIAN: Enhancing Smart Contract Fuzzing with Static and Dynamic Analyses." In IEEE/ACM International Conference on Automated Software Engineering, pages 227-239, Melbourne, Australia, November 2021.
SMARTIAN is an open source fuzzer, based originally on Eclipser, for fuzzing smart contracts. This paper presents the novel fuzzing techniques integrated into SMARTIAN that make it (we believe) the most effective smart contract fuzzing tool of its kind to date, along with an unusually thorough set of experimental comparisons of different contract analysis tools.
Alex Groce. "Let a Thousand Flowers Bloom: on the Uses of Diversity in Software Testing." In ACM Symposium on New Ideas in Programming and Reflections on Software, Onward! Essays, part of SPLASH (ACM SIGPLAN Conference on Systems, Programming, Languages and Applications: Software for Humanity) , pages 136-144, Chicago, Illinois, October 2021.
Here you get my (perhaps disjointed and, according to one reviewer, bitter about paper rejections!) musings on the basic problem that we don't have any one-stop solutions in software testing, and we're unlikely to get any. This is a mix of technical summaries of techniques for injecting diversity into testing, in order to find as many bugs as possible, and larger-scope musings about the research evaluation process, and how to deal with the problem of narrow-but-useful methods, where we can't even say quite WHAT narrow application exists, but are fairly sure there is one! A good snapshot of a lot of my thinking over the years on some of the (to me) most interesting challenges in software testing, if a bit muddled. I think this and my other Onward! essay are both solid looks inside my head when not limited by the usual RQs-and-answers format of a research paper.
Akhila Chowdary Kolla, Alex Groce, and Toby Dylan Hocking. "Fuzz Testing the Compiled Code in R Packages." In IEEE International Symposium on Software Reliability Engineering, Wuhan, China, October 2021.
This paper shows how DeepState can be used as a basis to easily, and fairly automatically, look for memory safety bugs in R packages based on C++ code. Interestingly, a large number of R packages expose users who provide bad inputs (or maybe good inputs, in some cases) to memory safety problems that Valgrind can detect. None of these show up in the automated CRAN checks of manually provided test cases, but fuzzing produces a large number of such cases for many of the packages we tested.
Alex Groce and Gustavo Grieco. "echidna-parade: A Tool for Diverse Multicore Smart Contract Fuzzing." In ACM SIGSOFT International Symposium on Software Testing and Analysis, Denmark, July 2021.
This paper introduces echidna-parade, a tool for orchestrating a set of echidna fuzzers (see the paper below); the power of the tool lies in its exploitation of various kinds of diversity, including swarm testing as well as test length variation and search strategy diversification (tuning fuzzer parameters randomly). In a sense this is the practical embodiment of the long line of my work on variety (since we don't know the best thing to do!) in testing.
Miroslav Gavrilov, Kyle Dewey, Alex Groce, Davina Zamanzadeh, and Ben Hardekopf. "A Practical, Principled Measure of Fuzzer Appeal: A Preliminary Study." In IEEE International Conference on Software Quality, Reliability, and Security, Macau, China, December 2020.
This paper proposes a method for evaluating the effectiveness of fuzzers (and test suites more generally, perhaps?) relying on the ability of inputs to distinguish versions of the code from each other. If bugs are known and fixed (which is often required in order to use bug-counting to evaluate, anyway), this includes bugs as a special case. The measure is able to predict fuzzer effectiveness in cases where coverage is either not available or misleading. The study is preliminary, but given the problems (enumerated here) with other measures, deserves careful consideration as a candidate method for fuzzer evaluation.
Alex Groce and Josie Holmes. "Practical Automatic Lightweight Nondeterminism and Flaky Test Detection and Debugging for Python." In IEEE International Conference on Software Quality, Reliability, and Security, Macau, China, December 2020.
Nondeterminism, caused by a multifarious set of phenomena, plagues software testing. Flaky tests are one instance. This paper uses TSTL and some new formal notions of nondeterminism to find (and remove) causes of nondeterminism in Python libraries, or the tests thereof. The overhead is acceptable, and the method should be applicable to other languages (e.g., in Randoop-like tools for Java). Perhaps the more important story here is the proposal of a novel way to handle probabilistic properties in delta-debugging/test reduction. Such properties often produce pathological behavior in reduction, but using a method based on the concept of replication to avoid publication bias in scientific literature leads to an efficient and effective way to reduce tests where achieving P(effect) >= p is the underlying goal.
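The replication idea can be sketched as the preserving predicate handed to a reducer (a crude stand-in for the paper's actual criterion; run_once is a hypothetical function that runs the test once and reports whether it passed):

    def still_flaky(candidate, run_once, target_rate=0.5, replications=20):
        # Accept a reduction step only if the smaller test still fails at or
        # above the target rate over several runs, not just on one lucky run.
        failures = sum(1 for _ in range(replications) if not run_once(candidate))
        return failures / replications >= target_rate

    # Plugged into a property-preserving reducer (see the cause reduction sketch above):
    #   reduced = reduce_preserving(flaky_test, lambda t: still_flaky(t, run_once))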
Gustavo Grieco, Will Song, Artur Cygan, Josselin Feist, and Alex Groce. "Echidna: Effective, Usable, and Fast Fuzzing for Smart Contracts." In ACM SIGSOFT International Symposium on Software Testing and Analysis, Los Angeles, California, July 2020.
Basic tool paper on Echidna, a fuzzer for Ethereum smart contracts that is, like the title says, intended to be effective, usable, and fast: not too hard to configure, not too hard to run, and with quick feedback exposing bugs.
Alex Groce, Gustavo Grieco, Josselin Feist, and Michael Colburn. "What are the Actual Flaws in Important Smart Contracts (and How Can We Find Them)?" In International Conference on Financial Cryptography and Data Security, Kota Kinabalu, Sabah, Malaysia, February 2020.
Analyses of "everything on the blockchain" (which is mostly garbage) or of "a few famous exploits" (not too useful a sample, or predictive of the future) are common, but this paper presents results from 23 smart-contract audits by Trail of Bits, a leading company in the field, covering a large number of individual contracts and containing 246 individual findings. The upshot is 1) that smart contract flaws are not mostly about reentrancy, but are more similar to those found in general security audits, 2) that automated tools might find up to 75% of the most critical problems, 3) manual analysis is still essential, and tools to aid it will be very important, and 4) unit tests probably don't help much with these kinds of security flaws. There is a lot of detail on finding types here, and the raw data is available for analysis.
Mark Mossberg, Felipe Manzano, Eric Hennenfent, Alex Groce, Gustavo Grieco, Josselin Feist, Trent Brunson, and Artem Dinaburg. "Manticore: A User-Friendly Symbolic Execution Framework for Binaries and Smart Contracts." In IEEE/ACM International Conference on Automated Software Engineering, San Diego, California, November 2019.
Presents Manticore, a powerful and easy-to-script/extend symbolic execution framework that is built to support multiple target architectures (native code or EVM for now, possibly WASM and others in the future).
Arpit Christi, Alex Groce, and Austin Wellman. "Building Resource Adaptations via Test-Based Software Minimization: Application, Challenges, and Opportunities." In IEEE International Symposium on Software Reliability Engineering, Berlin, Germany, October 2019 (industry track).
(A quote from the paper is probably best here): We suspect that developers do not put as much effort as TBSM would ideally expect into testing because the payoff of testing is not always clear. The relationship between test quality, even measured by mutation testing rather than coarse-grained coverage only, and detection and anticipation of important defects worth fixing, is not always extremely strong. However, there is clearly some relationship between test quality and eventual system reliability, and one way to see TBSM (and future techniques of the same kind) is as a new way to get better tests. Because most code does not have a critical bug, effort to write tests is currently seen as a burden, a "cost-center," not a "revenue-center" for developers, to adapt a business analogy. It may have to be done, but testing is purely to ward off disaster. However, TBSM offers a different way to look at tests: high quality tests allow developers to be more productive, in that they enable the automation of certain kinds of changes to the software. Writing high-quality tests may become a much more pleasant part of software development if the payoff is not having to write as much complex code to, e.g., adapt a system to a more limited platform, or handle changes to system libraries, or produce a more secure version with a smaller attack surface due to removing insecure functionalities. If tests can produce "productivity revenue" rather than simply allowing the detection of faults in previous productivity, they may be more valued, and receive more attention. The upshot will be more reliable systems that are also easier to adapt to resource-limited situations.
Arpit Christi, Alex Groce, and Rahul Gopinath. "Evaluating Fault Localization for Resource Adaptation via Test-based Software Modification." In IEEE International Conference on Software Quality, Reliability, and Security, Sofia, Bulgaria, July 2019.
Further heuristics and results to improve the code-reduction (delta-debugging) based approach to automatic resource adaptations, part of the DARPA BRASS project.
Josselin Feist, Gustavo Grieco, and Alex Groce. "Slither: A Static Analysis Framework For Smart Contracts." In International Workshop on Emerging Trends in Software Engineering for Blockchain, Montreal, Canada, May 2019.
Presents Slither, an SSA-based, extensible, powerful static analysis tool for Solidity smart contracts.
Josie Holmes and Alex Groce. "Causal Distance-Metric-Based Assistance for Debugging After Compiler Fuzzing." In IEEE International Symposium on Software Reliability Engineering, pages 166-177, Memphis, Tennessee, October 2018.
This paper proposes a new metric for use with the FPF idea proposed in our PLDI 13 paper on taming compiler fuzzers (that is, figuring out which failing tests are the same bug for triaging). The new approach uses mutants to improve failure ranking and shows that this can also be adapted to a novel approach to fault localization, which is more limited in scope (you want multiple failing tests, ideally for multiple bugs!), but works better than MUSE, MUSEUM, and Metallaxis-FL on many benchmark examples (often with less overhead), where it applies. The use of mutants to determine causality in executions is, I think, novel to this paper, and highly interesting.
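The FPF ranking itself is simple; here is a sketch, with the mutant-based causal metric left abstract as a distance function:

    def fpf_rank(failures, distance):
        # Furthest-point-first ordering of failing tests: each next test is the
        # one farthest from everything ranked so far, so the front of the list
        # tends to cover distinct underlying bugs.
        if not failures:
            return []
        ranked = [failures[0]]
        remaining = set(range(1, len(failures)))
        while remaining:
            best = max(remaining,
                       key=lambda i: min(distance(failures[i], r) for r in ranked))
            ranked.append(failures[best])
            remaining.remove(best)
        return ranked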
Arpit Christi, Matthew Olson, Mohammad Amin Alipour, and Alex Groce. "Reduce Before You Localize: Delta-Debugging and Spectrum-Based Fault Localization." In IEEE International Workshop on Debugging and Repair, pages 184-191, Memphis, Tennessee, October 2018.
This paper shows something simple: if you are able to apply delta-debugging to tests to be used in fault localization, do so. The improvement from reducing the size of failing tests (and, for localization purposes, throwing away the originals that likely execute much extraneous code) is often greater than switching localization formula. Reduced tests should probably also be used in evaluations of localizations.
Peter Goodman, Gustavo Grieco, and Alex Groce. "Tutorial: DeepState: Bringing Vulnerability Detection Tools into the Development Cycle." In IEEE Cybersecurity Development Conference, Boston, Massachusetts, September-October 2018.
This is an overview of the content of the SecDev tutorial on effective use of DeepState.
Arpit Christi and Alex Groce. "Target Selection for Test-Based Resource Adaptation." In IEEE International Conference on Software Quality, pages 458-469, Lisbon, Portugal, July 2018.
This paper improves the results of our previous work on test-based resource adaptation with several powerful heuristics that speed program reduction.
Alex Groce, Josie Holmes, Darko Marinov, August Shi, and Lingming Zhang. "An Extensible, Regular-Expression-Based Tool for Multi-Language Mutant Generation." In IEEE/ACM International Conference on Automated Software Engineering, pages 25-28, Gothenburg, Sweden, May-June 2018.
Mutant generation usually involves either working at the bytecode level, or parsing the language of a source file. We show that a text-based approach, using much less effort, can extend the reach of mutation to new languages, including Apple's Swift, and make creating custom mutations almost trivial.
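The flavor of the approach fits in a few lines of Python; this is only a toy sketch (the real rule set is much larger and handles comments, strings, and per-language tweaks):

    import re

    # A few toy rules in the spirit of the approach.
    RULES = [(r"\+", "-"), (r"==", "!="), (r"<", "<=")]

    def mutants(source):
        # Yield one mutant per (rule, match site), with no parsing at all.
        for pattern, replacement in RULES:
            for m in re.finditer(pattern, source):
                yield source[:m.start()] + replacement + source[m.end():]

    for mutant in mutants("if (count < limit) total = total + count;"):
        print(mutant)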
Peter Goodman and Alex Groce. "DeepState: Symbolic Unit Testing for C and C++." In NDSS 18 Workshop on Binary Analysis Research, San Diego, California, February 2018.
The security and testing communities have produced powerful tools, including symbolic execution engines and sophisticated fuzzers. Unfortunately, not many developers know how to use these tools, learning one does not make learning another trivial, and the tools have significant limitations -- and bugs. DeepState offers a front-end to such tools that resembles a unit testing framework such as Google Test, but allows generation of inputs via a selected back-end. Right now, DeepState offers access to angr and Manticore, as well as Dr. Memory's fuzzer. In the future, we plan to extend it to other back-ends. In addition to this gain in ease of use (and in the ability to find a tool that works on your problem), DeepState also provides high-level heuristic aids to improve the performance of the back-ends. This makes DeepState a powerful platform for exploring novel techniques to improve test generation.
Alex Groce and Josie Holmes. "Provenance and Pseudo-Provenance for Seeded Learning-Based Automated Test Generation." In NIPS 2017 Interpretable ML Symposium, Long Beach, California, December 2017.
We propose some (common-sense) methods for tracking and presenting the provenance of tests generated by automated test generation methods where new tests essentially derive from old tests, and can (usually) be traced back to some original, often simple and hand-crafted (or at least simple and valid) inputs. The most interesting idea here is probably the proposal to "fake" a provenance in cases where either the algorithm itself or a post-processing step (like normalization) destroys it. For such tests we can generate a plausible, if "untrue" causal story of how the test derives from seed inputs, and it seems reasonable to think that such "false" stories are, in a probabilistic setting, useful even if technically untrue, since the test could have been thus derived. In fact, this suggests (not in the paper) the idea of made-up (Platonic/Straussian lie?) ML interpretations. "This is not how I produced this result, but it's a simple story your little human head can handle, and not utterly impossible" might be a reasonable thing for an AI to say, in some circumstances. What Elon Musk would think of such patronizing AIs, we don't speculate.
Arpit Christi, Alex Groce, and Rahul Gopinath. "Resource Adaptation via Test-Based Software Minimization." In IEEE International Conference on Self-Adaptive and Self-Organizing Systems, pages 61-70, Tucson, Arizona, September 2017.
What if we have really good tests? This paper proposes that labeling tests can serve as a convenient way to express adaptability constraints for software resource adaptation. The idea is to 1) label some tests as OK to fail, then 2) use cause reduction/delta-debugging/hierarchical delta-debugging/statement-deletion mutation (whichever you wish to think of it as) to produce variants of the program that pass all tests not labeled as OK to fail, as in the sketch below. Test labeling is an easy way to express functionality, and restricting to statement deletion means that tests alone may suffice to avoid broken variants. Case study: by labeling three tests, we can remove undo/redo from the NetBeans IDE and significantly reduce its memory usage.
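A sketch of that reduction loop under the labels (passes and delete_stmt are hypothetical helpers; the real work reuses the cause reduction machinery):

    def adapt(program, must_pass_tests, passes, delete_stmt):
        # Greedily delete statements while every test NOT labeled "ok to fail"
        # still passes; the labeled tests define what functionality may be lost.
        i = 0
        while i < len(program):
            candidate = delete_stmt(program, i)
            if passes(candidate, must_pass_tests):
                program = candidate   # keep the smaller, still-acceptable variant
            else:
                i += 1                # this statement is needed; try the next one
        return program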
Alex Groce, Josie Holmes, and Kevin Kellar. "One Test to Rule Them All." In ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 1-11, Santa Barbara, California, July 2017.
This paper proposes a new, term-rewriting-based approach to semantic simplification of unit tests, providing both canonization to reduce duplicate tests for faults and additional size reduction beyond delta-debugging. The paper also describes a (closely related) way to generalize a test to understand the neighborhood of failing tests, and distinguish accidental and essential aspects of a failure. The approach is well-implemented in the current TSTL release, and sufficiently useful that I basically always run random testing (if searching for actual bugs, not just running experiments for a paper) with normalization turned on. It isn't quite obvious, but this paper is basically another in the series of papers (cause reduction being the previous one) concerning causality in verification/testing that extends back to my thesis work at CMU and the model-checking error explanation/"What Went Wrong?" papers.
Josie Holmes and Alex Groce. "A Suite of Tools for Making Effective Use of Automatically Generated Tests." In ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 356-359, Santa Barbara, California, July 2017.
Tool paper describing the TSTL tools for manipulating and understanding tests (reduction, normalization, generalization). A decent brief intro to the TSTL tool suite and what it can do with tests.
Alex Groce, Paul Flikkema, and Josie Holmes. "Towards Automated Composition of Heterogeneous Tests for Cyber-Physical Systems." In ACM International Workshop on Testing Embedded and Cyber-Physical Systems, pages 12-15, Santa Barbara, California, July 2017.
This paper describes a new approach to test composition, and (perhaps just as importantly) suggests that test composition is something we really do care about, and want to make efficient and effective. The proposed approach uses delta-debugging to shrink a large candidate composition that has extra copies of test components to allow for better interleaving of tests and the elimination of interfering operations.
Rahul Gopinath, Carlos Jensen, and Alex Groce. "The Theory of Composite Faults." In IEEE International Conference on Software Testing, Verification and Validation, pages 47-57, Tokyo, Japan, March 2017.
We investigate the proposition that, in general, if a test detects a fault in isolation, it will (with high probability) detect that fault when it is composed with another fault. We extend the theory somewhat, and show that in practice the "high probability" is 98-99% for Apache commons libraries.
Iftekhar Ahmed, Carlos Jensen, Alex Groce, and Paul E. McKenney. "Applying Mutation Analysis On Kernel Test Suites: An Experience Report." In International Workshop on Mutation Analysis, pages 110-115, Tokyo, Japan, March 2017.
Mutation testing can actually help improve real tests, in a practical way. Right now, you need a very high quality test suite/generation method already, and a dedicated and savvy developer (like Paul), but the principle seems clear: this is effort that can pay off, for high-stakes systems code.
Iftekhar Ahmed, Rahul Gopinath, Caius Brindescu, Alex Groce, and Carlos Jensen. "Can Testedness be Effectively Measured?" In ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages 547-558, Seattle, Washington, November 2016.
Using a novel method for evaluating a "testedness" measure (does a high measure for a program component predict few future defect corrections for that component?) we show that: 1) traditional testedness measures are not, in a continuous sense, very reliable (testedness is only weakly negatively correlated with future defects), but 2) choosing a threshold (say, is a component covered or not), less "tested" components have, on average, twice as many future fixes. Upshot: the traditional use of coverage measures and mutation scores in testing research may be dubious (small differences in score don't strongly imply anything), but using coverage/mutation as a simple rule-of-thumb to prioritize/evaluate testing is reasonably well supported.
Josie Holmes, Alex Groce, and Mohammad Amin Alipour. "Mitigating (and Exploiting) Test Reduction Slippage." In 7th Workshop on Automated Software Testing (A-TEST), pages 66-69, Seattle, Washington, November 2016.
Slippage takes place when delta-debugging reduces a test T failing due to fault A into a test R failing due to fault B. While this is generally a problem (B is probably an easier fault to detect than A was), it can also be exploited to improve fault detection. We present two algorithms for both mitigating slippage and using it to find more faults, and a limited evaluation showing the utility of these methods.
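A sketch of the mitigation side (all helpers hypothetical): accept a reduction step only if the smaller test fails with the same failure signature as the original, and set aside candidates that fail differently as possible new bugs rather than discarding them.

    def reduce_without_slippage(test, smaller_variants, fails, signature):
        # Greedy reduction that only accepts candidates matching the original
        # failure's signature; "slipped" candidates are collected, not thrown away.
        target, slipped = signature(test), []
        progress = True
        while progress:
            progress = False
            for candidate in smaller_variants(test):
                if not fails(candidate):
                    continue
                if signature(candidate) == target:
                    test, progress = candidate, True   # same fault: keep reducing
                    break
                slipped.append(candidate)              # different fault: exploit later
        return test, slipped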
Ali Aburas and Alex Groce. "A Method Dependence Relations Guided Genetic Algorithm." In International Symposium on Search-Based Software Engineering, pages 267-273, Raleigh, North Carolina, October 2016.
Mohammad Amin Alipour, August Shi, Rahul Gopinath, Darko Marinov, and Alex Groce. "Evaluating Non-Adequate Test-Case Reduction." In IEEE/ACM International Conference on Automated Software Engineering, pages 16-26, Singapore, Singapore, September 2016.
Mohammad Amin Alipour, Alex Groce, Rahul Gopinath, and Arpit Christi. "Generating Focused Random Tests Using Directed Swarm Testing." In ACM International Symposium on Software Testing and Analysis, pages 70-81, Saarbrücken, Germany, July 2016.
Rahul Gopinath, Mohammad Amin Alipour, Iftekhar Ahmed, Carlos Jensen, and Alex Groce. "On the Limits of Mutation Reduction Strategies." In ACM/IEEE International Conference on Software Engineering (ICSE), pages 511-522, Austin, Texas, May 2016.
Rahul Gopinath, Carlos Jensen, and Alex Groce. "Topsy-Turvy: a Smarter and Faster Parallelization of Mutation Analysis." In ACM/IEEE International Conference on Software Engineering (ICSE), pages 740-743, Austin, Texas, May 2016.
(ICSE 2016 Distinguished Poster Award)
Alex Groce, Iftekhar Ahmed, Carlos Jensen, and Paul E. McKenney. "How Verified is My Code? Falsification-Driven Verification." In IEEE/ACM International Conference on Automated Software Engineering, pages 737-748, Lincoln, Nebraska, November 2015.
Rahul Gopinath, Mohammad Amin Alipour, Iftekhar Ahmed, Carlos Jensen, and Alex Groce. "How Hard Does Mutation Analysis Have to Be, Anyway?" In IEEE International Symposium on Software Reliability Engineering, pages 216-227, Gaithersburg, Maryland, November 2015.
Alex Groce, Jervis Pinto, Pooria Azimi, and Pranjal Mittal. "TSTL: A Language and Tool for Testing (Demo)." In ACM International Symposium on Software Testing and Analysis, pages 414-417, Baltimore, Maryland, July 2015.
Alex Groce and Jervis Pinto. "A Little Language for Testing." In NASA Formal Methods Symposium, pages 204-218, Pasadena, California, April 2015.
Yuanli Pei, Arpit Christi, Xiaoli Fern, Alex Groce, and Weng-Keen Wong. "Taming a Fuzzer Using Delta Debugging Trails." In 3rd International Workshop on Software Mining, Shenzhen, China, December 2014.
Rahul Gopinath, Carlos Jensen, and Alex Groce. "Mutations: How Close Are They to Real Faults?" In IEEE International Symposium on Software Reliability Engineering, pages 189-200, Naples, Italy, November 2014.
Alex Groce, Mohammad Amin Alipour, and Rahul Gopinath. "Coverage and Its Discontents." In ACM Symposium on New Ideas in Programming and Reflections on Software, Onward! Essays, part of SPLASH (ACM SIGPLAN Conference on Systems, Programming, Languages and Applications: Software for Humanity), pages 255-268, Portland, Oregon, October 2014.
Ali Aburas and Alex Groce. "An Improved Memetic Algorithm with Method Dependence Relations (MAMDR)." In International Conference on Quality Software , pages 11-20, Dallas, Texas , October 2014.
Chaoqiang Zhang, Alex Groce, and Amin Alipour. "Using Test Case Reduction and Prioritization to Improve Symbolic Execution." In ACM International Symposium on Software Testing and Analysis (ISSTA), pages 60-70, San Jose, California, July 2014.
Duc Le, Mohammad Amin Alipour, Rahul Gopinath, and Alex Groce. "MuCheck: an Extensible Tool for Mutation Testing of Haskell Programs." In ACM International Symposium on Software Testing and Analysis (ISSTA), pages 429-432, Tools and Demonstration Track, San Jose, California, July 2014.
Rahul Gopinath, Carlos Jensen, and Alex Groce. "Code Coverage for Suite Evaluation by Developers." In ACM/IEEE International Conference on Software Engineering (ICSE), pages 72-82, Hyderabad, India, May-June 2014.
Alex Groce, Amin Alipour, Chaoqiang Zhang, Yang Chen, and John Regehr. "Cause Reduction for Quick Testing." In IEEE International Conference on Software Testing, Verification and Validation (ICST), pages 243-252, Cleveland, Ohio, March-April 2014.
(ICST 2014 Best Paper Award)
Amin Alipour, Alex Groce, Chaoqiang Zhang, Anahita Sanadaji, and Gokul Caushik. "Finding Model-Checkable Needles in Large Source Code Haystacks: Modular Bug-Finding via Static Analysis and Dynamic Invariant Discovery." In International Workshop on Constraints in Formal Verification (CFV), San Jose, California, November 2013.
Alex Groce, Chaoqiang Zhang, Amin Alipour, Eric Eide, Yang Chen, and John Regehr. "Help, Help, I'm Being Suppressed! The Significance of Suppressors in Software Testing." In IEEE International Symposium on Software Reliability Engineering (ISSRE), pages 390-399, Pasadena, California, November 2013.
Milos Gligoric, Alex Groce, Chaoqiang Zhang, Rohan Sharma, Amin Alipour, and Darko Marinov. "Comparing Non-Adequate Test Suites using Coverage Criteria." In ACM International Symposium on Software Testing and Analysis (ISSTA), pages 302-313, Lugano, Switzerland, July 2013.
Yang Chen, Alex Groce, Chaoqiang Zhang, Weng-Keen Wong, Xiaoli Fern, Eric Eide, and John Regehr. "Taming Compiler Fuzzers." In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 197-208, Seattle, Washington, June 2013.
Alex Groce, Alan Fern, Jervis Pinto, Tim Bauer, Amin Alipour, Martin Erwig, and Camden Lopez. "Lightweight Automated Testing with Adaptation-Based Programming." In IEEE International Symposium on Software Reliability Engineering (ISSRE), pages 161-170, Dallas, Texas, November 2012.
Alex Groce and Martin Erwig. "Finding Common Ground: Choose, Assert, and Assume." In International Workshop on Dynamic Analysis (WODA), pages 12-17, Minneapolis, Minnesota, July 2012.
Amin Alipour and Alex Groce. "Extended Program Invariants: Applications in Testing and Fault Localization." In International Workshop on Dynamic Analysis (WODA), pages 7-11, Minneapolis, Minnesota, July 2012.
Alex Groce, Chaoqiang Zhang, Eric Eide, Yang Chen, and John Regehr. "Swarm Testing." In ACM International Symposium on Software Testing and Analysis (ISSTA), pages 78-88, Minneapolis, Minnesota, July 2012.
Amin Alipour and Alex Groce. "Bounded Model Checking and Feature Omission Diversity." In International Workshop on Constraints in Formal Verification (CFV), San Jose, California, November 2011.
Alex Groce. "Coverage Rewarded: Test Input Generation via Adaptation-Based Programming." In ACM/IEEE International Conference on Automated Software Engineering (ASE), pages 380-383, Lawrence, Kansas, November 2011.
Amber Shinsel, Todd Kulesza, Margaret M. Burnett, William Curran, Alex Groce, Simone Stumpf, and Weng-Keen Wong. "Mini-crowdsourcing End-user Assessment of Intelligent Assistants: A Cost-benefit Study." In IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pages 47-54, Pittsburgh, Pennsylvania, September 2011.
Todd Kulesza, Margaret Burnett, Simone Stumpf, Weng-Keen Wong, Shubhomoy Das, Alex Groce, Amber Shinsel, Forrest Bice, and Kevin McIntosh. "Where Are My Intelligent Assistant's Mistakes? A Systematic Testing Approach." In International Symposium on End-User Development (EUD), pages 171-186, Brindisi, Italy, June 2011.
Alex Groce, Klaus Havelund, and Margaret Smith. "From Scripts to Specifications: the Evolution of a Flight Software Testing Effort." In ACM/IEEE International Conference on Software Engineering (ICSE), pages 129-138, Cape Town, South Africa, May 2010.
Alex Groce. "(Quickly) Testing the Tester via Path Coverage." In Workshop on Dynamic Analysis (WODA), Chicago, Illinois, July 2009.
James H. Andrews, Alex Groce, Melissa Weston, and Ru-Gang Xu. "Random Test Run Length and Effectiveness." In IEEE/ACM Conference on Automated Software Engineering (ASE), pages 19-28, L'Aquila, Italy, September 2008.
Gerard Holzmann, Rajeev Joshi, and Alex Groce. "Tackling Large Verification Problems with the Swarm Tool." In SPIN Workshop on Model Checking of Software (SPIN), pages 134-143, Los Angeles, California, August 2008.
Klaus Havelund, Alex Groce, Gerard Holzmann, Rajeev Joshi, and Margaret Smith. "Automated Testing of Planning Models." In Workshop on Model Checking and Artificial Intelligence (MoChArt), Patras, Greece, July 2008.
Alex Groce and Rajeev Joshi. "Random Testing and Model Checking: Building a Common Framework for Nondeterministic Exploration." In Workshop on Dynamic Analysis (WODA), pages 22-28, Seattle, Washington, July 2008.
Alex Groce and Rajeev Joshi. "Extending Model Checking with Dynamic Analysis." In Conference on Verification, Model Checking and Abstract Interpretation (VMCAI), pages 142-156, San Francisco, California, January 2008.
Nicolas Blanc, Alex Groce, and Daniel Kroening. "Verifying C++ with STL Containers via Predicate Abstraction." In IEEE/ACM Conference on Automated Software Engineering (ASE), pages 521-524, Atlanta, Georgia, November 2007.
Alex Groce, Gerard Holzmann, and Rajeev Joshi. "Randomized Differential Testing as a Prelude to Formal Verification." In ACM/IEEE International Conference on Software Engineering (ICSE), pages 621-631, Minneapolis, Minnesota, May 2007.
Alex Groce and Rajeev Joshi. "Exploiting Traces in Program Analysis." In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 379-393, Vienna, Austria, March-April 2006.
Daniel Kroening, Alex Groce, and Edmund Clarke. "Counterexample Guided Abstraction Refinement via Program Execution." In International Conference on Formal Engineering Methods (ICFEM), pages 224-238, Seattle, Washington, November 2004.
Sagar Chaki, Alex Groce, and Ofer Strichman. "Explaining Abstract Counterexamples." In Foundations of Software Engineering (SIGSOFT FSE), pages 73-82, Newport Beach, California, October-November 2004.
Alex Groce and Daniel Kroening. "Making the Most of BMC Counterexamples." In Workshop on Bounded Model Checking (BMC), pages 71-84, Boston, Massachusetts, July 2004.
Alex Groce, Daniel Kroening, and Flavio Lerda. "Understanding Counterexamples with explain." In Computer Aided Verification (CAV), pages 453-456, Boston, Massachusetts, July 2004.
Alex Groce. "Error Explanation with Distance Metrics." In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 108-122, Barcelona, Spain, March-April 2004.
E. Mota, E. Clarke, W. Oliveira, A. Groce, J. Kanda, and M. Falcao. "VeriAgent: an Approach to Integrating UML and Formal Verification Tools." In Sixth Brazilian Workshop on Formal Methods (WMF 2003), pages 111-129 (Elect. Notes Theor. Comput. Sci. 95), Universidade Federal de Campina Grande, Brazil, October 2003.
Sagar Chaki, Edmund Clarke, Alex Groce, and Ofer Strichman. "Predicate Abstraction with Minimum Predicates." In Advanced Research Working Conference on Correct Hardware Design and Verification Methods (CHARME), pages 19-34, L'Aquila, Italy, October 2003.
Alex Groce and Willem Visser. "What Went Wrong: Explaining Counterexamples." In SPIN Workshop on Model Checking of Software, pages 121-135, Portland, Oregon, May 2003.
Sagar Chaki, Edmund Clarke, Alex Groce, Somesh Jha, and Helmut Veith. "Modular Verification of Software Components in C." In ACM/IEEE International Conference on Software Engineering (ICSE), pages 385-395, Portland, Oregon, May 2003.
(ICSE 2003 ACM SIGSOFT Distinguished Paper Award)
Alex Groce, Doron Peled, and Mihalis Yannakakis. "AMC: An Adaptive Model Checker." In Computer Aided Verification (CAV), pages 521-525, Copenhagen, Denmark, July 2002.
Alex Groce and Willem Visser. "Model Checking Java Programs using Structural Heuristics." In International Symposium on Software Testing and Analysis (ISSTA), pages 12-21, Rome, Italy, July 2002.
Alex Groce and Willem Visser. "Heuristic Model Checking for Java Programs." In SPIN Workshop on Model Checking of Software, pages 242-245, Grenoble, France, April 2002.
Alex Groce, Doron Peled, and Mihalis Yannakakis. "Adaptive Model Checking." In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 357-370, Grenoble, France, April 2002.
Girish Bhat, Rance Cleaveland, and Alex Groce. "Efficient Model Checking Via Büchi Tableau Automata." In Computer Aided Verification (CAV), pages 38-52, Paris, France, July 2001.
Alex Groce, Alan Fern, Martin Erwig, Jervis Pinto, Tim Bauer, and Amin Alipour. "Learning-Based Test Programming for Programmers." In International Symposium On Leveraging Applications of Formal Methods, Verification and Validation, Heraklion, Crete, October 2012.
Gerard Holzmann, Rajeev Joshi, and Alex Groce. "Model Driven Code Checking." In Automated Software Engineering, Special Issue on Trends in Automated Software Engineering, Volume 15, Number 3-4, pages 283-297, December 2008.
Gerard Holzmann, Rajeev Joshi, and Alex Groce. "Swarm Verification." In Conference on Automated Software Engineering (ASE), pages 1-6, L'Aquila, Italy, September 2008.
Alex Groce, Gerard Holzmann, Rajeev Joshi, and Ru-Gang Xu. "Putting Flight Software Through the Paces with Testing, Model Checking, and Constraint-Solving." In Workshop on Constraints in Formal Verification, pages 1-15, Sydney, Australia, August 2008.
Gerard Holzmann, Rajeev Joshi, and Alex Groce. "New Challenges in Model Checking." In Symposium on 25 Years of Model Checking, Seattle, Washington, August 2006.
Duc Le, Mohammad Amin Alipour, Rahul Gopinath, and Alex Groce. "Mutation Testing of Functional Programming Languages." Oregon State University Technical Report, 2014.
James H. Andrews, Yihao Zhang, and Alex Groce. "Comparing Automated Unit Testing Strategies." Department of Computer Science, University of Western Ontario, Technical Report No. 736, 2010.
Nicolas Blanc, Daniel Kroening, and Alex Groce. "Verifying C++ with STL Containers via Predicate Abstraction." Technical Report 506, ETH Zurich, January 2006.
Alex Groce. "Error Explanation and Fault Localization with Distance Metrics." Technical Report CMU-CS-05-121, Carnegie Mellon University, March 2005.
Alex Groce and Willem Visser. "What Went Wrong: Explaining Counterexamples." Technical Report 02-08, RIACS, USRA, February 2002.