A practical algorithm for learning disjunctive abstraction heuristics in static program analysis

Bibliographic Details
Published in	Information and software technology Vol. 135; p. 106564
Main Authors	Jeon, Donghoon, Jeon, Minseok, Oh, Hakjoo
Format	Journal Article
Language	English
Published	Elsevier B.V 01.07.2021
Subjects	Data-driven static analysis; Learning algorithm
ISSN	0950-5849 1873-6025
DOI	10.1016/j.infsof.2021.106564

Summary:	The precision and cost of static analysis are determined by abstraction heuristics (e.g., strategies for abstracting calling contexts, heap locations, etc.), but manually designing effective abstraction heuristics requires a huge amount of engineering effort and domain knowledge. Recently, data-driven static analysis has emerged to address this challenge by learning such heuristics automatically from a set of training programs. We present a practical algorithm for learning disjunctive abstraction heuristics in data-driven static analysis. We build on a recently proposed approach that can learn nontrivial program properties by disjunctive boolean functions. However, the existing approach is practically limited as it assumes that the most precise abstraction is cheap for the training programs; the algorithm is inapplicable if the most precise abstraction is not scalable. The objective of this paper is to mitigate this limitation. Our algorithm overcomes the limitation with two new ideas. It systematically decomposes the learning problem into feasible subproblems, and it can search through the abstraction space from the coarse- to fine-grained abstractions. With this approach, our algorithm is able to learn heuristics when static analysis with the most precise abstraction is not scalable over the training programs. We show our approach is effective and generally applicable. We applied our approach to a context-sensitive points-to analysis for Java and a flow-sensitive interval analysis for C. Experimental results show that our algorithm is efficient. For example, our algorithm can learn heuristics for 3-object-sensitive analysis for which the existing learning algorithm is too expensive to learn any useful heuristics. Our algorithm makes a state-of-the-art technique for data-driven static analysis more practical.
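
Illustration:	The summary refers to heuristics expressed as disjunctive boolean functions and to moving from coarse- to fine-grained abstractions. The sketch below is not the authors' algorithm; it only shows, under stated assumptions, what applying such a heuristic might look like. The feature names, method names, and the `assign_depths` helper are hypothetical, and the learning step itself is not shown.

```python
from typing import Dict, FrozenSet, List, Set

# A conjunction is a set of boolean features that must all hold.
# A disjunctive heuristic is a list of conjunctions (disjunctive normal form).
Conjunction = FrozenSet[str]
Heuristic = List[Conjunction]


def selects(heuristic: Heuristic, features: Set[str]) -> bool:
    """Return True if at least one conjunction is fully satisfied."""
    return any(conj <= features for conj in heuristic)


def assign_depths(
    heuristics: Dict[int, Heuristic],
    methods: Dict[str, Set[str]],
) -> Dict[str, int]:
    """Give each method the largest depth whose heuristic selects it.

    Depth 0 is the coarsest abstraction and is the default for every method.
    Depths are visited in increasing order, i.e. from coarse to fine.
    """
    depths: Dict[str, int] = {}
    for name, feats in methods.items():
        depth = 0
        for k in sorted(heuristics):
            if selects(heuristics[k], feats):
                depth = k
        depths[name] = depth
    return depths


# Hypothetical heuristics: one DNF formula per context depth.
heuristics: Dict[int, Heuristic] = {
    1: [
        frozenset({"allocates_object"}),
        frozenset({"has_virtual_call", "in_library"}),
    ],
    2: [frozenset({"allocates_object", "returns_param"})],
}

# Hypothetical methods with the boolean features that hold for each.
methods: Dict[str, Set[str]] = {
    "Factory.create": {"allocates_object", "returns_param"},
    "List.iterator": {"has_virtual_call", "in_library"},
    "Main.log": {"is_static"},
}

depths = assign_depths(heuristics, methods)
for name, depth in depths.items():
    print(f"{name}: depth {depth}")
```

In this toy setting, only methods matched by some conjunction receive a more precise (and more costly) abstraction; everything else stays at the coarsest level.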