Scalable Symbolic Analysis of Large Real World Programs Using API Abstraction and Execution Histories

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Scalable Symbolic Analysis of Large Real World Programs Using API 
Abstraction and Execution Histories"

By

Mr. Yueqi LI


Abstract

This thesis deals with two important problems of symbolic analysis. The first 
one is path explosion problem. Path explosion is a major issue in applying 
path-sensitive symbolic analysis to large programs. Since real world programs 
are typically built on top of many library functions, it is generally 
impractical to model all APIs and symbolic analysis often approximates some 
APIs' behavior. We observe that many symbolic states generated by such symbolic 
analysis of a procedure are indistinguishable to its callers. It is, therefore, 
possible to keep only one state from each set of equivalent symbolic states 
without affecting the analysis result. Based on this observation, we propose an 
equivalence relation called z-equivalence, which is weaker than logical 
equivalence to relate a large number of z-equivalent states. We prove that 
z-equivalence is strong enough to guarantee that paths to be traversed by the 
symbolic analysis of two z-equivalent states are identical, giving the same 
solutions to satisfiability and validity queries. We propose a sound linear 
algorithm to detect z-equivalence. Our experiments show that the symbolic 
analysis that leverages z-equivalence is able to achieve more than ten orders 
of magnitude reduction in terms of search space. The reduction significantly 
alleviates the path explosion problem, enabling us to apply symbolic analysis 
in large programs such as Hadoop and Linux Kernel.

The second problem is precision issue. API approximation can induce many 
unreachable symbolic states, which are expensive to validate manually. In the 
second part of this thesis, we propose a static approach to automatically 
validating the reported anomalous symbolic states. The validation makes use of 
the available runtime data of the un-modeled APIs collected from previous 
program executions. We show that the symbolic state validation problem can be 
cast as a MAX-SAT problem and solved by existing constraint solvers. Our 
technique found 80 unreported bugs when it was applied to 10 popular programs 
with a total of 1.5 million lines of code. All of them can be confirmed by test 
cases. Our technique presents a promising way to apply the big data paradigm to 
software engineering. It provides a mechanism to validate the symbolic states 
of a project by leveraging the many concrete input-output values of APIs 
collected from other projects.


Date:			Wednesday, 4 March 2015

Time:			5:00pm - 7:00pm

Venue:			Room 3494
 			Lifts 25/26

Chairman:		Prof.  Weichuan Yu (ECE)

Committee Members:	Prof. Shing-Chi Cheung (Supervisor)
 			Prof. Sunghun Kim
 			Prof. Charles Zhang
 			Prof. Wai-Ho Mow (ECE)
 			Prof. Michael Lyu (Comp. Sci & Eng., CUHK)


**** ALL are Welcome ****