Efficient Algorithms for Context Sensitive Points-to Analysis

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Efficient Algorithms for Context Sensitive Points-to Analysis"

By

Mr. Xiao XIAO


Abstract

Context sensitive points-to analysis, while significantly benefiting many 
static analysis techniques, is known to be difficult to scale to large 
programs. To make context sensitive points-to analysis practical to 
analyze contemporary software, which is commonly going up to millions 
lines of code, this dissertation explores the use of encoding techniques 
to speedup the computation and querying of points-to information. The 
first part of this dissertation introduces a novel technique, geometric 
encoding, to effectively capture the redundancy in representing a large 
number of calling contexts. Geometric encoding is capable of evaluating 
contexts of points-to constraints in the compressed form, but incurring 
much less space and time consumption compared to other compressing 
techniques such as BDD. Based on geometric encoding, an efficient context 
sensitive points-to engine GeomPTA is designed and implemented. GeomPTA 
can analyze large Java programs with JDK 1.6 library 7.1x – 81.9x faster 
than the worklist based 1-object-sensitive analysis in Paddle, and 
meanwhile its precision is comparable or better than 1-object-sensitive 
analysis. Our implementation of GeomPTA is now part of the official 
distribution of Soot, which is a widely used framework for analyzing Java 
programs.

Since GeomPTA is fully context sensitive, GeomPTA can answer k-CFA 
points-to queries for any constant k. This dissertation also describes a 
novel contexts-go-by query, where the specified call edge is not 
restricted to the last k call edges ascending from the enclosing function 
of the querying pointer. This new query can effectively identify 92.4% 
redundant instrumentations for a call trace profiling tool and reduce its 
runtime overhead by 7.2%. Both the k-CFA and contexts-go-by queries can be 
efficiently answered with the help of geometric encoding.

Although GeomPTA is much faster than other points-to algorithms, analyzing 
large programs still needs tens of minutes and sometimes the memory usage 
exceeds the capacity of a commodity machine. Moreover, some queries, 
especially those require aliasing information, cannot be answered 
efficiently merely with the points-to information. Therefore, the second 
part of this dissertation describes Pestrie, a persistent technique for 
compressing and reusing points-to and aliasing information. Also, the 
persistent information is structured for efficiently answering queries. 
Empirically, Pestrie produces 10.5x – 17.5x smaller persistent files and 
is 2.9x – 123.6x faster than points-to results intersection based approach 
to answer aliasing queries.


Date:			Monday, 4 January 2016

Time:			2:00p.m. - 4:00p.m.

Venue:			Room 3584
 			Lifts 27/28

Chairman:		Prof. Matthew McKay (ECE)

Committee Members:	Prof. Charles Zhang (Supervisor)
 			Prof. Shing-Chi Cheung
 			Prof. Fangzhen Lin
 			Prof. Jiang Xu (ECE)
 			Prof. Thomas Reps (Comp. Sci., U. of Wisconsin)


**** ALL are Welcome ****