Scalable Typestate Verification for Millions of Lines of Code

PhD Thesis Proposal Defence


Title: "Scalable Typestate Verification for Millions of Lines of Code"

by

Mr. Gang FAN


Abstract:

Applying static code analysis in the industry involves both technical and 
management problems. In this thesis, we address new challenges based on 
our experience of applying static code analysis in several large software 
companies and propose solutions for some of them. When the code base grows 
large, both the precision and scalability of static analysis cannot be 
compromised. We first present SMOKE, which is designed for optimizing the 
performance for analyzing typestate problems (e.g. memory leak detection), 
that uses the staged design together with sparse analysis techniques. 
Instead of using a uniform precise analysis for all program paths, in the 
first stage, we use a scalable but imprecise analysis to compute a 
succinct set of candidate error paths. In the second stage, we leverage a 
more precise analysis to verify the feasibility of the candidates. Our 
first stage analysis is scalable, due to the design of a new sparse 
program representation, namely the use-flow graph (UFG), which enables us 
to model the problem as a polynomial-time state analysis. Our second stage 
analysis is precise and still efficient, due to the smaller number of 
candidates and the design of a dedicated constraint solver. Experimental 
results demonstrated that SMOKE can finish checking industrial-sized 
projects, up to 8MLoC, in forty minutes with an average false positive 
rate of 24.4Despite the technical problems, for continuous using and 
improving static analysis in practice, we also need to solve non-technical 
problems such as the dilemma between the need for code to improve static 
analysis tools and the confidentiality requirements of tool users. We 
abstract an optimization cycle from our experiences, which is more 
acceptable for both parties. This cycle addresses the trust problem by 
introducing an intermediate document containing only code segments that 
are sufficient for improving static analysis but cannot reveal sensitive 
information such as the whole program source code.


Date:			Friday, 3 May 2019

Time:                  	10:30am - 12:30pm

Venue:                  Room 4475
                         lifts 25/26

Committee Members:	Dr. Charles Zhang (Supervisor)
 			Dr. Tao Wang (Chairperson)
 			Prof. Shing-Chi Cheung
 			Dr. Qiong Luo


**** ALL are Welcome ****