Precise and Scalable Static Bug Finding for Industrial-Sized Code

PhD Thesis Proposal Defence


Title: "Precise and Scalable Static Bug Finding for Industrial-Sized Code"

by

Mr. Qingkai SHI


Abstract:

Software bugs cost developers and software companies a great deal of time 
and money. For instance, the Heartbleed bug discovered in 2014 affected 
around 500,000 websites and hundreds of products from Cisco and Juniper. 
To tame such beasts hidden in software, developers from industry usually 
utilizes static bug-finding tools to check possible bugs before a product 
is released. Although previous work has reported many success stories for 
using static bug-finding tools, we still cannot have the cake and eat it. 
In practice, it is still difficult to find bugs hidden behind 
sophisticated pointer operations, deep calling contexts, and complex path 
conditions with low false positive rates, while sieving through millions 
of lines of code in just a few hours.

In this thesis, we present our novel designs of sparse value-flow analysis 
to tackle a wide range of software bugs caused by improper value flows. 
The first problem we try to address is referred to as the "pointer trap": 
a precise points-to analysis limits the scalability of sparse value-flow 
analysis and an imprecise one seriously undermines its precision. To this 
end, we present Pinpoint, a holistic approach that decomposes the cost of 
high-precision points-to analysis by precisely discovering local data 
dependence and delaying the expensive inter-procedural analysis through 
memorization. Such memorization enables the on-demand slicing of only the 
necessary inter-procedural data dependence and path feasibility queries, 
which are then solved by a costly SMT solver. Experiments show that 
Pinpoint can check programs such as MySQL (around 2 million lines of code) 
within 1.5 hours. The overall false positive rate is also very low (14.3% 
- 23.6%). Pinpoint has discovered over a hundred real bugs in mature and 
extensively checked open source systems.

The second problem addressed in the thesis is how to improve the parallelism of 
static bug finding over the conventional parallel designs. Conventionally, 
bottom-up program analysis has been easy to parallelize because functions 
without caller-callee relations can be analyzed independently. However, such 
function-level parallelism is significantly limited by the calling dependence 
-- functions with caller-callee relations have to be analyzed sequentially 
because the analysis of a function depends on the analysis results, a.k.a., 
function summaries, of its callees. We observe that the calling dependence can 
be relaxed in many cases and, as a result, the parallelism can be improved. We 
present Cheetah, a framework of bottom-up data flow analysis, in which the 
analysis task of each function is elaborately partitioned into multiple 
sub-tasks to generate pipelineable function summaries. These sub-tasks are 
pipelined and run in parallel without any additional synchronization, even 
though the calling dependence exists. We formalize our idea under the IFDS/IDE 
framework and discuss its application to sparse value-flow analysis. We 
evaluate Cheetah on a series of standard benchmark programs and open-source 
projects, which demonstrates significant speedup over a conventional parallel 
design.


Date:			Friday, 18 October 2019

Time:                  	2:00pm - 4:00pm

Venue:                  Room 5501
                         lifts 25/26

Committee Members:	Dr. Charles Zhang (Supervisor)
 			Prof. Shing-Chi Cheung (Chairperson)
 			Prof. Cunsheng Ding
 			Dr. Shuai Wang


**** ALL are Welcome ****