Geometric Encoding based Context Sensitive Points-to Analysis for Java

Xiao Xiao and Charles Zhang


Installation

The source code patch against Soot 2.4 can be downloaded here.

Suppose you are in $HOME, your Soot 2.4 directory is soot2.4, and the patch file is geom.patch. Type the following commands to install the patch:

#cd soot2.4

#patch -p1 < ../geom.patch

Our patch not only adds new packages to Soot, but also changes some files of Soot itself. Therefore, make sure the following files are not modified by your project:

soot.options.Options
soot.options.SparkOptions
soot.jimple.toolkits.callgraph.CallGraph
soot.jimple.toolkits.callgraph.Edge
soot.Kind
soot.SootClass
soot.SootMethod
soot.FastHierarchy
soot.toolkits.scalar.Pair
soot.util.ArrayNumberer
soot.util.IterableNumberer
soot.util.Numberer
soot.jimple.spark.builder.ContextInsensitiveBuilder
soot.jimple.spark.pag.AllocNode
soot.jimple.spark.pag.ArrayElement
soot.jimple.spark.pag.GlobalVarNode
soot.jimple.spark.pag.Node
soot.jimple.spark.pag.PAG
soot.jimple.spark.pag.Parm
soot.jimple.spark.pag.SparkField
soot.jimple.spark.pag.VarNode
soot.jimple.spark.solver.PropWorklist
soot.jimple.spark.SparkTransformer

Otherwise, please apply our patch to a clean copy of Soot 2.4.

We have also submitted the patch to the Soot maintainers, so we expect our code to be shipped with the next version of Soot.


New Options Provided by Our Patch

We add 10 options to the SPARK phase (maybe too many, :<); a sketch showing how to set them follows the list:

1. -p cg.spark geom-pta (default: false)
This switch enables/disables the geometric analysis.

2. -p cg.spark geom-encoding (default: Geom)
This switch specifies the encoding methodology used in the analysis. The possible values are Geom, HeapIns, and PtIns. The efficiency order is Geom < HeapIns < PtIns, while the precision order is the reverse.

3. -p cg.spark geom-worklist (default: PQ)
This specifies the worklist used to select the next pointer for propagation. The possible values are PQ and FIFO, which stand for a priority queue (sorted by last fire time and topological order) and a FIFO queue, respectively.

4. -p cg.spark geom-dump-verbose (default: empty string)
If you want to persist the detailed execution information for future analysis, please provide a file name.

5. -p cg.spark geom-verify-name (default: empty string)
If you want to compare the precision of the points-to results with another solver (e.g. Paddle), you can use this verify file to specify the list of methods reachable by that solver. Then, in the internal evaluations (see the switch geom-eval), we only consider the methods that are reachable in both solvers.

6. -p cg.spark geom-eval (default: 0)
We internally provide some precision evaluation methodologies, and classify their strength into levels:
* If level is 0, we do nothing.
* If level is 1, we only report basic information about the points-to result.
* If level is 2, we perform the virtual callsite resolution, static cast safety, and all-pairs alias evaluations.

7. -p cg.spark geom-trans (default: false)
If your work concerns only the context insensitive points-to information, you can use this option to transform the context sensitive result into a context insensitive one. Also, if your application code directly accesses the points-to vectors rather than using the standard query interface, this option guarantees the correct behavior. After the transformation, the context sensitive points-to result is cleared in order to save memory for your other jobs.

8. -p cg.spark geom-frac-base (default: 40)
This option specifies the fractional parameter, which is used to manually tune the precision/performance trade-off. The smaller the value, the better the performance but the worse the precision.

9. -p cg.spark geom-blocking (default: true)
When this option is on, we apply the blocking strategy to recursive calls. This strategy significantly improves the precision. The details are presented in our paper.

10. -p cg.spark geom-runs (default: 1)
The geometric analysis can be run multiple times to further improve the precision.
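
As an illustration, the sketch below passes a few of these options to Soot's command-line front end from a Java driver. Only the "-p cg.spark ..." options come from the list above; the class path, main class, and driver class name are placeholders for your own project, so treat this as a minimal sketch rather than a recommended setup.

// Minimal driver sketch: forwards the geom options to Soot's command-line front end.
// The class path, entry class and driver class name are placeholders.
public class RunGeomPTA {
    public static void main(String[] args) {
        soot.Main.main(new String[] {
            "-w",                                       // whole-program mode, required by SPARK
            "-cp", "bin:/path/to/jre/lib/rt.jar",       // placeholder class path
            "-main-class", "your.app.Main",             // placeholder entry class
            "-p", "cg.spark", "enabled:true",           // turn SPARK on
            "-p", "cg.spark", "geom-pta:true,geom-encoding:Geom,geom-worklist:PQ",
            "-p", "cg.spark", "geom-frac-base:40,geom-blocking:true,geom-runs:1",
            "your.app.Main"                             // class to be analyzed
        });
    }
}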

 

We also implemented functions to dump the points-to result in various formats, which, I believe, can facilitate further analysis work outside Soot. However, there are no options at this time to control the dump behaviour. The code is in the class soot.jimple.spark.geomPA.PointsToDumper, and you can call its methods directly in your project.
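
If you prefer querying to dumping, the result is also reachable through Soot's standard PointsToAnalysis interface after the packs have run. The sketch below is only an illustration: it uses the generic query API rather than the methods of PointsToDumper, and the method signature is a placeholder.

import java.util.Iterator;
import soot.Local;
import soot.PointsToAnalysis;
import soot.Scene;
import soot.SootMethod;
import soot.Unit;
import soot.jimple.DefinitionStmt;

// Sketch: print the possible types of every local assigned in one method.
public class QueryExample {
    public static void printPointsTo() {
        PointsToAnalysis pta = Scene.v().getPointsToAnalysis();
        // Placeholder signature; replace with a method of your own program.
        SootMethod m = Scene.v().getMethod("<your.app.Main: void main(java.lang.String[])>");
        for (Iterator it = m.retrieveActiveBody().getUnits().iterator(); it.hasNext(); ) {
            Unit u = (Unit) it.next();
            if (u instanceof DefinitionStmt
                    && ((DefinitionStmt) u).getLeftOp() instanceof Local) {
                Local l = (Local) ((DefinitionStmt) u).getLeftOp();
                System.out.println(l.getName() + " --> " + pta.reachingObjects(l).possibleTypes());
            }
        }
    }
}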

To quickly try our patch, please add "-p cg.spark geom-pta:true" to your existing project. As always, context sensitive points-to analysis is memory hungry. Thus, it is better to analyze your code against JDK 1.4 or lower; otherwise, please run it on a machine with enough memory.


Licence and Credits

Our code is released under the LGPL licence. You can use the code in your commercial software. However, since this is only a research prototype, PLEASE USE IT AT YOUR OWN RISK.

This work is still ongoing, and the main researcher is Richard Xiao. Please contact him to report bugs or discuss the code if you wish.