OpenRCE_RolfRolles

December 4th, 2009, 03:33

There are many books devoted to languages like C++ or Python, and technical blogs are teeming with further information on these subjects. Languages like OCaml have much smaller communities, with very few dedicated books and technical blogs. As far as I've put together, we OCaml programmers mainly learn from reading one anothers' code. Functional programmers must stick together.

VinE is the static analysis component of the BitBlaze binary analysis project, it is written in OCaml, and it contains some tricks that I haven't seen elsewhere. For the sake of contributing to their dissemination throughout the community, I would like to detail some of them here. These might be obvious to everyone already, or maybe no one cares. Feedback is welcome.

The following discussion concerns /ocaml/wp.ml, the weakest precondition calculator. A weakest precondition (hereinafter WP) is the most general condition under which execution reaches some given point with some desired post-condition (i.e. an execution trace plus some arbitrary condition that must hold at the end of the trace, e.g. "this array write at this location goes out of bounds". WPs compare favorably to forward symbolic execution in terms of formula size, and comprise an attractive choice for enacting the primitive "cause execution to reach a particular line in a particular state", if loop unrolling is palatable. This entry concerns an OCaml programming technique found in VinE's WP calculator, and does not seek to discuss WPs themselves. For further reading, read a book on formal semantics (the section on axiomatic semantics) or Dijkstra's original work in the area from the early 1970s.

One issue with functional programming is that, if recursive functions are not written in a tail-recursive style, they have the potential to overflow the stack, thereby causing the program to abort. One learns to be judicious about writing tail-recursive functions, but alas, it is impossible to naively convert some functions into such a form; care is needed. For example, if the recursive function needs to call itself to produce a value, and then perform some operation on the result before returning it, then the recursive invocation is not in the tail position and cannot be optimized into a jump by the compiler. Therefore, the function will consume stack space for each recursive invocation, leading to an upper bound on the size of the data that can be processed before a stack overflow exception is thrown.

Consider the following snippet, which is the aforementioned weakest precondition calculator written as a straightforward recursive pattern match:

In the case of the Choice statement, the weakest precondition for both choices (variables r1 and r2) must be calculated before they can be conjoined, and therefore these calls can not be made tail-recursive. Seq is similar. How do we write this function so it does not crash in the presence of large data sets?

One answer is "explicit continuation-passing style", as seen in the following snippet:

We have introduced a new argument to wp called "k", the "continuation", which means "the rest of the computation". We can see that the calls to wp in the Choice and Seq cases are now in the tail position, and hence can be optimized directly into jumps, resulting in constant stack depth and no more crashes. The cases other than Choice and Seq have been modified to apply the continuation to the result of their portion of the weakest precondition calculation.

But what exactly is going on here? The compact representation yields little insight into how the WP is actually calculated step-by-step. Thus, let's take a look at an example GCL statement and the computation of its WP:

The first three steps show the expansion of the continuation due to the Seq statements in the input; the rest of the steps show the gradual shrinkining of the continuation by processing the statements inside of the Seq statements. Essentially, we have traded stack space for the activation records of wp for heap space for the continuation, and in the process have produced a tail-recursive function that does not consume stack space.

From here, we pass the propositional formula (a VinE expression) off to a theorem prover and receive an input that causes execution to reach the specified state with the specified condition. However, this is once again beyond the scope of this entry.

https://www.openrce.org/blog/view/1523/VinE's_OCaml_Programming_Tricks:__Explicit_Continuation-Passing_Style

VinE is the static analysis component of the BitBlaze binary analysis project, it is written in OCaml, and it contains some tricks that I haven't seen elsewhere. For the sake of contributing to their dissemination throughout the community, I would like to detail some of them here. These might be obvious to everyone already, or maybe no one cares. Feedback is welcome.

The following discussion concerns /ocaml/wp.ml, the weakest precondition calculator. A weakest precondition (hereinafter WP) is the most general condition under which execution reaches some given point with some desired post-condition (i.e. an execution trace plus some arbitrary condition that must hold at the end of the trace, e.g. "this array write at this location goes out of bounds". WPs compare favorably to forward symbolic execution in terms of formula size, and comprise an attractive choice for enacting the primitive "cause execution to reach a particular line in a particular state", if loop unrolling is palatable. This entry concerns an OCaml programming technique found in VinE's WP calculator, and does not seek to discuss WPs themselves. For further reading, read a book on formal semantics (the section on axiomatic semantics) or Dijkstra's original work in the area from the early 1970s.

One issue with functional programming is that, if recursive functions are not written in a tail-recursive style, they have the potential to overflow the stack, thereby causing the program to abort. One learns to be judicious about writing tail-recursive functions, but alas, it is impossible to naively convert some functions into such a form; care is needed. For example, if the recursive function needs to call itself to produce a value, and then perform some operation on the result before returning it, then the recursive invocation is not in the tail position and cannot be optimized into a jump by the compiler. Therefore, the function will consume stack space for each recursive invocation, leading to an upper bound on the size of the data that can be processed before a stack overflow exception is thrown.

Consider the following snippet, which is the aforementioned weakest precondition calculator written as a straightforward recursive pattern match:

Code:

let rec wp q = function

| Skip -> q

| Assume e -> exp_implies e q

| Choice(s1, s2) -> let r1 = wp q s2 in

let r2 = wp q s1 in

exp_and r1 r2

| Seq(s1, s2) -> let r1 = wp q s2 in

wp r1 s1

| Assign(t, e) -> Let(t, e, q)

| Assert e -> exp_and e q

In the case of the Choice statement, the weakest precondition for both choices (variables r1 and r2) must be calculated before they can be conjoined, and therefore these calls can not be made tail-recursive. Seq is similar. How do we write this function so it does not crash in the presence of large data sets?

One answer is "explicit continuation-passing style", as seen in the following snippet:

Code:

let calculate_wp (simp:Gcl.exp->Gcl.exp) (post:Gcl.exp) (stmt:Gcl.t) =

let rec wp q s k =

match s with

| Skip -> k q

| Assume e -> k (simp(exp_implies e q))

| Choice(s1, s2) -> wp q s1 (fun x -> wp q s2 (fun y -> k(simp(exp_and x y ))))

| Seq(s1, s2) -> wp q s2 (fun x -> wp x s1 k)

| Assign(t, e) -> k(simp(Let(t, e, q)))

| Assert e -> k (simp(exp_and e q))

in

wp post stmt (fun x->x)

We have introduced a new argument to wp called "k", the "continuation", which means "the rest of the computation". We can see that the calls to wp in the Choice and Seq cases are now in the tail position, and hence can be optimized directly into jumps, resulting in constant stack depth and no more crashes. The cases other than Choice and Seq have been modified to apply the continuation to the result of their portion of the weakest precondition calculation.

But what exactly is going on here? The compact representation yields little insight into how the WP is actually calculated step-by-step. Thus, let's take a look at an example GCL statement and the computation of its WP:

Code:

wp q Seq(Skip,Seq(Assume expr1,Assert expr2)) (fun x -> x) (* Initial call to wp with arbitrary post-condition q *)

wp q (Seq(Assume expr1,Assert expr2)) (fun x -> wp x Skip (fun x->x)) (* Grow the continuation because of the Seq *)

wp q (Assert expr2) (fun x -> wp x (Assume expr1) (fun x -> wp x Skip (fun x->x))) (* Grow the continuation because of the Seq *)

(fun x -> wp x (Assume expr1) (fun x -> wp x Skip (fun x->x))) (exp_and expr2 q) (* Shrink the continuation to process the assert *)

wp (exp_and expr2 q) (Assume expr1) (fun x -> wp x Skip (fun x->x)) (* Process the assume *)

(fun x -> wp x Skip (fun x->x)) (exp_implies expr1 (exp_and expr2 q)) (* Shrink the continuation to because of the assume *)

wp (exp_implies expr1 (exp_and expr2 q)) Skip (fun x->x) (* Process the skip *)

(fun x -> x) (exp_implies expr1 (exp_and expr2 q)) (* Shrink the continuation because of the skip *)

(exp_implies expr1 (exp_and expr2 q)) (* Final WP *)

The first three steps show the expansion of the continuation due to the Seq statements in the input; the rest of the steps show the gradual shrinkining of the continuation by processing the statements inside of the Seq statements. Essentially, we have traded stack space for the activation records of wp for heap space for the continuation, and in the process have produced a tail-recursive function that does not consume stack space.

From here, we pass the propositional formula (a VinE expression) off to a theorem prover and receive an input that causes execution to reach the specified state with the specified condition. However, this is once again beyond the scope of this entry.

https://www.openrce.org/blog/view/1523/VinE's_OCaml_Programming_Tricks:__Explicit_Continuation-Passing_Style