View Full Version : Finding Bugs in VMs with a Theorem Prover, Round 1

February 17th, 2012, 22:19
Here's some work I did last fall.

Equivalence checking is a popular technique in the academic literature and in industrial practice, used to verify that two "things" are equivalent. Commonly, it involves translating those things into some logic that is supported by a theorem prover, and asking the decision procedure to prove whether the two things must always be equal. Equivalently, we ask whether there exists any circumstances where the two things might not be equal. If the two things must always be equal under all circumstances, then a complete decision procedure will eventually report that the formula is unsatisfiable. If they can ever deviate, then it will return a "counterexample" illustrating a scenario in which the things differ. The decision procedure may not terminate in a reasonable amount of time, or do so at all if the logic is undecidable or the decision procedure is incomplete.

Equivalence checking is commonly found in the hardware world (particularly, in a field known as Electronic Design Automation, or EDA), but it has been applied to software as well, for tasks such as verification that compiler optimizations have been implemented correctly, verification of cryptographic implementations, and other forms of specification-conformance. In this blog, we'll take a look at an application of equivalence checking in deobfuscation.

Equivalence checking

We begin by giving an example proving the equivalence between an x86 sequence and a machine instruction.

Bit-level hackers -- those who study the works of the original MIT "hackers", whose work bequeathed the term "hacker" -- will know the following example regarding taking the population count of a 32-bit integer. The "population count" of a given integer is the number of one-bits set in the integer. This operation is known as "popcnt", and the x86 instruction set features an instruction of the same name, whose syntax is "popcnt r32, r/m32" and whose semantics dictate that the population count of the right-hand side (32- or 16-bit, depending upon encoding) be deposited into the location specified by the left-hand side. We can easily imagine implementing this functionality with a loop: