notesassorted ramblings on computer

Symbolic Execution

Symbolic execution is a program analysis technique that enumerates execution paths through a software under test. Conceptually, program execution can divert at branch points (i.e., conditional jumps) based on a branch condition. Consequentially, an execution path is a sequence of Boolean values specifying which branch was taken, for every branch point.

To enumerate all reachable execution paths, symbolic execution reasons about the satisfiability of branch conditions. For this purpose, it transforms the branch conditions to logical formulas and queries a constraint solver for satisfiability of these formulas. Specifically, it checks for both the satisfiability of the true- and false case and explores all cases that are satisfiable.

For continued exploration, it tracks the currently enforced path condition (PC). For example, after taking the true-branch for a > 5 this logical condition must be enforced for all subsequent branches. As such, a subsequent branch condition like a ≤ 5 would always evaluate to false and hence an execution path triggering the true-case on both is not reachable.

Example

Consider the following C code:

int myfunc(int a) {
	int ret = 0;

	if (a > 8)
		ret = a - 7;
	if (a < 5)
		ret = a - 2;

	return ret;
}

In total, myfunc has four execution paths.1 However, only three of them are actually reachable. This becomes apparent when considering the following binary tree visualization of all execution paths:

Since (a > 8) ∧ (a < 5) evaluates to false, the execution path where both branches conditions evaluate to true is not reachable.

Satisfiability Modulo Theories

To determine the satisfiability of logical formulas, such as (a > 8) ∧ (a < 5), symbolic execution uses a constraint solver. In our example, the branch conditions are relatively simple but generally they contain arbitrary arithmetic and logic operations. To formally reason about such operations, symbolic execution commonly relies on Satisfiability Modulo Theories (SMT). Conceptually, SMT extends the SAT problem with logical formulas supporting more complex data types (e.g., bitvectors) and operations on these types (e.g., two’s complement addition).2

SMT formulas are commonly expressed in the SMT-LIB format, which is a standardized input format consumed by many SMT solvers. Symbolic execution thus needs to encode branch conditions from the software under test in the SMT-LIB format. To abstract from the semantics of specific programming language, this encoding step is usually performed based on an intermediate representation of the software under test.

The following material provides a more detailed introduction:


  1. The amount of execution paths grows exponentially with the number of branches in the tested code, hence our example has 2² paths.↩︎

  2. SAT only reasons about satisfiability for Boolean values.↩︎