notesassorted ramblings on computer

A Description of KLEE’s KTest Format

KTest is the format used by the KLEE symbolic execution engine to describe generated test inputs. The format is very simple but (as far as I know) not documented outside the KLEE source code. However, the format is also useful for other test generator tools because they can:

  1. Reuse the existing tooling provided by KLEE.
  2. Utilize inputs generated by KLEE, or run KLEE with their generated inputs.

This is an attempt to document this format, both in a human-readable and machine-readable form. Ideally, contributing to further spread and utilization of the KTest format. The formal, machine-readable description leverages Kaitai Struct to describe the format using YAML and can be used to automatically generate parsers for different programming languages.

The Format

The KTest format is a binary format, encoded in big-endian byte order. The format defines the following fields:

  1. Header:
    1. The magic ASCII value ‘KLEE’.
    2. The version number (see below).
  2. Arguments: List of command-line arguments passed to the test program.
    1. The number of arguments as an unsigned 32-bit integer.
    2. The specified amount of arguments, each represented as a string.
  3. symArgvs: Don’t know what this does, ktest-tool doesn’t use it.
  4. symArgvLen: Don’t know what this does, ktest-tool doesn’t use it.
  5. Objects: List of test inputs.
    1. The amount of objects as an unsigned 32-bit integer.
    2. The specified amount of objects (see below).

Objects are encoded as follows:

  1. The name of the object, represented as a string.
  2. The size of the object’s value in bytes, encoded as a 32-bit unsigned integer.
  3. The value of the object (i.e., the specified amount of raw bytes).

Strings are encoded as follows:

  1. The size of the string in bytes.
  2. The given amount of bytes, each representing an ASCII character.

Defined Versions

As apparent from the prior description, different versions of the format exist:

  • Version 1: This was probably the original version of the format.
  • Version 2: Added the symArgvs and symArgvLen fields to the binary format.
  • Version 3: Changed the format’s header from BOUT\n to KTest (see commit 489f3d4).

The initial check-in of the KLEE source code already used version 2 of the format.

Appendix: Formal Description

The quebex symbolic execution engine provides a Haskell implementation of the format in Data.KTest.

The following YAML formally describes the format using Kaitai Struct:

Description of the KTest format using Kaitai Struct.
meta:
  id: ktest
  file-extension: ktest
  license: Unlicense
  ks-version: 0.9
  endian: be
seq:
  - id: header
    type: header
  - id: arguments
    type: arguments
  - id: sym_argvs
    type: u4
    if: header.version >= 2
  - id: sym_argvs_len
    type: u4
    if: header.version >= 2
  - id: objects
    type: objects
types:
  header:
    seq:
      - id: magic
        # TODO: Also support 'BOUT\n` (see above).
        contents: 'KTEST'
      - id: version
        type: u4
        valid:
          min: 0
          max: 3
  arguments:
    seq:
      - id: num_args
        type: u4
      - id: value
        type: str_with_len
        repeat: expr
        repeat-expr: num_args
  objects:
    seq:
      - id: amount
        type: u4
      - id: value
        type: object
        repeat: expr
        repeat-expr: amount
  object:
    seq:
      - id: name
        type: str_with_len
      - id: bytes
        type: bytes_with_len
  bytes_with_len:
    seq:
      - id: len
        type: u4
      - id: value
        size: len
  str_with_len:
    seq:
      - id: len
        type: u4
      - id: value
        type: str
        encoding: ASCII
        size: len