Any of the Pcode operations discussed in this section can be found in the Pcode reference manual.

The mixed flow graphs discussed in the theory section are a bit optimistic for actually applying them to binary analysis. They are purely symbolic and require manual analysis to construct. In order to use them in actual application, we need to convert Pcode and PcodeBlocks into SMT statements using MFGs as a basis for the conversion.

To do so, we first decompile the functions we wish to target. We can consider each of the blocks of decompiled Pcode as a single node in a control flow graph, with each operation therein as a single statement in SMT.

By fetching the def for a given varnode use, we can define non-constant varnode inputs as the results of existing operations. This acts as the data edges in our MFG, and the Pcode operations themselves act as the nodes. Thus, we can consider the following sequence of Pcode operations:

v1 = COPY 5:8
v2 = COPY 3:8
v3 = INT_ADD v1 v2

As the following MFG (no psi nodes because we don't need them yet):

The mixed flow graph for the Pcode above

Which then can be converted into the following series of Z3 statements (vis-à-vis the initial discussion on converting MFGs to SMT:

(declare-fun v1 () (_ BitVec 64))
(assert (= v1 #x0000000000000005))
(declare-fun v2 () (_ BitVec 64))
(assert (= v2 #x0000000000000003))
(declare-fun v3 () (_ BitVec 64))
(assert (= v3 (bvadd v1 v2)))

(notice in particular that it's quite simple to convert Pcode to z3)

The case with the most need for care is that of MULTIEQUAL. Due to a coincidence in how the decompiler generates these MULTIEQUAL Pcodes, they are in the order of most branches followed. This quirk is important for our purposes. Consider the existing example from the initial discussion on MFGs:

The mixed flow graph for the complex conditional example

Recall the replacement we made to allow for if without else constructions; with the order of "most branches followed", we can actually drop this requirement and instead represent MULTIEQUAL as a chained if-than-else construction over the inputs. This even works over situations of multiple nested ifs, e.g.:

c += 10; // 1
if (c > 30) {
    c /= 3; // 2
    if (c > 15) {
        c /= 5; // 3
    } else if (c < 15) {
        c *= 5; // 4

Wherein we can say, effectively:

This is important because, by definition, if 3 was hit, 2 was hit as it is nested, but if 3 is hit, we want to use the value from 3 and not the value from 2. Thus, MULTIEQUAL allows us to represent nested if blocks as chains of psi nodes with boolean inputs of "block x hit" and "block y not hit."

Additionally, we can state whether a given branch was followed on whether its condition passes. For Ghidra, this is quite simple to represent: the CBRANCH operation takes a falsy branch if its input is 0, and truthy otherwise. The edges that will be followed as a result of these can be extracted using the designated functions in PcodeBlock.

Finally, we can say that a given block will be hit if any of the in edges are true (and, in DAGs, only one of the in edges!). Thus, we can define the boolean inputs of our psi nodes discussed for MULTIEQUALs using the block hit condition, which is simply the result of an or of the edge followed conditions.

With our MFG constructed and our Pcode operations converted to SMT statements, we're ready to reconstruct our function as collection of SMT assertions.

Consider the simple_conditional example from the initial discussion on MFGs:

int simple_conditional(int x) {
    // 1
    if (x != 0) {
        return 1; // 2
    } else {
        return 0; // 3

Supposing that the Ghidra decompilation recovers reasonable Pcode for this function, we should be able to convert it to an appropriate MFG (multiple boolean inputs are treated as an "and" unless another operator is specified, e.g. "or"):

The mixed flow graph for the simple conditional example

Notice in particular that our psi nodes here are not the block conditions themselves, but instead psi nodes which derive their constraints from block hit conditions. Notice also that, as opposed to the graph constructed originally, we require that block one, which contains blocks two and three, must be hit and that both the edge hit condition and the surrounding/previous block must be hit for blocks two and three to be hit.

In this manner, our MFG represents both data flow and control flow. Additionally, this can be trivially converted into SMT assertions and assertions can be made about both blocks and data!

(declare-fun x () (_ BitVec 32))
(declare-fun neq-1 () (_ BitVec 8))
(declare-fun blk-1 () Bool)
(declare-fun edge-1-2 () Bool)
(declare-fun edge-1-3 () Bool)
(declare-fun blk-2 () Bool)
(declare-fun blk-3 () Bool)
(declare-fun vn1 () (_ BitVec 32))
(declare-fun vn2 () (_ BitVec 32))
(declare-fun ret () (_ BitVec 32))
(assert blk-1)
(assert (= neq-1 (ite (= x #x00000000) #x00 #x01)))
(assert (= edge-1-2 (and blk-1 (or (not (= neq-1 #x00))))))
(assert (=> edge-1-2 blk-2))
(assert (= edge-1-3 (and blk-1 (not blk-2))))
(assert (=> edge-1-3 blk-3))
(assert (= vn1 #x00000001))
(assert (= vn2 #x00000000))
(assert (= ret (ite blk-3 vn2 vn1)))
(assert ((_ at-most 1) edge-1-2 edge-1-3))

You can verify that this simulates simple_conditional for yourself on z3's website. Please do let me know if you find any problems, as I wrote this one by hand and it is not as verified as an autogenerated one.

This is the general process by which Schadenfreude converts Pcode to SMT, with a few additional considerations that I didn't feel were necessary to dive into in this post alone.