Mixed flow graphs are very likely not my idea -- but this was the theory I used while defining my programs. I wouldn't know what to call it otherwise, so I came up with a way to express this concept for my own applications.
Suppose you have the following function:
int diamond(int x, int y) {
int c = x + y; // 1
if (c == 0) {
return c; // 2
} else if (x == 5) {
c += 15; // 3
} else if (y == 5) {
c += 32; // 4
} else {
c += 6; // 5
}
return c; // 6
}
You'll end up with the following control flow:
And data flow:
Now, suppose that I wanted to ask questions like, "what value is in the return when x is 5 and y is -5?" With either of the graphs above, you wouldn't be able to tell me -- one just tells you where you can go, and the other tells you what values can exist. To ask questions about the results of a program, expressed as a path, you would need to be able to say which values do exist based on where you did go.
To express this, I propose a mixed-flow graph. To do this, I use custom nodes, which I call "psi nodes", with the following inputs/outputs:
- Single data input, indicated by a solid edge
- Multiple boolean inputs, indicated by a dashed edge
- Multiple data output, indicated by a thin solid edge
- Multiple boolean outputs, indicated by a dashed edge
The psi node here has the following behaviour:
- If all the boolean inputs are true (psi node is "active")
- the data outputs are set the value present in the data input.
- the boolean outputs are set to true
- If not all the boolean inputs are true (psi node is not "active")
- the data outputs are set to undefined and nodes which use psi node outputs as inputs must use a different psi node's output
- the boolean outputs are set to false
By implementing the psi node in this fashion, we can emulate control flow nods with chains of psi nodes. Moreover, we can more specifically state which value a phi-node will result in.
As the function above gets very difficult to draw very fast, consider the following (simpler) function:
int simple_conditional(int x) {
if (x != 0) {
return 1;
} else {
return 0;
}
}
We can express a mixed-flow graph for this function:
Now, we can make assertions about data, e.g. "x is equal to 0", and determine that the return of the function is indeed 0 as psi node 2 is active and psi node 1 is not (as psi node 2's boolean input is inverted to true when psi node 1 is not active).
For a more complex example, consider a nested if statement:
int complex_conditional(int x) {
int c = x + 10;
if (x != 5) {
c += 10;
if (c > 30) {
c = 30;
}
} else {
c -= 1;
}
return c;
}
To handle this, we need two other considerations:
- A psi node can represent a nested block by checking if the surrounding block was hit, i.e.: the previous psi node was hit
- An if statement which modifies a variable defined by a surrounding block with no else statement can be represented as an if-then-else by making the surrounding block's definition an else
For example, we could consider:
c += 10;
if (c > 30) {
c = 30;
}
As:
c += 10;
if (c > 30) {
c1 = 30;
} else {
c1 = c; // redundant, but explicit
}
c = c1;
With these replacements in mind, a mixed flow graph for complex_conditional
could be drawn as:
I apologise for the complexity of this graph, but it gets to be a lot of nodes very fast!
Take a moment to verify this for yourself by tracing through the various possibilities. You'll find that it neatly
defines complex_conditional
for various inputs of x
.
The specific organisation of this graph makes it possible to quite easily represent functions as theorems. Lovely for us, as you can see by my discussion on MFG to SMT conversion.