This was a project I did during my undergrad when I was still learning about where I wanted to go with my security research.
As with many projects, this was not completed meaningfully. The text below is my hopeful, naïve self expecting to solve a hard formal analysis and reverse engineering problem. The subsequent parts of the project are implemented in US-CERT's GhiHorn.
Looking back on it, I did some pretty cool things in this project. For the courageous, consider viewing the test code for the project as the project was able to extract and prove some interesting program properties, such as program equivalence and basic path constraint solving.
Full content from my original website may be recreated using an older version of the repository for this website. The original description from my old website is provided below.
Schadenfreude is a pet project that I've been working on for quite some time now. While starting out as a personal project, it has evolved into a project which I develop in my higher-level courses at TAMU.
Iteration one of Schadenfreude has two components:
- a buffer fixup analyser, which detects buffers present in the stack and retypes variables appropriately
- a stack smash analyser, which detects buffer overflows present in the stack
Unfortunately, it is quite limited -- there's no form of symbolic execution or automated way to apply operations to derive values (symbolic or concrete).
Iteration two, "Schadenfreude: Resurrection", attempts to approach the failures of iteration one by applying a theorem prover to PCode operations to convert functions into theorems by combining control flow and data flow graphs into a single "mixed flow" graph. Using this, one can effectively "ask" the theorem prover to prove that a specific property is true, or if not, why it is not. This is the iteration described in a majority of this section.
Similar to iteration one, iteration two has major failures:
- there is no handling of loops
- memory is not represented well and needs revisiting
- the code is just not that good lol
Here's the initial paper submitted regarding the theory and implementation of this iteration, though I warn you: it is quite cringe and may have incorrect information as I have had no formal education in program verification.
Iteration three, which is neither named nor yet exists, will attempt to resolve some failures of iteration two. Primarily:
- loop extraction (discover all cycles in the mixed flow graph, perform loop analysis/quantification/unrolling)
- loop analysis ("can we deconstruct this loop into a known pattern based on the variants present in the loop?")
- loop quantification ("can we represent loop-changed variables as a result of quantifiers/lambdas?")
- loop unrolling (if loop analysis and quantification fails, manually unroll the loop to a fixed depth)
- segmented memory representation (mappings will be segmented such that the theorem prover doesn't just assume they overlap)
- code review (with peers!)
Some theory for iteration three will be discussed, but will be marked explicitly as part of iteration three, where applicable.
Finally, iteration four will take on more implementation-specific components: syscalls and vulnerability classification. With vulnerabilities classified, the SMT solver should be able to detect vulnerabilities and generate payloads to exploit them automatically. A majority of the work of this iteration will lie in correctly representing syscall behaviour and defining vulnerabilities in the context of a binary. Without knowing how iteration three will be implemented, iteration four's implementation would be pure speculation. I'll define the "how" part later.