Recently, I was made aware of a paper in which I was acknowledged. This came as quite a surprise: I had only spoken with this group via a single email following a post on secret.club. In essence, Leo (i.e., Ao Li from the paper) reached out because we had quite the topic collision -- the effect of mutating bytes which are then interpreted as structured data.

In the secret.club post in mid-2024, I detailed the effect of applying mutations to bytes from which structure is derived. What I observed is that the mutation distance in terms of structure was often very large compared to the mutation distance in terms of bytes. In other words: a small byte mutation leads to a large change in how the data is interpreted by the program under test. There, I referred to this problem as the "data reinterpretation problem", wherein a byte mutation affects some early subsequence, causing the subsequent bytes to be interpreted differently. I was certainly not the first to call out this problem; the registered report version of the paper arrived several months before my post and deemed this the "havoc effect". In my case, I was referring both to when structure was derived (i.e., a structure "parsed" from a random byte sequence) and to when the bytes themselves were parsed by the target, but the effect is observed in both cases. Though we both discussed this problem sometime in the middle of last year, it is an issue which has plagued fuzzing for some time.
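To make the reinterpretation concrete, here is a toy sketch (my own illustration, not code from the paper or from either post): a minimal length-prefixed parser in which a single-byte mutation to an early length field shifts every subsequent record boundary, so the entire remainder of the input is reinterpreted.

```python
def parse(data: bytes) -> list[bytes]:
    """Toy parser: input is a sequence of [length][payload] records."""
    records, i = [], 0
    while i < len(data):
        n = data[i]                       # one-byte length prefix
        records.append(data[i + 1 : i + 1 + n])
        i += 1 + n                        # skip to the next record
    return records

original = bytes([2, 0x41, 0x42, 3, 0x43, 0x44, 0x45])
mutated  = bytes([4, 0x41, 0x42, 3, 0x43, 0x44, 0x45])  # only byte 0 changed: 2 -> 4

print(parse(original))  # [b'AB', b'CDE']
print(parse(mutated))   # [b'AB\x03C', b'E'] -- every later boundary has shifted
```

A one-byte mutation (edit distance 1 on the byte level) turns two well-formed records into entirely different ones: the old length prefix `3` is swallowed into a payload and the byte `D` becomes a (mostly out-of-range) length field. This is the large structural distance from a small byte distance described above.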

Caroline Lemieux, one of the authors on the paper, noted this back in SBFT'23 -- though specifically in the context of deriving data from random bytes, as in the original Zest work. This was the first time I had seen someone specifically call out this effect; I had previously encountered it with arbitrary [1] [2] back in 2022 and was very happy to see it discussed. Arbitrary itself was released a bit before the original Zest paper (2018 vs 2019), but the havoc effect was already observed in practice shortly after. Going a bit further back and broadening our scope beyond fuzzing, I think we could easily argue that the havoc effect was used by design in certain search algorithms in the early 2000s. Quantifying this effect, however, is certainly new, and exceptionally valuable. I think I'll do some experiments to measure the havoc effect on other grammars using arbitrary and a grammar fuzzer I've been working on for a while.

I'm really happy both to see this paper and to be acknowledged in it. I'm pleasantly surprised that my single email was enough to warrant an acknowledgement, and glad to see that my other post about fuzzing was of value to the work as well. I think the paper is excellent and opens up several new cans of worms for folks doing fuzzing.

Whereas the paper deals only with mutation of the random sequences from which inputs are derived, there is clearly a similar effect when mutating an input directly. Can we quantify how differently a program interprets an input after a direct mutation? Perhaps we can use this not only to make our generators better, but also to improve our grammar mining -- and to utilize the havoc effect itself as an exploration strategy.

Similarly, can we identify when the havoc effect occurs and then correct the input afterward? The havoc effect often occurs because early sequences are reinterpreted, leading to a change in how the remainder of the input is interpreted. As a result, even if we end up in a new code region, the reinterpreted segment of the input may no longer be well-formed. Perhaps we can use this knowledge to guide explore/exploit search strategies to only apply mutations that will lead to the havoc effect when exploring, and then "correct" the reinterpreted parts of the input by mutation during exploitation.

Regardless of how it ends up being used, this is a clear win and will very likely lead to new research on how to fuzz programs better.