Communication in testing topics is hard. Communicating about fuzzing in particular is hard, as it's often seen as a magical way to solve a variety of problems in testing -- it's very simple to develop, it's automated, and it has a good track record of finding bugs. But often, fuzzing can't find bugs, and when it can't, it's very hard to explain why.

I've been preparing recently for a talk I will give at the end of the month. For this, I've been considering a variety of ways to explain the limitations of fuzzing. While watching some pannenkoek videos, it occurred to me: why not talk about video games?

"you wouldn't download a car" graphic, but with the text replaced by "you wouldn't fuzz a video game"

Video games and fuzzing

Fuzzing video games is not a new idea. Testing strategies for "solving" video games in general aren't new either1. But these strategies often require domain-specific knowledge about the game -- or very simple games -- to be effective.

Imagine, for a second, that we didn't provide any domain-specific knowledge. It's certainly unreasonable to suggest that, say, by just measuring code coverage, one could beat Super Mario 64 with a fuzzer. You might get some interesting input coverage, or maybe hit some rendering routines, sounds, etc., but you would quickly saturate without even getting to the front door of the castle.

That's no way to save the princess!
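To make that saturation concrete, here is a toy sketch -- the "game", the coverage map, and the mutator are all hypothetical and wildly simplified. Coverage guidance keeps inputs that reach new branches, but an atomic multi-byte check offers no partial credit, so the corpus stops growing long before the goal:

```python
import random

def toy_game(inp: bytes) -> set:
    """A stand-in 'game' that reports which branches an input reaches."""
    cov = {"title screen"}
    if len(inp) > 4:
        cov.add("file select")
    if inp[:1] == b"\x00":
        cov.add("courtyard")
    # An atomic multi-byte comparison: coverage gives no partial credit,
    # so single-byte random mutation essentially never passes it.
    if inp[:8] == b"CASTLE!!":
        cov.add("front door")
    return cov

def mutate(inp: bytes) -> bytes:
    buf = bytearray(inp)
    buf[random.randrange(len(buf))] = random.randrange(256)
    return bytes(buf)

def fuzz(seed: bytes, iters: int) -> set:
    random.seed(0)
    corpus, seen = [seed], toy_game(seed)
    for _ in range(iters):
        candidate = mutate(random.choice(corpus))
        cov = toy_game(candidate)
        if not cov <= seen:          # keep inputs that reach new branches
            seen |= cov
            corpus.append(candidate)
    return seen

# Coverage saturates on the shallow branches; "front door" stays unreached.
```

Real fuzzers have tricks for exactly this (e.g. splitting comparisons into byte-wise branches), but the general point stands: discrete checks starve the guidance signal.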

Software and fuzzing

The idea of a video game maps quite easily onto general programs, especially those with lots of state. There are many "places" you need to go, each with certain prerequisites, and it's not always obvious -- at least as derived from program data dynamically -- where you need to go (e.g. to find bugs). In the same way that it's unreasonable to suggest that a fuzzer could "solve" a video game, it's probably just as insane to suggest that a fuzzer can find a bug in an arbitrary program from scratch -- at least without some sort of guidance mechanism.

Guidance?

In video games, we generally know where the goal is, and can inform whatever solver we're using of how close it is to the goal, by distance or by requirements. This can, and almost certainly will, lead to problems with local optima (see the links above), but it's generally a pretty effective way to guide testing strategies to "solve" video games.
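As a sketch of what that looks like -- a hypothetical grid world, simplified on purpose -- distance-to-goal guidance can be as crude as greedy search. Note that the same scheme falls apart as soon as a wall creates a local optimum:

```python
GOAL = (9, 9)

def step(pos, move):
    """Apply one of four moves on a 10x10 grid, clamped at the edges."""
    dx, dy = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}[move]
    x, y = pos[0] + dx, pos[1] + dy
    return (min(max(x, 0), 9), min(max(y, 0), 9))

def distance(pos):
    """Manhattan distance to the goal: the guidance signal."""
    return abs(GOAL[0] - pos[0]) + abs(GOAL[1] - pos[1])

def solve(start, budget=100):
    """Greedily take whichever move most reduces distance to the goal.
    On an open grid this works; add walls and it gets stuck in local optima."""
    pos, path = start, []
    while pos != GOAL and budget > 0:
        move = min("UDLR", key=lambda m: distance(step(pos, m)))
        pos, budget = step(pos, move), budget - 1
        path.append(move)
    return pos, path
```

On the open grid, every step reduces the distance by one, so the solver reaches the goal in exactly the Manhattan distance from the start.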

But what if we don't know where the goal is, exactly?

For general software, the conditions which gate access to new code regions are discrete, with arbitrarily complex constraints. Similarly, we don't know ahead of time the preconditions for triggering certain bugs -- nor where those bugs are located in the code. As a result, there is no effective guidance towards bugs.

Can we more concretely express the bugs we look for, in order to derive their triggering conditions? Can we guide towards the fulfillment of those conditions?

Inputs?

In video games, the inputs are simple and largely don't affect the state too much. At most, they apply some vector to your position.

Programs, on the other hand, are not so smooth over inputs -- that is to say, minor adjustments to the input can lead to major differences in outcome. Thus, even if we had meaningful guidance, it's not clear that our existing mutations would be suited to following it.
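As a small illustration -- a toy format with a hypothetical parser -- consider a length-prefixed input: mutating a payload byte barely changes the outcome, while mutating a single header byte changes the entire control path:

```python
import struct

def parse_packet(data: bytes) -> str:
    """Toy parser: 4-byte big-endian length prefix, then the payload."""
    if len(data) < 4:
        return "too short"
    (length,) = struct.unpack(">I", data[:4])
    payload = data[4:]
    if length != len(payload):
        return "length mismatch"
    return f"ok: {length} bytes"

packet = struct.pack(">I", 5) + b"hello"

# A one-byte change in the payload: outcome is essentially the same.
print(parse_packet(packet[:4] + b"hellp"))           # ok: 5 bytes
# A one-byte change in the header: a completely different outcome.
print(parse_packet(bytes([0, 0, 1, 5]) + b"hello"))  # length mismatch
```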

Can we derive better ways to mutate our inputs along guidance provided?

Conclusion

This is again a bit of a rant without much of a conclusion -- just thoughts about how we can try to do this better. Maybe it'll give someone an idea about how to approach these problems.

If you do, send me a message when you publish. :)


Corrigenda

In a subsequent conversation on Mastodon, Louis Merlin pointed out that it is somewhat misleading to suggest that input doesn't affect state too much. It would be more accurate to say that inputs are generally semantically similar locally, but not in their long-run effect. That is to say, "jump" might mean different things in different contexts ("crouch jump" vs "long jump" vs "normal jump"), but it is generally the same kind of effect.

In the long run, these inputs might have huge effects on the final state (i.e. where you go in the game, whether you survive an encounter with an enemy, etc.), but they are interpreted similarly at the time at which they are executed. You could think of these inputs more like packets or instructions: always interpreted the same way, but with wildly different effects in the long run. Fuzzers which mutate without packet-level detail (i.e. "flat buffer" fuzzers) may not understand the underlying input format, so subtle changes to the input lead to wildly different interpretations of it.

At the same time, not all fuzzers are the same. Maybe you have a fuzzer which is capable of preserving input semantics over mutation, in which case the correlation between inputs is much higher -- that is, mutants stay locally similar while still having major state effects in the long run.
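A sketch of that difference, using a toy length-prefixed format (all names hypothetical): a flat-buffer mutator flips bytes anywhere and routinely corrupts the header, while a structure-aware mutator edits the payload and repairs the length field, so the mutant is still interpreted the same way:

```python
import random
import struct

def well_formed(data: bytes) -> bool:
    """A packet is well-formed if its 4-byte length prefix matches the payload."""
    return len(data) >= 4 and struct.unpack(">I", data[:4])[0] == len(data) - 4

def mutate_flat(data: bytes) -> bytes:
    """Flat-buffer mutation: flip one random byte anywhere, header included."""
    buf = bytearray(data)
    buf[random.randrange(len(buf))] ^= 0xFF
    return bytes(buf)

def mutate_structured(data: bytes) -> bytes:
    """Structure-aware mutation: change the payload, then repair the length
    prefix so the mutant still parses as a well-formed packet."""
    payload = bytearray(data[4:])
    if payload:
        payload[random.randrange(len(payload))] ^= 0xFF
    return struct.pack(">I", len(payload)) + bytes(payload)

random.seed(0)
packet = struct.pack(">I", 5) + b"hello"
flat_ok = sum(well_formed(mutate_flat(packet)) for _ in range(1000))
structured_ok = sum(well_formed(mutate_structured(packet)) for _ in range(1000))
# Structured mutants are always well-formed; many flat mutants are not.
```

Real structure-aware fuzzers generalize this idea to grammars and protocol specifications rather than a single hand-repaired field.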

This suggests that, from a fuzzing perspective, video games are comparable to stateful systems like databases, interpreters, CPUs, etc., where input segments correspond to instructions which affect the state.
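To make that comparison concrete, here is a deliberately tiny, hypothetical accumulator machine: every opcode is always decoded the same way (locally similar), yet changing one early instruction rewrites all downstream state (major long-run effect):

```python
def run(program: bytes) -> int:
    """A tiny accumulator machine: each byte is decoded identically every
    time, but early instructions steer everything that follows."""
    acc = 0
    for op in program:
        if op == 0x01:      # INC: add one
            acc += 1
        elif op == 0x02:    # DBL: double
            acc *= 2
        elif op == 0x03:    # CLR: reset
            acc = 0
    return acc

# Two programs differing in a single early instruction diverge completely:
print(run(bytes([0x01, 0x02, 0x02, 0x02])))  # (0+1)*2*2*2 = 8
print(run(bytes([0x03, 0x02, 0x02, 0x02])))  # 0*2*2*2 = 0
```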

This is a very nice way of demonstrating that the input semantics are indeed very relevant for how effective a fuzzer might be!

1. Yes, RL is a general strategy, but it shows up somewhat frequently in search-based testing.