Recently, Raymond Chen wrote about Programming by Accident. Similarly, I often find myself responsible for bugs whereby functionality is impaired, but it undetectably works by accident.
A simple example of this is, which I have done on more than one occasion, is:
/** * @param pInt If non-NULL, is filled with the number 5. */ void GetFive(int *pInt) { if (*pInt) /* Bug: Should just be pInt*/ { *pInt = 5; } }
Since it allows NULL, it’s a crash bug. However, let’s say you’re using it like this:
int five; GetFive(&five)
On Windows, Microsoft’s Debug Runtime fills uninitialized data with 0xCC. Our variable will always be filled despite the bug because its contents is non-zero. Thus in debug mode, we’ll never see this bug (unless using something like Rational Purify or valgrind), and even in release mode there’s a good chance that the uninitialized contents will still be non-zero.
This is an example of where crashing is good. You’d instantly see the problem as opposed to potentially tracing through many nonsensical side effects. This has happed in SourceMod twice to date. The first time, SQL results were not being fetched right. The second time, admins were not loading. In both cases, the code worked fine most of the time, but broke on a few machines that just had bad luck.
The worst case of this came a few weeks ago in AMX Mod X with the following detour. The purpose was to save the EAX register, then pass the first parent parameter on the stack:
;start of detour/function push eax ; save eax push dword 0 ; result code push [esp+4] ; pass first parameter on the stack call X
The bug here is that esp+4 is dreadfully wrong. Instead of reaching up into the parameter list (which would be at esp+12), it’s reaching up into our own local frame! We’ve just pushed eax again.
Unfortunately, this detour worked fine for all of the testing I did with AMX Mod X (and it was no small amount). Then when we distributed betas, the crash reports came in. What was going wrong? Well, the detour was finally getting invalid parameters. It took a good ninety minutes to catch the typo, but more interesting is why it worked at all.
The function being detoured was ClientCommand(). Using IDA’s “Find References” feature, I looked at the disassembly to every call to ClientCommand() internal to the game. It looked like this:
This happened on both GCC and MSVC8 for most calls to ClientCommand(). The compiler used eax as a temporary register for pushing the first parameter, so the unnecessary saving of eax by the detour ended up allowing a subtle bug to continue working!
Unfortunately, there’s no easy way to identify stupid bugs like this — you have to suffer through each one unless you have a knack for spotting typos.
oh that SQL bug looks familiar to me ;)
AMX bug you say, you wouldn’t happen to be referring to the auto buy bug? Anyway, great article.