Last week I mentioned a portability problem with variadic functions. Today’s topic is similar.
In late 2005 I transitioned ESEA from AMX Mod to AMX Mod X. We were only using it for a CSDM server. The server ran in 64-bit mode, so I installed 64-bit builds of AMX Mod X and CSDM, verified that they were running, and considered the job done.
Soon reports came in from users that the server wasn’t working – the gun menus were simply dying out instead of giving weapons. This bit of code in CSDM was failing (simplified):
public OnMenuSelected(client, item)
{
    if (item == -1)
    {
        /* Do something */
    }
}
After hours of debugging, we found the problem (I believe it was PM who discovered it). To explain it, let’s take a look at what’s involved. AMX Mod X plugins use a data type for integers called a “cell.” Cells come with a small catch compared to normal integers:
#if defined __x86_64__
typedef int64_t cell;
#else
typedef int32_t cell;
#endif
It is 32-bit on 32-bit systems and 64-bit on 64-bit systems. That’s unusual because on AMD64 a plain int is still 32-bit. The cell’s weird behaviour was a necessary but awkward idiosyncrasy resulting from some legacy code restrictions.
AMX Mod X relied on a single function for running stuff in plugins. This function’s job was to eat up parameters as cells, using va_arg, and to pass them to a plugin. For demonstration purposes, it looked like:
int RunPluginFunction(const char *name, ...);
CSDM’s failing function was getting invoked like this:
RunPluginFunction("OnMenuSelected", client, -1);
Now, let’s construct a sample program which demonstrates how this idea can break:
#include <stdio.h>
#include <stdint.h>
#include <stdarg.h>

#if defined __x86_64__
typedef int64_t cell;
#else
typedef int32_t cell;
#endif

/* Reads the first variadic argument as a cell and prints it in hex. */
void print_cells(int dummy, ...)
{
    cell val;
    va_list ap;

    va_start(ap, dummy);
    val = va_arg(ap, cell);
    /* Cast so the argument matches %llx regardless of the cell's width. */
    printf("Test: %016llx\n", (unsigned long long)val);
    va_end(ap);
}

int main(void)
{
    cell val = -1;

    print_cells(1, 1);   /* int literal */
    print_cells(1, val); /* a real cell */
    print_cells(1, -1);  /* int literal again: on AMD64 this is the bug */
    return 0;
}
This program has a small variadic routine which reads in a number as a cell and prints it. Our tests print 1, -1, and -1. Here’s what it outputs on AMD64:
Test: 0000000000000001
Test: ffffffffffffffff
Test: 00000000ffffffff
The first case looks good, but what’s up with the other two? We passed -1 in both times, but it came out differently! The reason is simple and I alluded to it earlier: on AMD64 an integer literal like -1 has type int, which is 32-bit, and a variadic call does nothing to widen it to a cell. The higher bits didn’t get set, but they’re there anyway because internally everything is stored in 64-bit chunks (registers are 64-bit and thus items on the stack tend to be 64-bit just to make things easy). In this run they happened to be zero, most likely because writing a 32-bit register on AMD64 clears its upper half; nothing guarantees that, so they could just as well have been garbage.
If you were to take that raw 64-bit data and interpret it as a 32-bit integer, it would read as -1. But as a 64-bit integer (or a cell), because of two’s complement, it’s not even negative! Of course, va_arg doesn’t know that we passed 32-bit data. It simply reads what it sees in the register or on the stack.
So what happened is that the plugin got a “chopped” value, and the comparison of 0xffffffffffffffff (64-bit -1) to 0x00000000ffffffff (32-bit -1 with nothing meaningful in the upper bits) failed. As a fix, we went through every single instance of such a call that could have negative numbers, and manually cast each afflicted parameter to a 64-bit type.
The lesson? Avoid variadic functions as API calls unless you’re doing formatting routines. Otherwise you’ll find yourself documenting all of the resulting oddities on various platforms.