Category Archives: Articles

“Official” articles that were pre-written for posting.

Design Decision I Most Regret

I previously mentioned that I’m a stickler for good platform code. Of all the design decisions I’ve ever made, there is one that I regret above all others. To this day, it haunts me mercilessly. It has appeared over and over again in countless lines of code for Half-Life 1, and silently passed to a few Half-Life 2 plugins. What is it?

The [g|s]_et_pdata functions in AMX Mod X allow you to reach into a CBaseEntity’s private data (i.e., its member variables) and pull out values given an offset from the this pointer. What’s the problem? Let’s take a look at how each of the retrieval mechanisms work:

4-byte retrievals are given by a 4-byte aligned offset ((int *)base + offset)
Float retrievals are offset by a float aligned offset ((float *)base + offset)

At the time, I probably didn’t understand how addressing or memory really worked. I was also basing the API off years of code that already made these mistakes — learning from an incorrect source doesn’t help. So, what’s the actual problem?

In C/C++, given pointer X and offset Y, the final address is computed as X + Y * sizeof(X). That means when we do (int)base, the actual offset becomes offset * sizeof(int), or 4*offset. Not only have we lost byte addressability, but to get unaligned values we have to do various bitwise operations. This type of deep-rooted mistake is easily visible in the following code in the cstrike module:

Select All Code:

if ((int)*((int *)pPlayer->pvPrivateData + OFFSET_SHIELD) & HAS_SHIELD)
	return 1;

It turns out HAS_SHIELD is a bitmask to clear unwanted data, when in reality, the offset is wrong and (int *) should be (bool *).

Alas, backwards compatibility prevents us from ever making these API changes, and changing internal usage would then confuse users further.

In summary, the practice of casting to random types is ridiculous and SourceMod’s API summarily kills it off. The entire practice can now be expressed in a function like this:

Select All Code:

template <typename T>
void get_data(void *base, size_t offset, T * buffer)
{
   *buffer = *(T *)((char *)base + offset);
}

Pointer Invalidation, or When Caching is Dangerous

One of the most tricky things about C++, as opposed to a garbage-collected language, is the lifetime of a pointer (or the memory it points to). The standard cases are the ones beginners learn pretty quickly:

“Forever”: Static storage means global lifetime
Short: Local storage means until the end of the current scope block
Very short: Objects can be temporarily created and destroyed from passing, return values, or operator overloads.
Undefined: Manually allocated and destroyed memory

A more subtle type of pointer invalidation is when storing objects in a parent structure. For example, observe the following code:

Select All Code:

    std::vector<int> v;
    v.push_back(1);
 
    int &x0 = v[0];
 
    printf("%d\n", x0);
 
    v.push_back(2);
 
    printf("%d\n", x0);

Running this little code block on gcc-4.1, I get:

Select All Code:

[dvander@game020 ~]$ ./test
1
0

What happened here? I took an address (references, of course, are just addresses) to a privately managed and stored object (in this case, the object is an integer). When I changed the structure by adding a new number, it allocated new memory internally and the old pointer suddenly became invalid. It still points to readable memory, but the contents are wrong; in a luckier situation it would have crashed.

What’s not always obvious about this type of mistake is that the vector is not being explicitly reorganized, as would be the case with a call like clear(). Instead, it’s important to recognize that if you’re going to cache a local address to something, you need to fully understand how long that pointer is guaranteed to last (if at all).

This type of error can occur when aggressively optimizing. Imagine if the previous example used string instead of int — suddenly, storing a local copy of the string on the stack involves an internal new[] call, since string needs dynamic memory. To avoid unnecessary allocation, a reference or pointer can be used instead. But if you’re not careful, a simple change will render your cached object unusable — you must update the cached pointers after changes.

There was a was very typical error in SourceMod that demonstrated this mistake. SourceMod has a caching system for lump memory allocations. You allocate memory in the cache and the cache returns an index. The index can give you back a temporary pointer. Observe the following pseudo-code:

Select All Code:

int index = cache->allocate(sizeof(X));
X *x = (X *)cache->get_pointer(index);
//...code...
x->y_index = cache->allocate(sizeof(Y));

Can you spot the bug? By the time allocate() returns, the pointer to x might already be invalidated. The code must be:

Select All Code:

int index = cache->allocate(sizeof(X));
X *x = (X *)cache->get_pointer(index);
//...code...
int y_index = cache->allocate(sizeof(Y));
x = (X *)cache->get_pointer(index);
x->y_index = y_index

There are many ways to create subtle crash bugs by not updating cached pointers, and they often go undetected if the underlying allocation doesn’t trigger address changes very often. Worse yet, whether that happens or not is often very dependent on the hardware or OS configuration, and serious corruption or crash bugs may live happily undiscovered for long periods of time.

The moral of this story is: Be careful when keeping pointers to where you don’t directly control the memory. Even if you’ve written the underlying data structure, make sure you remember exactly what the pointer lifetime is guaranteed to be.

The Hell that MOVAPS Hath Wrought

One of the most difficult bugs we tracked down in SourceMod was a seemingly random crash bug. It occurred quite often in CS:S DM and GunGame:SM, but only on Linux. The crash usually looked like this, although the exact callstack and final function varied:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1209907520 (LWP 5436)]
0xb763ed87 in CPhysicsTrace::SweepBoxIVP () from bin/vphysics_i486.so
(gdb) bt
#0  0xb763ed87 in CPhysicsTrace::SweepBoxIVP () from bin/vphysics_i486.so
#1  0xb7214329 in CEngineTrace::ClipRayToVPhysics () from bin/engine_i686.so
#2  0xb7214aad in CEngineTrace::ClipRayToCollideable () from bin/engine_i686.so
#3  0xb72156cc in CEngineTrace::TraceRay () from bin/engine_i686.so

This crash occurred quite often in normal plugins as well. Finally, one day we were able to reproduce it by calling TraceRay() directly. However, it would only crash from a plugin. The exact same code worked fine if the callstack was C/C++. But as soon as the call emanated from the SourcePawn JIT, it crashed. Something extremely subtle was going wrong in the JIT.

After scratching our heads for a while, we decided to disassemble the function in question — CPhysicsTrace::SweepBoxIVP(). Here is the relevant crash area, with the arrow pointing toward the crashed EIP:

   0xb7667d7c :        mov    DWORD PTR [esp+8],edi
   0xb7667d80 :        lea    edi,[esp+0x260]
-> 0xb7667d87 :        movaps XMMWORD PTR [esp+48],xmm0
   0xb7667d8c :        mov    DWORD PTR [esp+0x244],edx

We generated a quick crash and checked ESP in case the stack was corrupted. It wasn’t, and the memory was both readable and writable. So what does the NASM manual say about MOVAPS?

When the source or destination operand is a memory location, it must be aligned on a 16-byte boundary. To move data in and out of memory locations that are not known to be on 16-byte boundaries, use the MOVUPS instruction.

Aha! GCC decided that it was safe to optimize MOVUPS to MOVAPS because it knows all of its functions will be aligned to 16-byte boundaries. This is a good example of where whole-program optimization doesn’t take external libraries into account. I don’t know how GCC determines when to make this optimization, but for all intents and purposes, it’s reasonable.

The SourcePawn JIT, of course, was making no effort to keep the stack 16-byte aligned for GCC. That’s mostly because the JIT is a 1:1 translation of the compiler’s opcodes, which are processor-independent. As a fix, faluco changed the JIT to align the stack before handing control to external functions.

Suddenly, an entire class of bugs disappeared from SourceMod forever. It was a nice feeling, but at least a week of effort was put into tracking it down. The moral of this story is that source-level debugging for “impossible crashes” is usually in vain.

Writing Good Platform Code

One thing I’m a real stickler about is what I call writing good platform code. When I say platform code, I am referring to code that shows an fuller understanding of both the platform it’s being used on and its ability to be transferred to other platforms.

Windows coders are often especially bad at this because, and this is partially Microsoft’s fault, years of seeing sample code that does things like:

Select All Code:

T * some_function(void *base, size_t offset)
{
   return (T *)((DWORD)base + offset);
}

This code makes the dangerous assumption that the pointer will always fit inside a 32-bit integer. Nowadays Visual Studio tries to throw warnings about this type of mistake, and Microsoft introduced INT_PTR and UINT_PTR as replacements for casting pointers to DWORD. Yet, this type of mistake continues to persist.

However, I argue that using Microsoft-specific types is a bad idea in its own regard. Back in the Windows 3.1 days, Microsoft designed all their structures and API around types where their own names were hardcoded to the specific platform (that is, 16-bit x86). For example, WPARAM (originally 16-bit) and LPARAM (originally 32-bit) are both 32-bit on both x86 and AMD64.

Another example of this is the long datatype. Historically, this type was supposed to be the longest integer representable by a register (I think). For example, on 16-bit Windows, it was 32-bit, while a normal integer was 16-bit (I may be wrong on this). On 64-bit architectures, long has typically been 64-bit while an integer is 32-bit. Using long, since it varies so wildly, is therefore generally a very bad idea unless there is a specific need for it.

Unfortunately, Microsoft decided to use long in many various structures and API calls, and ran into the problem that, on Win64, making it 64-bit would break much legacy code. So, the 64-bit Visual C compiler makes long a 32-bit type. This makes long an even worse type to use since its width of a register assumption is changed on Win64.

I have seen countless programs that ignore good typing practices and make themselves unportable (for no reason other than poor style) to other platforms. For the best portability, be proactive:

Use *int_Xt, for example, int8_t instead of char or BYTE when your intention is to store an 8-bit integer.
Use intptr_t for pointer math that requires using an integer.
Use size_t for counts that can’t go below 0. Don’t use int or DWORD when you mean something that might not be a signed 32-bit integer.
Avoid Microsoft typedefs unless you’re directly interacting with Windows API. They have no place in code that might run on other platforms.
Avoid long unless it’s part of an external API call.
Avoid time_t in binary formats (such as files or network code). On GCC/VS2k3, its size is equal to the processor bit width. On VS2k5, it is always 64-bit.

Examples of these mistakes:

long: A developer once gave me code to interact with his network application. The code compiled on my 64-bit server but failed to connect with his application. He had used long in a network structure assuming it would be 32-bit. long was doubly meaningless as it could be different on either end of the spectrum. Using int32_t in the specification would have solved that.

time_t: Someone in IRC asked me to make a reader for his binary file format, which involved a time_t field in a struct. On my 32-bit VS2k5 compiler, time_t was 64-bit, and I couldn’t read his file format at first. The specification should have either mentioned the compiler, mentioned that time_t had to be 32-bit, or it should have simply used a field guaranteed to be most portable (like int64_t).

All of these are general suggestions toward making applications more portable. As long as you know the exact meaning of your types, and you have fully documented the “indeterminate” cases in your API calls/structures/binary formats, it’s fine. But if you’re blindly using sketchy types out of laziness, you won’t regret taking the time to use better types. You gain both readability and portability by being explicit.

It Works by Accident

Recently, Raymond Chen wrote about Programming by Accident. Similarly, I often find myself responsible for bugs whereby functionality is impaired, but it undetectably works by accident.

A simple example of this is, which I have done on more than one occasion, is:

Select All Code:

/**
 * @param pInt      If non-NULL, is filled with the number 5.
 */
void GetFive(int *pInt)
{
   if (*pInt)   /* Bug: Should just be pInt*/
   {
      *pInt = 5;
   }
}

Since it allows NULL, it’s a crash bug. However, let’s say you’re using it like this:

int five;
GetFive(&five)

On Windows, Microsoft’s Debug Runtime fills uninitialized data with 0xCC. Our variable will always be filled despite the bug because its contents is non-zero. Thus in debug mode, we’ll never see this bug (unless using something like Rational Purify or valgrind), and even in release mode there’s a good chance that the uninitialized contents will still be non-zero.

This is an example of where crashing is good. You’d instantly see the problem as opposed to potentially tracing through many nonsensical side effects. This has happed in SourceMod twice to date. The first time, SQL results were not being fetched right. The second time, admins were not loading. In both cases, the code worked fine most of the time, but broke on a few machines that just had bad luck.

The worst case of this came a few weeks ago in AMX Mod X with the following detour. The purpose was to save the EAX register, then pass the first parent parameter on the stack:

Select All Code:

;start of detour/function
push eax	; save eax
push dword 0	; result code
push [esp+4]	; pass first parameter on the stack
call X

The bug here is that esp+4 is dreadfully wrong. Instead of reaching up into the parameter list (which would be at esp+12), it’s reaching up into our own local frame! We’ve just pushed eax again.

Unfortunately, this detour worked fine for all of the testing I did with AMX Mod X (and it was no small amount). Then when we distributed betas, the crash reports came in. What was going wrong? Well, the detour was finally getting invalid parameters. It took a good ninety minutes to catch the typo, but more interesting is why it worked at all.

The function being detoured was ClientCommand(). Using IDA’s “Find References” feature, I looked at the disassembly to every call to ClientCommand() internal to the game. It looked like this:

Select All Code:

mov eax, <something>
push eax
call ClientCommand

This happened on both GCC and MSVC8 for most calls to ClientCommand(). The compiler used eax as a temporary register for pushing the first parameter, so the unnecessary saving of eax by the detour ended up allowing a subtle bug to continue working!

Unfortunately, there’s no easy way to identify stupid bugs like this — you have to suffer through each one unless you have a knack for spotting typos.

Linux and C++ ABI, Part 2

Last week I briefly mentioned the nightmares behind Linux and GNU Standard C++ ABI compatibility. This manifested itself at a very bad time — at ESEA, we were getting ready to launch our European servers. Unlike their USA counterparts, two of the physical machines were 32-bit. I made 32-bit builds of our software and uploaded them to both machines. I checked one box and everything was working. Then we launched.

Guess what? The machine I didn’t check? Nothing was working. A critical Metamod plugin was failing to load with the message:

libstdc++.so.6: cannot handle TLS data

I had seen this before, and quickly panicked; in past cases I’d never managed to fix it. It is virtually undocumented (try a Google search), and the problem is occurring in the ELF loader, not exactly a trivial area of Linux to start dissecting at product launch.

The first thing I did was download the libc source code for the OS – 2.3.4. A grep for “handle TLS data” revealed this in elf/dl-load.c:

Select All Code:

    case PT_TLS:
#ifdef USE_TLS
    ...
#endif
 
      /* Uh-oh, the binary expects TLS support but we cannot
         provide it.  */
      errval = 0;
      errstring = N_("cannot handle TLS data");
      goto call_lose;
      break;

It looked like libstdc++.so.6, which was required by our binary, had a PT_TLS section in its header. TLS is thread local storage. So, now I needed to find out why the TLS block couldn’t be created.

First, I opened up libc-2.3.4.so in a diassembler. The function name in question was _dl_map_object_from_fd. I then had to find where the PT_TLS case was handled (constant value is 7). From the source code, a _dl_next_tls_modid() is called shortly after. I found that call and traced back the jumps using IDA’s cross-reference feature. I found this:

.text:4AA03A8C                 cmp     eax, 7
.text:4AA03A8F                 nop
.text:4AA03A90                 jnz     short loc_4AA03A50
.text:4AA03A92                 mov     eax, [esi+14h]
.text:4AA03A95                 test    eax, eax
.text:4AA03A97                 jz      short loc_4AA03A50

Bingo! Clearly, USE_TLS is defined, otherwise it wouldn’t bother with a jump. So the problem is definitely in the logic somewhere, rather than a lack of TLS support. It was time for some debugging with gdb:

(gdb) set disas intel
(gdb) display/i $pc
(gdb) br _dl_map_object_from_fd
Breakpoint 1 at 0x4aa03866

I got lucky with the matching address and put a direct breakpoint:

(gdb) del 1
(gdb) br *0x4AA03A8C
Breakpoint 2 at 0x4aa03a8c

I stepped through the assembly, following along in my source code. I narrowed the failing condition down to this code:

         /* If GL(dl_tls_dtv_slotinfo_list) == NULL, then rtld.c did
         not set up TLS data structures, so don't use them now.  */
          || __builtin_expect (GL(dl_tls_dtv_slotinfo_list) != NULL, 1))
        {

I opened up rtld.c and searched for dl_tls_dtv_slotinfo_list — and the answer was immediately apparent:

  /* We do not initialize any of the TLS functionality unless any of the
     initial modules uses TLS.  This makes dynamic loading of modules with
     TLS impossible, but to support it requires either eagerly doing setup
     now or lazily doing it later.  Doing it now makes us incompatible with
     an old kernel that can't perform TLS_INIT_TP, even if no TLS is ever
     used.  Trying to do it lazily is too hairy to try when there could be
     multiple threads (from a non-TLS-using libpthread).  */
  if (!TLS_INIT_TP_EXPENSIVE || GL(dl_tls_max_dtv_idx) > 0)

And there it was. That version of glibc refused to late-load dynamic libraries that had TLS requirements. When I checked the working server, it had a much later libc (2.5 from Centos 5, versus 2.3.4 from Centos 4.4).

I am hardly worthy of nit-picking the likes of glibc maintainers, but I find it lame that the error message was completely undocumented, as was the (lack of) functionality therein. While researching this, I also looked through the glibc CVS – the bug was first fixed here. The big comment explaining the bug remains, even though it appears that as of this revision, it is no longer applicable. Whether that’s true or not, I don’t know. The actual revision comments are effectively useless for determining what the changes mean. I may never really know.

How did I end up solving this? Rather than do an entire system upgrade, I removed our libstdc++ dependency. It just so happened it was there by accident. Oops. Note that earlier versions of libstdc++ had no PT_TLS references — which is why this is a subtle ABI issue.

In the end, the moral of the story is: binary compatibility on Linux is a nightmare. It’s no fault of the just the kernel, or GNU — it’s the fault of everyone picking and enforcing their own standards.

As a final tirade, Glibc needs to get dlerror() message documentation. “Use the source, Luke,” is not an acceptable API reference.

Linux and C++ ABI, Part 1

One particularly annoying gem about developing on Linux is the general disregard for cross-distribution binary compatibility. There are two big reasons for this, one philosophical and the other is semi-technical:

Philosophical: There is a tendency to believe that almost everything on Linux is (or should be) open source; instead of vendors distributing binaries, they should distribute source code for users to compile.
Technical: Linux Distribution vendors can theoretically call or place libraries whatever or wherever they want, or compile them with any options they feel necessary, or leave options out which they feel are unnecessary.

Projects which distribute cross-distro binaries (whether open source or not) have to fight an uphill battle on both of these counts. But it only gets worse when dealing with the GNU Standard C++ Library. The huge problem with libstdc++ is that the compiler offers no easy way to either exclude it or to link it statically.

Excluding libstdc++ is possible if you invoke the compiler with gcc instead of g++ — but you need to overload new, new[], delete, and delete[] (simple malloc() wrappers will do). If you use try/catch/throw or dynamic_cast, forget it. You have to link to libstdc++ no matter what.

Static linking is out of the question for DSOs. You can run into all sorts of problems when loading other dynamic libraries or another libstdc++ in the same process. Not only that, but static linking can be nearly impossible.

It gets worse when you realize that the ABI version to libstdc++ tends to get bumped quite often. AMX Mod X experienced compatibility bumps from GCC versions 2.95, 3.0-3.3, and 3.4. As each distribution decides to package different libstdc++ versions/builds, and some distributions are too old to even support the compiler we choose to use, it becomes increasingly difficult to distribute Linux binaries. Asking average users to compile from source is out of the question — many falter on even a graphical installer.

The continual problems with Linux distributions and libstdc++ are well-known and documented by now. This was just your introduction. Next week I will share an extremely frustrating and subtle ABI problem I ran into with libstdc++ and glibc.

Mystery Bail Theater

Sometimes worth reading

Category Archives: Articles

Design Decision I Most Regret

Pointer Invalidation, or When Caching is Dangerous

The Hell that MOVAPS Hath Wrought

Writing Good Platform Code

It Works by Accident

Linux and C++ ABI, Part 2

Linux and C++ ABI, Part 1