Monthly Archives: September 2009

Six Years Later, Steam is Still Annoying

It’s been a year or so since I fired up Steam to actually try and play a game. Given that I help maintain multiple mods for Valve games, that might seem surprising.

I love the concept of Steam, I’ve had it installed since the early betas, and I hated the days of scouring for files like “10161100.exe” to update Half-Life 1. But the actual software is so mind-bogglingly annoying. I cringe when I hear Steam fanboys tout it as the shining jewel in content distribution.

To demonstrate, I recently got a desktop and decided to try playing a game I paid for on Steam. I downloaded Steam, installed it, and instantly got bombarded with stuff like this:

Steam Notifications

These three things would slide down, then three more would pop up. Rinse and repeat every few seconds. Really annoying. I have Steam Friends completely pref’d off on my laptop, but I guess Steam doesn’t sync account settings across computers. I’d rather not disable Steam Friends completely, but of the three options to disable notifications, none seem related to the ones up above. So I pref’d Friends off. The little messages kept coming though, until the whole list of notifications was exhausted.

Later I came back and tried to install a game. It said “Your Steam ticket has expired” and asked for my password. I typed it in, and nothing happened. The dialog disappeared but the game wasn’t downloading. I double-clicked the game to try again. It asked for my password again, but no-go.

I tried this a few more times, then restarted Steam. When it started back up, all of the icons were missing:

Steam - No Icons

Okay, that’s weird. I double-clicked the game again, and I still got the expired ticket dialog box. I typed in my password again, but this time selected “Remember my password”. The game didn’t start installing, but the icons appeared.

I tried installing again, and now I got a new dialog box: “The Steam servers are currently too busy to handle your request.” Huh? The next try got me back to the password entry dialog box because my “Steam ticket” had expired again.

I searched Google and found a Steam forum thread describing my problem. Another thread linked from comment #11 said to try deleting “ClientRegistry.blob”, and if that doesn’t work, reinstall Steam.

So I exited Steam, deleted “C:\Program Files (x86)\Steam\ClientRegistry.blob”, and restarted. When I tried installing the game, I actually got a progress bar. By the time it had finished downloading I’d moved on to other things, but at least next time I’m in the mood to play a game on Steam, I know to delete random internal files.

This product… needs polishing.

64-bit TraceMonkey!

As of an hour ago, Mozilla’s JavaScript engine now has 64-bit JIT support turned on by default in the TraceMonkey tree only. We currently don’t ship a 64-bit browser, but we will eventually, and this is an important step in making sure we’re staying optimized on all major platforms.

If you are bleeding-edge and build your own 64-bit Firefox, you can now easily try out the JIT and report any problems.

A nice discovery was that the 64-bit engine as a whole outperforms 32-bit builds. SunSpider ran about 20% faster, comparing both 32-bit and 64-bit builds with and without the JIT.

For more information see this newsgroup post on
mozilla.dev.tech.js-engine

Trace Compilation and Recursion, Part 1

TraceMonkey does really well at lots of benchmarks. Specifically, it does well at benchmarks that it can trace compile. If for some reason a loop contains a code path that cannot be traced, that path will never run inside the JIT – instead it stays in the interpreter, which is much slower.

There isn’t much left that we can’t trace, but when that happens, it lights up benchmarks in bad ways. One of the areas we don’t trace at all is recursion.

Recursion isn’t too common on the web but it’s still a big hole in our functionality, so I worked on it quite a bit this summer. Unfortunately it’s non-trivial, and still hasn’t landed, but I’d like to talk about what I do have working.

Quick Introduction

TraceMonkey uses trace trees. If some repetitive code is run, like a loop, a trace recorder kicks in. This records the behavior of the code for one iteration. If any behavior could diverge at runtime – such as an if branch – it becomes a “guard”, and we only record the observed path.

We assemble these observations into statically-typed, native CPU code. If a guard is triggered while running this new, faster code, it means the behavior has changed. We start recording a new path, and attach it to the point of divergence. Thus the code is literally a tree of traces.

Recursion is a Loop, Sort Of

Recursion is a loop, so we’d like to compile as such, using the behavior we observe at runtime. Let’s look at a simple recursive function and identify where it loops.

Select All Code:

function factorial(n) {
  if (n == 0)
    return 1;
  return n * factorial(n - 1);
}

What happens when you call factorial(10)? It starts off in the interpreter here:

Select All Code:

  if (n == 0)

n isn’t 0, so the indented path gets skipped.

Select All Code:

  return n * factorial(n - 1);

First n - 1 evaluates to 9. Then factorial() is called again, and we’re back at:

Select All Code:

  if (n == 0)

Bam. We’ve hit the same point, and thus identified a repeating pattern. To make sure it’s hot, the tracer recorder waits for one more iteration. Then it kicks in and starts recording the behavior it sees. Is (n == 0)? No, n is 8, so the tracer records:

Select All Code:

1: guard n != 0

Next, the trace compiler sees the n - 1 and function call and records those.

Select All Code:

2: subtract(n, 0x1)
3: add stack space (frame) for factorial call
4: n = result of instruction #2

Next the trace recorder sees if n == 0, which is where it started recording. It adds a “jump to the beginning” instruction and compiles the trace to native code. Visually it looks something like this:

But wait! That’s only “half” of the function. Specifically, it’s the down-recursive half, which dives deeper and deeper. What happens when we run this? The code will build a bunch of stack frames that look like this:

frame #	variables
9	`n = 0`
8	`n = 1`
7	`n = 2`
6	`n = 3`
5	`n = 4`
4	`n = 5`
3	`n = 6`
2	`n = 7`
1	`n = 8`

When n == 0, the guard fails. This is really nasty. It turns out the JIT and interpreter have completely different ways of representing stack frames, and when transferring control the frames must be converted from one to the other. Locally, this process is really expensive. As you can imagine it is disastrous for deeply recursive functions. The only way to fully mitigate this is to make sure we spend more time in the JIT. Ideally, we never want to reconstruct any frames.

Upwards Recursion

But back to what happens. n == 0, the guard has triggered, and we’re back in the interpreter. The recorder only wants to observe hot code, so it doesn’t record the path from the guard yet. Now we see this get executed:

Select All Code:

  if (n == 0)
    return 1;

Remember, we’re at frame #9 from the diagram above. The “return” instruction pops us to frame #8, and now we’re at:

Select All Code:

  return n * (result of factorial(n - 1));

This returns 1 * 1 to the calling function, and now we’re at:

Select All Code:

  return n * (result of factorial(n - 1));

We’re back at the same place! That’s a loop. This time 2 * 1 is multiplied to return 2. The trace recorder kicks in and starts recording from frame #6, but hits a snag: it sees a return, but hasn’t recorded a matching function call for this trace. This is where things get weird. How can it observe a return if it doesn’t know anything about frame #5?

It can’t. That’s sad, but we really want to solve this. If the JIT can unwind its own recursive frames, we won’t hit that expensive synthesizing path. The trick is to infer some uniquely identifying information about frame 5: its variable’s types and program counter. We guard on that identity. If that guard fails at runtime, we know that it is illegal to keep popping frames.

The resulting trace looks like this:

Linking the Two Halves

Hopefully, the down-recursive side will be run again. When the n == 0 guard hits, it will be “hot”, and the recorder starts up. This time it notices the “return”, looks at the above frame, and recognizes that recursion is unwinding. It links the two traces together like so:

Beautiful! Except for…

Next Time

There are still problems with this approach. The major one is that by the time the up-recursive frame is built, all of the frames exist on the interpreter – not the JIT. When control transfers from the interpreter to the JIT, only one frame is converted because the process is very expensive.

That means the first up-recursive guard will hit, since there are no more JIT frames. We’ll do the expensive reconstruction process on the same frame, transfer it back to the interpreter, then re-enter the interpreter. This will happen for each frame until the recursion is unwound, all to execute a few multiplies!

That’s REALLY bad. To make things worse, this whole process will happen at least twice. The first time is to unwind the initial down recursion. The next time is because the n == 0 guard fails, and all the frames are popped off the JIT and transferred to the interpreter again. Everything only runs smoothly on the third invocation of the recursion function.

Next time I’ll discuss approaches to solving that situation.

Mystery Bail Theater

Sometimes worth reading

Monthly Archives: September 2009

Six Years Later, Steam is Still Annoying

64-bit TraceMonkey!

Trace Compilation and Recursion, Part 1