Posts Tagged ‘concurrency’

Asynchronous Games in Firebase; pt I

Thursday, March 11th, 2010 by larsan

Since the social web started to expand, asynchronous multiplayer games seems to really have taken off. So my immediate question is obviously, can I do that on Firebase? And the short answer is: of course. For the long answer, please read on.

Let’s recap first. In a traditional synchronous game you play against other people in real-time, everyone you play against will have to be online at the same time. But given the nature of our everyday Internet use, a different pattern has emerged: in asynchronous games not all players have to be online at the same time. For example, turn-based games can easily be made asynchronous, you play your turn and then sometime in the future you opponents will have been online and performed their moves. As such, a classic game such as chess in play-be-email mode is a prime example of an asynchronous game. But you’re not limited to turn-based games, you can define the order to act on time or any metric you’d like, the only mandatory feature is that all players do not have to be online at the same time.

Firebase is fundamentally a synchronous multiplayer server. The technical difficulties Firebase set out to solve are all issues arising from the kind of distributed computing you get when a couple of thousand players have to interact in real-time. So normally you’d build an asynchronous game on a traditional web platform, as the demand on timely updates just got a lot less urgent standard web techniques can be easily applied. But having said that, is there anything you’d gain by using Firebase?

I believe there is: A synchronous option. Easy to overlook, this is actually a big one. If you build your asynchronous game on Firebase from the start, you have an immediate option of including synchronous play. Or better yet, the difference between synchronous and asynchronous could be a simple configuration issue. And by the way, who says it’s either or? Why not both?

Let me explain the above in an example: At the moment I’m hacking on a Kalaha game for Facebook on my past-time. And obviously it’ll be asynchronous, players will take their turn while logged in, and the game mechanics will inform you of your move via the Facebook streams. But if two players are online at the same time and want to play against each other, why shouldn’t they? And would quick matches be fun, say max 5 seconds per move? And how about tournaments?

Here’s some other, perhaps minor, points to consider:

  • Duplex communication: implicit in the entire discussion is that Firebase can push events both ways from and to clients and server. In HTTP you’re forced to use Comet patterns, whereas Firebase uses a persistent connection and any actions are delivered to the clients immediately.
  • Latency: Just because you’re building an asynchronous game doesn’t automatically mean latency is not important. If your game is based on actions against a common notion of time, you might be interested in latency anyway. Firebase has an very low latency compared to most application/web servers.
  • Bandwidth: If it turns out you have a hugely successful game, and a large portion of the traffic is made up of game data (as opposed to actual web pages or other contextual data) you’ll be happy just to avoid the HTTP headers and the overhead they automatically bring to the table. Firebase uses TCP only at the moment, thus dramatically reducing the bandwidth requirements.

There are other less apparent options as well, but lets be honest, anything but a huge advantage would probably be negated by the additional platform with the integration and administration it’d mean. That is to say, if you can built it using standard web tools only, why add another platform with which to communicate, integrate and administrate?

So that’s my declaration: Using Firebase gives you the entire range from highly synchronous to entirely asynchronous on one platform! Next post I’ll sketch a proposal on how to actually do it as well. Stay tuned!

A Java Concurrency Bug, and How We Solved It

Tuesday, April 14th, 2009 by larsan

Everyone agrees that good debugging is critical. But it is not entirely trivial when it comes to multi-threaded applications. Here’s the story how I tracked down a nasty bug in the Java5 ReentrantReadWriteLock.

Update 15/4: To be a bit clearer, the Java bug in question is on 1) Java 5; and 2) only when using fair locks, non fairness seems to work, however, that was never an option for us, if nothing else due to this other Java bug… Thanks to Artur Biesiadowski for the heads up.

Our product Firebase is designed to offload threading issues and scalability from game developers to a generic platform. However, one of our customers experienced mysterious “freezes” when running a fairly small system load in their staging environment.

First step on the way: getting stack dumps when the system is frozen. In other words, request that the client do a “kill -3” on the server process when it’s hanged, as this dumps all threads and their current stack traces to the standard output. This we did, but only got confused by it, all threads seemed dormant and there was no visible dead-lock in the stack traces.

However, several threads were all mysteriously waiting on a read lock deep down in the server, and seemingly not getting any further, despite that fact that no-one was holding the write lock. This wasn’t conclusive though as there was one thread waiting to take the write lock and this would block the readers. But given that the server was actually frozen it looked suspicious. So my first analysis concluded:

As far as I can see, the only abnormal thing that could have caused this stack is if a thread have taken the lock (read or write) and then crashed without releasing it, however there doesn’t seem to be any place in the code not safeguarded by try/finally (where the lock is released in the finally clause).

Implied in that conclusion is of course that this might either be a normal behavior and we’re looking in the wrong direction, or that we have a more serious Java error on our hands.

There’s a lot of information to be had from a ReentrantReadWriteLock, including the threads waiting for either read or write privileges, and the thread holding a write lock (if any), but not (strangely enough) the threads actually holding a read lock. And as a reader thread can effectively block the entire lock by not unlocking while a writer is waiting, this is information you really need to know.

So the next step was to get hold of the reader threads. I did this by sub-classing the ReentrantReadWriteLock to return my own version of the read lock, which, simplified, did this:

Set readers = Collections.synchronizedSet(new HashSet());  

public Set getReaders() {
    return readers;
}  

public void lock() {
    super.lock();
    readers.add(Thread.currentThread());
}  

public void unlock() {
    super.unlock();
    readers.remove(Thread.currentThread());
}

Given this code, we now have a set containing a snapshot of the threads holding the read lock. I then added a JMX command for printing the following information to standard out for the given read/write lock:

  1. The threads waiting for a read lock, including their stack traces.

  2. The threads waiting for the write lock, including their stack traces.

  3. Any threads holding a read lock, including their stack traces.

  4. The thread holding the write lock, including its stack trace, if any.

I shipped this patched code to the client and asked them to freeze the server with the patch, print the new debug information, and then send me the output. Which they did, and the relevant output looked like this (very much shortened):

Holding reader: Thread[DQueue Handoff Executor Pool Thread { GAME }-1,5,main]
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:118)
    […]  

Waiting writer: Thread[Incoming,dqueue,127.0.0.1:7801,5,Thread Pools]
    sun.misc.Unsafe.park(Native Method)
    […]  

Waiting reader: Thread[DQueue Handoff Executor Pool Thread { GAME }-1,5,main]
    sun.misc.Unsafe.park(Native Method)
    [...]

See anything strange here? It appears that the same thread is both holding and waiting for a read lock at the same time. But this is supposed to be a reentrant lock. In which case…

So, the freeze was caused by this: There’s a bug in Java5 (but not in Java6) where a fair ReentrantReadWriteLock stops a reader from re-entering the lock if there’s a writer waiting. It is of course easy to write a test case for, which you can find here.

This is now submitted as a bug to Sun, but I have yet to get a confirmation and bug number.

As for Firebase, it is now patched and released to use manually tracked re-entrance for read/write locks through the entire server when running under Java5, looking, again very much simplified, like this:

private ThreadLocal count = new ThreadLocal();  

public void lock() {
    if (get() == 0) {
        // don't lock if we alread have a count, in other words, only
        // really lock the first time we enter the lock
        super.lock();
    }
    // increment counter
    increment();
}  

public void unlock() {
    if (get() == 0) {
        // we don't have the lock, this is an illegal state
        throw new IllegalMonitorStateException();
    } else if (get() == 1) {
        // we have the lock, and this is the “first” one, so unlock
        super.unlock();
        remove();
    } else {
        // this is not the “first” lock, so just count down
        decrement();
    }
}	

// --- HELPER METHODS --- //  

private void remove() {
    count.remove();
} 	 

private int get() {
    AtomicInteger i = count.get();
    if (i == null) return 0;
    else {
        return i.intValue();
    }
} 	 

private void increment() {
    AtomicInteger i = count.get();
    if (i == null) {
        i = new AtomicInteger(0);
        count.set(i);
    }
    i.incrementAndGet();
} 	 

private void decrement() {
    AtomicInteger i = count.get();
    if (i == null) {
        // we should never get here...
        throw new IllegalStateException();
    }
    i.decrementAndGet();
}

The above code isn’t trivial, but shouldn’t be too hard to decode: We’re simple managing a “count” each time a thread takes the read lock, but we’re only actually locking the first time, the other times we simply increment the counter. On unlock, we decrement the counter, and if it is the “last” lock, if the counter equals 1, we do the real unlock and remove the counter.

There are two things to learn, 1) in any sufficiently complex server, the above debug information is nice to have from the start on any important lock; and 2) it really is time to migrate to Java6.

Lars J. Nilsson is a founder and Executive Vice President of Cubeia Ltd. He’s an autodidact programmer, former opera singer, and lunch time poet. You can contact him at lars.j.nilsson in Cubeia’s domain.