This article describes how we used JBoss Cache to create a clustered load server solution for our Java-based load tool.

You want me to load test what?

At my previous place of work and currently at Cubeia, I am working with high capacity multiplayer systems. Our current game server, Firebase, is targeted to handle tens of thousands of concurrent players. One problem one quickly realizes when working with high capacity systems is that you need to verify the capacity by load testing it continuously.
For the tests, we have developed simulated players (referred to as bots) that behave as much like real players as possible. Depending on the game, a single load server can run between 4 000 and 15 000 concurrent bots. Clearly, if we want to simulate, for instance, 100 000 bots, we need more than one load server.
In our initial installation we already had a Java-based load generator/simulation (i.e. bots) in place, but we needed to configure and start each server process manually. We provided the number of bots and any behavioral parameters to the load test through configuration scripts.
Not only was it tedious to configure all the servers, but we also had to coordinate the load tasks so that all simulated players logged in with unique credentials and ids; otherwise we would not be simulating different players. We solved this by providing an id offset in the configuration, which of course increased the complexity and the risk of human error.
Seeing the slow turnaround for a single load test run and experiencing the labor-intensive configuration of all load servers, we started to investigate possible clustering/farming solutions. Additionally, it would be nice to provide a single, simple graphical interface for starting load from all servers in one click. This would enable non-technical personnel to start and stop the load tests.

Measure twice, cut once

The hard part of choosing a solution is not the lack of options, but rather the abundance of them. We can consider a server topology of 3 load servers like the one shown below:

We wanted an operator to be able to connect to any of the load servers above and send a request for starting more bots. The load of the request should ideally be spread evenly over the servers in the cluster so that no server is more loaded than another. The load servers (Server 1-3 above) will all direct the load towards the same application server cluster that is being tested.

The request should be broken down into smaller requests, hereafter referred to as tasks, which should be distributed over the servers in the cluster, hereafter referred to as members.
We considered several ways of distributing the tasks over the nodes: HTTP requests, remote calls (RMI, Thrift, etc.), writing everything to a database, or using a distributed cache.
The database path was discarded fairly quickly: it never really seemed to fit, and we had no need to store data after the test was completed anyway. Using simple HTTP requests seemed like the easiest initial choice, but, as with an RPC solution, this would require a preconfigured system with all load servers' IP addresses, since we would not have any means of discovery.
In the end we settled on a distributed cache solution; our cache of choice was JBoss Cache. JBoss Cache would give us:

  • Dynamic group membership (no need to preconfigure members)
  • Listener interface for member left and joined
  • Listener interfaces for changes to the data structure
  • Replicated in-memory state over all servers
  • No need for a broker (which would have been the case with a database, Terracotta or JMS for instance)

These features would allow us to build a highly dynamic and failure-tolerant system, which was exactly what we wanted!

JBoss Cache overview

JBoss Cache is a replicated and transactional cache. It is built on top of JGroups and has extensive replication configuration options. The data in JBoss Cache is stored in a tree structure. Each branching point is referred to as a node. Each node can contain data, in a key-value form, and children nodes. For most practical cases you can consider each node to hold a standard Map of objects.
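
To make the tree model concrete, here is a minimal sketch in plain Java. It is not the JBoss Cache API: `TreeNode` and its methods are hypothetical stand-ins that only illustrate the idea of nodes holding a key-value map plus child nodes, addressed by a path.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a cache node; NOT the JBoss Cache API.
class TreeNode {
    // Key-value data stored directly on this node.
    final Map<String, Object> data = new LinkedHashMap<>();
    // Child nodes, keyed by their name in the path.
    final Map<String, TreeNode> children = new LinkedHashMap<>();

    // Stores a value at the given path, creating missing nodes on the way.
    void put(String path, String key, Object value) {
        TreeNode current = this;
        for (String name : path.split("/")) {
            if (name.isEmpty()) continue; // skip the empty segment before the leading '/'
            current = current.children.computeIfAbsent(name, n -> new TreeNode());
        }
        current.data.put(key, value);
    }

    // Reads a value at the given path; returns null if the node or key is missing.
    Object get(String path, String key) {
        TreeNode current = this;
        for (String name : path.split("/")) {
            if (name.isEmpty()) continue;
            current = current.children.get(name);
            if (current == null) return null;
        }
        return current.data.get(key);
    }
}
```

With this model, `root.put("/server1/request1/", "count", 1000)` creates the nodes `server1` and `request1` and stores `count` in the map of the latter. What the sketch leaves out is exactly what the real cache adds: replication of the tree to all members and change notifications.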

Below is an example of a basic tree structure of a JBoss Cache:
Group discovery is handled through multicast, and new members are added dynamically. All you need to do for new servers to find the group is to make sure they have the same multicast address configured, and they will find each other. Here is a link to a short tutorial on JBoss Cache that explains how it works.

For the load server we were not interested in the transactional features and we decided to use the default replication mechanism, which is asynchronous replication.

Propagating requests and data

JBoss Cache will handle group membership which means that every member will have a list of all other members and will be notified when the group changes. When a user requests a load scenario to be started we should divide and spread the request over all members in the cluster. The flow of events stemming from a request was defined as:

  1. A request is received on any server.
  2. The request is split up into tasks; the number of tasks will depend on the number of members in the group.
  3. Each task will be written to the JBoss Cache in a member specific area.
  4. Each member will react to the task written to the member’s own area and start a load scenario as specified by the task.
  5. Members will continuously write status updates to the member’s own area in the cache.
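
Step 2 above can be sketched as follows. This is an illustration only: `TaskSplitter` is a hypothetical name, and the handling of a remainder (the first few members get one extra bot) is an assumption, since the article only covers the evenly divisible case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the remainder policy is an assumption, not the original implementation.
class TaskSplitter {
    // Returns one bot count per member. Counts differ by at most one,
    // and together they always add up to totalBots.
    static List<Integer> split(int totalBots, int memberCount) {
        if (memberCount <= 0) {
            throw new IllegalArgumentException("memberCount must be positive");
        }
        if (totalBots < 0) {
            throw new IllegalArgumentException("totalBots must not be negative");
        }
        int base = totalBots / memberCount;
        int remainder = totalBots % memberCount;
        List<Integer> tasks = new ArrayList<>();
        for (int i = 0; i < memberCount; i++) {
            // The first 'remainder' members take one extra bot each.
            tasks.add(base + (i < remainder ? 1 : 0));
        }
        return tasks;
    }
}
```

For 3000 bots and 3 members this gives three tasks of 1000 bots each, matching the example that follows.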

Here is an example of how the above would work in practice:

  1. A user sends Server1 a request to start 3000 bots.
  2. Server1 has a total of 3 members in the cluster (Server1, Server2 and Server3). Each task will then consist of 1000 bots.
  3. The three tasks are written to the cache. This could be solved by writing the following for example:
    1. cache.put("/server1/request1/", "count", 1000)
    2. cache.put("/server2/request1/", "count", 1000)
    3. cache.put("/server3/request1/", "count", 1000)
  4. All members' cache listeners are now notified with nodeModified() 3 times, once for each write to the cache. Each member inspects the data path (e.g. "/server1/request1/") and checks if the write is within its domain. For server2 this would include any path starting with "/server2/…".
  5. As the task is started, the member writes the status of the load request to its area. For server3 this could for example be:
    1. cache.put("/server3/task1/", "status", "starting");
    2. cache.put("/server3/task1/", "loggedInCount", "500");
  6. etc.
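
The domain check in step 4 can be sketched like this. `MemberArea` is a hypothetical helper; in the real system the path would come from the cache listener notification, which is not modeled here.

```java
// Hypothetical helper that decides whether a modified path belongs to a member's own area.
class MemberArea {
    private final String prefix;

    MemberArea(String memberId) {
        // The trailing '/' matters: without it, "server1" would also match "/server10/...".
        this.prefix = "/" + memberId + "/";
    }

    // True if the path lies inside this member's area.
    boolean owns(String path) {
        return path.startsWith(prefix);
    }
}
```

A member created as `new MemberArea("server2")` would accept "/server2/request1/" and ignore the writes made to the other members' areas.
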
When the user requests a status report, the server that receives the request simply scans the data contained in all members' status areas and aggregates it. This is trivial to do since all data written to the cache is replicated to all members. From the example above, the aggregated data could be presented like this:
REQUEST 1
  bots: 3000
  Logged In: 2550
  Server1
    bots: 1000
    Status: running
    Logged In: 1000
  Server2
    bots: 1000
    Status: running
    Logged In: 1000
  Server3
    bots: 1000
    Status: starting
    Logged In: 550
From the aggregated view we can see that out of our 3000 requested bots only 2550 have logged in, because server3 is still busy logging in its bots.
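
The aggregation itself is little more than a sum over the member areas. The sketch below assumes the replicated status data has already been read into a plain map; `StatusAggregator` and `MemberStatus` are hypothetical names and do not reflect the actual cache layout.

```java
import java.util.Map;

// Hypothetical aggregator working on status data already read from the cache.
class StatusAggregator {
    // Status of one member's task for a given request.
    static final class MemberStatus {
        final int bots;
        final String status;
        final int loggedIn;

        MemberStatus(int bots, String status, int loggedIn) {
            this.bots = bots;
            this.status = status;
            this.loggedIn = loggedIn;
        }
    }

    // Total number of bots requested across all members.
    static int totalBots(Map<String, MemberStatus> byMember) {
        return byMember.values().stream().mapToInt(s -> s.bots).sum();
    }

    // Total number of bots that have logged in across all members.
    static int totalLoggedIn(Map<String, MemberStatus> byMember) {
        return byMember.values().stream().mapToInt(s -> s.loggedIn).sum();
    }

    // Builds a plain-text report: request totals first, then one block per member.
    static String report(String requestId, Map<String, MemberStatus> byMember) {
        StringBuilder sb = new StringBuilder();
        sb.append("REQUEST ").append(requestId).append('\n');
        sb.append("bots: ").append(totalBots(byMember)).append('\n');
        sb.append("Logged In: ").append(totalLoggedIn(byMember)).append('\n');
        for (Map.Entry<String, MemberStatus> e : byMember.entrySet()) {
            MemberStatus s = e.getValue();
            sb.append(e.getKey()).append('\n');
            sb.append("bots: ").append(s.bots).append('\n');
            sb.append("Status: ").append(s.status).append('\n');
            sb.append("Logged In: ").append(s.loggedIn).append('\n');
        }
        return sb.toString();
    }
}
```

Because every member holds a full replica, any server can run this locally without contacting the others.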

Cache design

When designing the cache layout we had a few requirements:

  • A member should have its own defined data area in the cache
  • Requests will be pushed to the member-specific area by any server.
  • Current states will be written to the member specific area by the local member only. Any server may read this area.

As server id I opted to use the member id used in JBoss Cache; this will basically be your local IP. I also wanted to keep all requests separated by id, hence the request1/task1 pattern above. The task id makes it possible to match requests with statuses and enables tasks to be shut down by the user when the test ends. Since we allow requests to be made to any of the servers, potentially concurrently, I recommend using UUIDs as request/task ids.

Below is the cache design shown graphically. Of course, in the real application the request and status will have many more parameters but I have left them out here for simplicity.
The example above contains one request that has been divided into tasks with a UUID starting with '6fa459ea'. The other member area (192.168.0.2) should also contain a tree structure with the same request and status layout. Any further added requests would be added under the 'requests' node with new unique ids.
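
A small sketch of how ids and paths for this layout could be generated. The 'requests' node name comes from the description above; the 'status' node name and the exact path format are assumptions made for illustration, and `CachePaths` is a hypothetical helper.

```java
import java.util.UUID;

// Hypothetical helper; the "/<member>/requests/<id>" and "/<member>/status/<id>"
// path formats are assumptions, not the confirmed layout.
class CachePaths {
    // Random UUIDs avoid id clashes when several servers accept requests concurrently.
    static String newTaskId() {
        return UUID.randomUUID().toString();
    }

    // Path where any server may write a task for the given member.
    static String requestPath(String memberId, String taskId) {
        return "/" + memberId + "/requests/" + taskId;
    }

    // Path where only the owning member writes the status of that task.
    static String statusPath(String memberId, String taskId) {
        return "/" + memberId + "/status/" + taskId;
    }
}
```

Using the same task id in both paths is what lets a status entry be matched back to the request it belongs to.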

Assembly

So, we have decided on using a distributed cache (JBoss Cache) for clustering and dispatching of requests. We have come up with a cache design that will suit our needs and an implementation of dispatching the requests to the members in the cluster. But how does this fit in with the big picture, where we also want to have a graphical user interface and be able to start load scenarios?

The load server application was designed in a layered fashion. For the user interface we used an embedded Jetty (http://www.mortbay.org/jetty/) server that serves static HTML pages with forms for different load scenarios. Jetty also hosts a servlet that handles requests. Below is an overview of the different layers in the application.
The receiving servlet sends the request down to the next layer (Cache), which splits the request into tasks and writes them to the cache as described in the previous sections. The cache layer also listens to changes in the cache, so tasks that are written on a different server will be picked up and started by the designated server.

The load generator layer performs the actual load simulation, and the status layer periodically writes request statuses to the cache. In our implementation we abstracted all direct calls to the cache so that no layer other than the Cache layer has any dependency on JBoss Cache.
When we had the complete stack of layers implemented we had a working load server farm with some nice features to boot:

  • Virtually zero configuration for new load servers. There is no broker or database whose connection details need to be configured locally; just start a new instance and it will join a running farm if one is present.
  • Dynamic cluster. Servers can be added or removed at runtime.
  • We added a status flag that could have three values: RUNNING, FAILED, RESUMED. If a server died or crashed for some reason, we would get a listener notification and flag the status as FAILED. An operator could then restart the server, which would scan the cache for any failed tasks under its area, restart them and set their status to RESUMED. This allowed for some rudimentary failover handling.
  • Very easy to use. The interfaces are simple HTML forms instead of SSHing around to 10+ Linux servers and reconfiguring each request.
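
The RUNNING/FAILED/RESUMED handling described in the list above can be sketched as a small state tracker. This is an illustration under assumptions: `FailoverTracker` is a hypothetical class, and an in-memory map stands in for the replicated cache and its member-left notification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

enum TaskStatus { RUNNING, FAILED, RESUMED }

// Hypothetical tracker; the nested map stands in for the member areas in the cache.
class FailoverTracker {
    // memberId -> (taskId -> status)
    private final Map<String, Map<String, TaskStatus>> areas = new HashMap<>();

    // Records that a member has started a task.
    void started(String memberId, String taskId) {
        areas.computeIfAbsent(memberId, m -> new HashMap<>()).put(taskId, TaskStatus.RUNNING);
    }

    // Called on a surviving member when it is notified that memberId left the group.
    void memberLeft(String memberId) {
        Map<String, TaskStatus> area = areas.get(memberId);
        if (area != null) {
            area.replaceAll((taskId, status) -> TaskStatus.FAILED);
        }
    }

    // Called by the restarted member: marks its failed tasks RESUMED and returns their ids.
    List<String> resume(String memberId) {
        List<String> resumed = new ArrayList<>();
        Map<String, TaskStatus> area = areas.get(memberId);
        if (area == null) {
            return resumed;
        }
        for (Map.Entry<String, TaskStatus> entry : area.entrySet()) {
            if (entry.getValue() == TaskStatus.FAILED) {
                entry.setValue(TaskStatus.RESUMED);
                resumed.add(entry.getKey());
            }
        }
        return resumed;
    }

    // Returns the current status of a task, or null if it is unknown.
    TaskStatus statusOf(String memberId, String taskId) {
        Map<String, TaskStatus> area = areas.get(memberId);
        return area == null ? null : area.get(taskId);
    }
}
```

The caller of `resume` would then restart the load scenario for each returned task id.
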
The implementation was fairly quick and painless; we already had the load generator in place, so we only had to implement the HTTP and Cache layers. All in all this took about 5-10 days from initial design thoughts to a working version. The time invested was very quickly returned in terms of faster turnaround and fewer mistakes when setting up massive load tests. It also made the load tests a lot more fun since they now involved far less configuration!
A side benefit was that, with a proper shutdown of each request, the farm can now serve as a pool for load testing. Any developer can use the load server farm to run load tests on any environment reachable by the load servers.
I am sure there are other just as good solutions for achieving the same functionality, but for us using JBoss Cache to create a clustered server farm was a very good fit and proved highly valuable considering the low amount of invested time that was needed. I for one am sure there are other server applications out there that could benefit from such a solution as well.
Fredrik Johansson is a founder and CEO of Cubeia Ltd
You can contact him at: fredrik.johansson(at)cubeia.com