This article describes how we used JBoss Cache to create a clustered load server solution for our Java based load tool.
You want me to load test what?
Measure twice, cut once
The hard part of choosing a solution is not the lack of options, but rather the abundance of them. Consider a server topology of 3 load servers like the one shown below. JBoss Cache offered exactly the combination of features we were after:
- Dynamic group membership (no need to preconfigure members)
- Listener interface for members leaving and joining
- Listener interfaces for changes to the data structure
- Replicated in-memory state over all servers
- No need for a broker (which would have been the case with a database, Terracotta or JMS for instance)
These features would allow us to build a highly dynamic and failure-tolerant system, which was exactly what we wanted!
JBoss Cache overview
JBoss Cache is a replicated and transactional cache. It is built on top of JGroups and has extensive replication configuration options. The data in JBoss Cache is stored in a tree structure. Each branching point is referred to as a node. Each node can contain data, in a key-value form, and children nodes. For most practical cases you can consider each node to hold a standard Map of objects.
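To make the "each node is a Map plus children" idea concrete, here is a minimal plain-Java model of such a tree. This is a hypothetical sketch of the data structure only, not the actual `org.jboss.cache` API:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the JBoss Cache tree: each node holds a
// key-value map plus named child nodes. Sketch only; the real
// cache adds replication, transactions and locking on top.
class TreeNode {
    final Map<String, Object> data = new HashMap<>();
    final Map<String, TreeNode> children = new HashMap<>();

    // Walk (or create) the path "/a/b/c" and put key=value at the leaf.
    void put(String path, String key, Object value) {
        node(path).data.put(key, value);
    }

    Object get(String path, String key) {
        return node(path).data.get(key);
    }

    private TreeNode node(String path) {
        TreeNode current = this;
        for (String part : path.split("/")) {
            if (part.isEmpty()) continue;
            current = current.children.computeIfAbsent(part, p -> new TreeNode());
        }
        return current;
    }
}
```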
Propagating requests and data
JBoss Cache will handle group membership, which means that every member will have a list of all other members and will be notified when the group changes. When a user requests a load scenario to be started, we should divide and spread the request over all members in the cluster. The flow of events stemming from a request was defined as:
- A request is received on any server.
- The request is split up into tasks; the number of tasks will depend on the number of members in the group.
- Each task will be written to the JBoss Cache in a member specific area.
- Each member will react to the task written to the member’s own area and start a load scenario as specified by the task.
- Members will continuously write status updates to the member’s own area in the cache.
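The splitting step above is simple arithmetic. A sketch of it in plain Java (class and method names are ours): divide the requested bot count evenly over the members, spreading any remainder one-by-one so the per-task counts always sum to the request:

```java
import java.util.ArrayList;
import java.util.List;

// Split a requested bot count evenly over the current cluster members.
// The remainder is spread over the first members so totals always add up.
class TaskSplitter {
    static List<Integer> split(int totalBots, int memberCount) {
        List<Integer> tasks = new ArrayList<>();
        int base = totalBots / memberCount;
        int remainder = totalBots % memberCount;
        for (int i = 0; i < memberCount; i++) {
            // The first 'remainder' members get one extra bot each.
            tasks.add(base + (i < remainder ? 1 : 0));
        }
        return tasks;
    }
}
```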
Here is an example of how the above would work in practice:
- A user requests 3000 bots to be started, sending the request to Server1.
- Server1 sees a total of 3 members in the cluster (Server1, Server2 and Server3), so each task will consist of 1000 bots.
- The three tasks are written to the cache, for example with the following calls:
- cache.put("/server1/request1/", "count", 1000)
- cache.put("/server2/request1/", "count", 1000)
- cache.put("/server3/request1/", "count", 1000)
- All members’ cache listeners are now notified with nodeModified() 3 times, once for each write to the cache. Each member inspects the written path (e.g. “/server1/request1/”) and checks whether the write is within its own domain. For server2 this means any path starting with “/server2/…”.
- As the task is started, the member writes the status of the load request to its own area. For server3 this could for example be:
- cache.put("/server3/task1/", "status", "starting");
- cache.put("/server3/task1/", "loggedInCount", "500");
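The "is this write within my domain?" check above boils down to a path-prefix test. A sketch of that helper in plain Java (the name is ours; in the real system this check would live inside a listener annotated with JBoss Cache's listener annotations):

```java
// Each member's cache listener receives nodeModified() for every write
// in the cluster; it only acts on paths under its own member area.
class TaskFilter {
    // True if the written path lies under this member's area,
    // e.g. isMine("/server2/request1/", "server2") == true.
    static boolean isMine(String fqn, String memberId) {
        return fqn.startsWith("/" + memberId + "/");
    }
}
```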
(Status view: Logged In: 2550 in total; Server1: 1000, Server2: 1000, Server3: 550)
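The aggregated figure shown above is produced by summing the per-member "loggedInCount" status values read from the cache. A minimal sketch (class and method names are ours):

```java
import java.util.Map;

// Sum the per-member loggedInCount status values into the
// cluster-wide total shown to the user.
class StatusAggregator {
    static int totalLoggedIn(Map<String, Integer> perMemberCounts) {
        return perMemberCounts.values().stream()
                .mapToInt(Integer::intValue)
                .sum();
    }
}
```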
When designing the cache layout we have a few requirements:
- A member should have its own defined data area in the cache.
- Requests will be pushed to the member-specific area by any server.
- Current states will be written to the member specific area by the local member only. Any server may read this area.
As server id I opted to use the member id from JBoss Cache, which is basically your local IP. I also wanted to keep all requests separated by id, hence the request1/task1 pattern above. A task id makes it possible to match requests with statuses and enables tasks to be shut down by the user when the test ends. Since we allow requests to be made to any of the servers, potentially also concurrently, I recommend using UUIDs as request/task ids.
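Building such a path is a one-liner with `java.util.UUID`; randomly generated UUIDs make collisions between concurrent requests on different servers effectively impossible. A sketch (helper name is ours; the path layout follows the pattern above):

```java
import java.util.UUID;

// Build a member-specific cache path with a UUID task id, so that
// concurrent requests arriving at different servers never collide.
class CachePaths {
    static String taskPath(String memberId, UUID taskId) {
        return "/" + memberId + "/" + taskId;
    }
}
```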
So, we have decided on using a distributed cache (JBoss Cache) for clustering and dispatching of requests. We have come up with a cache design that will suit our needs and an implementation of dispatching the requests to the members in the cluster. But how does this fit in with the big picture, where we also want to have a graphical user interface and be able to start load scenarios?
- Virtually zero configuration for new load servers. There is no broker or database connection to configure locally; just start a new instance and it will join any running farm if one is present.
- Dynamic cluster. New servers can be added or removed at runtime.
- We added a status flag that could take three values: RUNNING, FAILED and RESUMED. If a server died or crashed for some reason, we would get a listener notification and flag its status as FAILED. An operator can then restart the server, which will scan the cache for any failed tasks under its area, restart them and set their status to RESUMED. This allowed for some rudimentary failover handling.
- Very easy to use. The interfaces are simple HTML forms instead of SSH-ing around to 10+ Linux servers and reconfiguring each request.
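The failover behaviour described in the list above can be sketched as follows. The RUNNING/FAILED/RESUMED values follow the article; the class and method names are ours, and the real implementation would walk the member's area of the JBoss Cache tree rather than a plain Map:

```java
import java.util.HashMap;
import java.util.Map;

// Status values for a task, as used in the article.
enum TaskStatus { RUNNING, FAILED, RESUMED }

// On restart, a server scans its own area for tasks flagged FAILED
// and resumes them -- the rudimentary failover handling described above.
class FailoverScanner {
    static Map<String, TaskStatus> resumeFailed(Map<String, TaskStatus> myTasks) {
        Map<String, TaskStatus> updated = new HashMap<>(myTasks);
        for (Map.Entry<String, TaskStatus> e : updated.entrySet()) {
            if (e.getValue() == TaskStatus.FAILED) {
                // A real implementation would restart the load task here
                // before writing the new status back to the cache.
                e.setValue(TaskStatus.RESUMED);
            }
        }
        return updated;
    }
}
```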