Monthly Archives: October 2010

The Duct Tape Architect

I have been doing software architecture work for a long time now, and as it turns out, the ‘right way’ of solving things may not always be the best way. Below are two real anecdotes from life in the trenches.
== The Case of the Database Bottleneck
I was visiting a company where we discussed their current system design and the problems they were experiencing. Their system had a respectable peak of 7,000 concurrent users, and it turned out that at peak times they were starting to hit the limit of their database, which was running as a single instance.
We discussed the usual slew of database scaling solutions, such as sharding and dedicated reader nodes, and some pros and cons of each.
As it turned out, however, they solved it in their own way. “Yeah, we solved it. We bought an SSD drive which replaced the old hard drive,” they said matter-of-factly.
Naturally I scoffed a bit at this and thought to myself that they had only bought themselves a little bit of time: in the best case they could grow by 50-100% at most, and then it would be the same issues all over again! Rookies! Surely they did not understand the beauty of unlimited, linear scaling with sharding?
Within one year, they had grown about 20% and were then bought by a much bigger player in the industry, and their system was erased from the face of the earth in favor of the larger one. They never hit the limit of the SSD.
Later I also did some calculations: if they had grown by 100%, they would have become one of the top 5 actors in the market and their profits would have been through the roof.
So when looking back, in this case, it seems like the SSD solution was actually the right thing to do. They only needed to buy some more time for the deal to come through.
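The numbers in this story are easy to check on the back of an envelope. Here is a minimal sketch, assuming the old disk's limit sat at roughly the 7,000-user peak and that my rough guess of 50-100% extra headroom from the SSD was right. The function names are made up for illustration and the headroom figures are my estimate, not a measurement.

```python
# Back-of-the-envelope headroom check. Integer percentages keep the
# arithmetic exact.

PEAK_USERS = 7_000            # peak concurrent users, from the anecdote
GROWTH_PCT = 20               # growth during the following year, from the anecdote
HEADROOM_PCT = (50, 100)      # rough guess at extra capacity from the SSD (assumption)


def scaled(users: int, percent_increase: int) -> int:
    """Return `users` increased by `percent_increase` percent."""
    return users * (100 + percent_increase) // 100


def remaining_headroom_pct(capacity: int, load: int) -> int:
    """How much further the load could grow before hitting capacity, in percent."""
    return (capacity - load) * 100 // load


projected_peak = scaled(PEAK_USERS, GROWTH_PCT)
for headroom in HEADROOM_PCT:
    capacity = scaled(PEAK_USERS, headroom)
    left = remaining_headroom_pct(capacity, projected_peak)
    print(f"SSD headroom {headroom}%: capacity {capacity}, "
          f"peak after growth {projected_peak}, room left {left}%")
```

Even under the pessimistic guess, a year of 20% growth would still have left about a quarter of the capacity unused.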
== The Case of the Missing Scheduler
Another case occurred when I was reviewing a large gaming network that was running cash games as well as tournaments. There were many tournaments running on a daily basis and most of them were recurring events, such as The Daily Lunch Tournament. Almost every such network I have known has had a scheduling option for tournaments. You would enter a tournament template and then say something like ‘run every day at 12:00’. You would also be able to create a tournament and say ‘start this tournament on October 10 at 18:00’. Then the system would create and start the tournament as specified.
This network did not have that.
Instead, they had about 10 employees in Indonesia who would work in shifts and manually create each tournament and then manually click ‘start’ to start them. Nuts! This must surely be fixed!
So we started a discussion and I don’t remember my exact words, but they were something like: “This is insane! Surely we should be able to implement a simple scheduler in the system?”
To which they replied something like: “Sure. But we have estimated the time it would take us, and the cost of developers on US salaries implementing this equals about 7 years of the Indonesian guys doing it manually.”
“Besides, do you want to be the guy who calls them up and tells them and their families that they are losing their jobs? And for what? Saving a buck after 7 years? We have a chock-full backlog to work on anyway.”
Hmmm. Maybe it would not be worth cutting other features in order to prioritize a feature that would cost 10 people their jobs and not save any money for a long time. Could this be?
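Their argument boils down to a break-even calculation. Below is a minimal sketch of it. The two cost figures are hypothetical, picked only so that the ratio matches the 7 years they quoted; I never saw their actual numbers. It also ignores things like maintenance of the scheduler and what the developers could have built instead.

```python
def break_even_years(build_cost: float, annual_manual_cost: float) -> float:
    """Years of manual work that cost as much as building the automation."""
    if annual_manual_cost <= 0:
        raise ValueError("annual_manual_cost must be positive")
    return build_cost / annual_manual_cost


# Hypothetical figures, chosen only to reproduce the 7-year ratio.
BUILD_COST = 350_000.0          # one-off cost of developing a scheduler
ANNUAL_MANUAL_COST = 50_000.0   # yearly cost of the manual operation

years = break_even_years(BUILD_COST, ANNUAL_MANUAL_COST)
print(f"Automation pays for itself after {years:.1f} years")
```

If the break-even point lies further out than you can reasonably plan for, the ‘obvious’ automation is a much harder sell.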
== Summary
Am I advocating that you shouldn’t care about scalability (just buy SSDs!) or never automate tasks because there is cheap labour to be found? Am I advocating quick hacks and avoiding solid engineering principles? Of course not.
But sometimes it is good to step back a bit and try to see what *actually* needs to be solved. As engineers we do sometimes get stuck on implementing the ‘right thing’ and lose sight of reality.


By |Friday, October 8, 2010|cubeia|24 Comments

The 30 Second Rule

Recently I ranted about non-productive tasks that can divert a developer’s focus and seriously hamper the effectiveness of your project. As an extension of that discussion I would like to propose a new rule, the 30 second rule:

Any process that takes longer than 30 seconds to run must be fixed.
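To apply a rule like this you first have to measure. Here is a minimal sketch of a helper that runs a command, times it, and flags it when it breaks the limit. The helper names are made up for illustration, and the command it runs is just a placeholder; swap in your own build or test command.

```python
import subprocess
import sys
import time
from typing import Sequence, Tuple

LIMIT_SECONDS = 30.0  # the threshold from the rule above


def run_timed(command: Sequence[str]) -> Tuple[int, float]:
    """Run `command` and return (exit code, wall-clock seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(list(command))
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


def exceeds_limit(elapsed: float, limit: float = LIMIT_SECONDS) -> bool:
    """True when a run took longer than the limit."""
    return elapsed > limit


# Placeholder command: start a Python interpreter that does nothing.
exit_code, seconds = run_timed([sys.executable, "-c", "pass"])
verdict = "must be fixed" if exceeds_limit(seconds) else "ok"
print(f"exit code {exit_code}, took {seconds:.2f}s: {verdict}")
```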

By |Friday, October 1, 2010|cubeia|4 Comments