I have been doing software architectural work for a long time now, and as it turns out, the ‘right way’ of solving things may not always be the best way. Below are two anecdotes from life in the trenches.
The Case of the Database Bottleneck
I was visiting a company where we discussed their current system design and the problems they were experiencing. Their system had a respectable peak of 7,000 concurrent users, and it turned out that at those peak times they started to hit the limit of their database, which was running as a single instance.
We discussed the regular slew of database scaling solutions such as sharding, dedicated reader nodes etc. and some pros and cons with each solution.
As it turned out, however, they solved it in their own way. “Yeah, we solved it,” they told me when I asked them about it. “We bought an SSD to replace the old hard drive. It is much faster now,” they said matter-of-factly.
Naturally I scoffed at this and thought to myself that they had only bought themselves a little bit of time: at best they could grow by 50-100%, and then it would be the same issues all over again! Rookies! Surely they did not understand the beauty of unlimited, linear scaling with sharding?
Within less than a year, they had grown about 20% and were then bought by a bigger player in the industry. As is customary, their system was erased from the face of the earth in favor of the larger one. They never hit the limit of the SSD.
Later on I also did some calculations: if they had grown by 100%, they would have become one of the top five players in the market and their profits would have been through the roof. Their development budget would have been completely different by then.
So when looking back, in this case, it actually seems like the SSD solution was the right thing to do. They only needed to buy some more time for the deal to come through.
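The kind of back-of-the-envelope calculation I mean can be sketched roughly as follows. This is only an illustration: the 7,000 peak users and the 20% growth come from the story above, while the assumption that growth compounds at a steady yearly rate, and the helper name years_of_headroom, are mine.

```python
import math

def years_of_headroom(current_peak, capacity, annual_growth):
    """Years until peak load reaches capacity, assuming steady compound growth."""
    if current_peak >= capacity:
        return 0.0          # already at or past the limit
    if annual_growth <= 0:
        return math.inf     # no growth means the limit is never reached
    return math.log(capacity / current_peak) / math.log(1 + annual_growth)

PEAK_USERS = 7_000      # peak concurrent users, from the anecdote
ANNUAL_GROWTH = 0.20    # roughly the growth seen before the acquisition

# Assumed ceilings: the SSD allows 50% or 100% more users than the old peak.
for extra in (0.5, 1.0):
    capacity = PEAK_USERS * (1 + extra)
    years = years_of_headroom(PEAK_USERS, capacity, ANNUAL_GROWTH)
    print(f"+{extra:.0%} capacity: about {years:.1f} years of headroom")
```

Under these assumptions the SSD buys somewhere between two and four years, which is far longer than the company actually needed.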
The Case of the Missing Scheduler
Another case occurred when I was reviewing a large gaming network that was running cash games as well as tournaments. There were many tournaments running on a daily basis and most of them were recurring events, such as The Daily Lunch Tournament. Almost every gaming network I know has a scheduling option for tournaments. An administrator would enter a tournament template and then say something like ‘run every day at 12 AM’. You would also be able to create a future tournament and say ‘start this tournament on October 10 at 6 PM’. The system would then create and start the tournament as specified.
This network did not have that.
Instead, they had about 10 employees in Indonesia who would work in shifts and manually create each tournament and then manually click ‘start’ to start them. Nuts! This must surely be fixed!
So we started a discussion and I don’t remember my exact words, but they were something like: “This is insane! Surely we should be able to implement a simple scheduler in the system?”
To which they replied something like: “Sure. But we have made an estimate of the time it would take us, and the cost of developers at US salaries to implement this corresponds to about 7 years of the Indonesian guys doing this manually.”
Yikes.
“Besides, do you want to be the guy who calls them up and tells them and their families that they are losing their jobs? And for what? Saving a buck after 7 years? We have a chock-full backlog to work on anyway.”
Hmmm. Maybe it would not be worth cutting other features in order to prioritize a feature that would cost 10 people their jobs and not save any money for a long time. Could this be? What kind of socialist development company was this?
As you might have guessed by now, by dedicating their developers to other things rather than making the Indonesians redundant, they were able to ship new features that actually attracted new players, which turned out to be very successful for the owners in the end.
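Their argument boils down to simple break-even arithmetic, which can be sketched like this. The only figure taken from the conversation is the roughly 7-year payback; the salary and build-cost numbers below are purely hypothetical placeholders chosen to reproduce it, and break_even_years is just an illustrative helper.

```python
import math

def break_even_years(build_cost, annual_manual_cost, annual_maintenance=0.0):
    """Years before automating pays for itself compared with manual work."""
    yearly_saving = annual_manual_cost - annual_maintenance
    if yearly_saving <= 0:
        return math.inf     # automation never pays back
    return build_cost / yearly_saving

# Hypothetical numbers, not from the company in the story.
OPERATORS = 10
COST_PER_OPERATOR = 6_000       # assumed yearly cost per operator
SCHEDULER_BUILD_COST = 420_000  # assumed developer cost to build the scheduler

manual_cost = OPERATORS * COST_PER_OPERATOR
years = break_even_years(SCHEDULER_BUILD_COST, manual_cost)
print(f"Scheduler pays for itself after about {years:.0f} years")
```

Any ongoing maintenance cost for the scheduler only pushes the break-even point further out, and the calculation says nothing about what the same developers could earn by building something else instead.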
Summary
Am I advocating that you shouldn’t care about scalability (just buy SSDs!) or never automate tasks because there is cheap labour to be found? Am I advocating quick hacks and avoiding solid engineering principles? Of course not.
But sometimes it is good to step back a bit and try to see what *actually* needs to be solved. As engineers we do sometimes get stuck on implementing the ‘right thing’ and lose sight of reality. I know I do.
You can contact him at: fredrik.johansson(at)cubeia.com

The Hacker’s Diet highlights the differences between engineers *fixing* problems and managers *managing* them. I’m an engineer and fixer, but constantly remind myself some problems only need managing.
http://www.fourmilab.ch/hackdiet/www/subsection1_2_1_0_4.html
Excellent link, thank you!
Very interesting read thanks
Nice examples. I always want to do things “right” and it ends up taking so much time, but I usually never stop to think if it’s worth it.
Good points.
great write up.
Nice article, and interesting thoughts ! Thanks a lot. Enjoyed reading it.
Stewart Brand calls this “satisficing” in his book How Buildings Learn, which I highly recommend as a great exploration of this and similar ideas. The book was filmed by the BBC (or was it the other way around?), and the series is available online IIRC.
Hey, great read dude
The simplest solutions are the best ones, yet the hardest to find.
Kent Beck had this nice story about an online game: Part 1 was available for free, and the plan was that people pay for Part 2. So they wanted to add a link to Part 2 at the end of Part 1 (something like “click here to buy Part 2”).
Where does the link take the user? To a billing system, and then on to Part 2. That would have required integrating a billing system into the game. However, since they did not know yet if any customer would actually buy the game, they ended up just linking to Part 2, so users would get to play the game without paying.
Why? Aren’t you losing money if you don’t charge your customers? Well, if you don’t have any customers, you don’t really care – you can first sit down and watch whether anyone would actually buy your game.
This was a good article. Solutions need to fit the constraints of the domain. In this case, those constraints changed and were enough to allow the company to focus on other more profitable things that could be resolved at a later time.
It is good that architects learn to be good listeners as well as creative problem solvers.
Sometimes it’s difficult to stop and remember that it can often be more valuable to develop for “what is” and not for “what if.”
i will mark u 5 star!!! good work!!
Obligatory quote from Donald Knuth (1974):
“There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: *premature optimization is the root of all evil.*
Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.”
Mr. Johansson,
As a practising technologist and someone who has been working for some time with start-ups developing online betting games, I completely agree with you. A handful of techniques that I devised, after a bit of research and even prototyping, to automate and speed things up were held back from development and deployment, and for good reasons. I have to say I was not happy with those decisions at the outset, but looking back, some of them made complete business sense. However, the CEO always backed such experiments and tweaking of the design, and I managed to hold on to my job.
Warm regards
– Nirmalya
[http://www.linkedin.com/in/nirmalyasengupta]
Brilliant post! I agree with almost all of the points presented in the article. Keep it up!
This is a very, very good post. I like this the most…
Interesting to read about your two cases. Interesting, also, to read that you, as an architect, don’t question why the first case ended up opting for an SSD disk.
That is more what I would have liked to hear about. Your conclusions as to why that customer had to opt for an SSD. And the fact that their system got phased out isn’t interesting at all. That happens but perhaps it happened because their design was so poor they had to have an SSD to handle the load?
That would have been an architectural issue to talk about: the process that led them to their current implementation.
I just recently improved a leaderboard application by 40 times and that’s not even the end of it. Wanna know what made it so slow in the first place?
The scheduler issue… Well…
See, first of all you actually have to forget about all this modern lore about “agile” and equating thinking with the horrible waterfall model and other IT-related stuff and actually reconcile yourself with the fact that this profession, being an architect, is about problem identification, analysis, problematization, reduction, developing a solution and THEN realising it. And, yes, you read that right! It’s not about mindlessly hacking code. If you haven’t done the first five, what problem is your code solving?
Some things have to be thought through from the start or they will cost you, or, in this case, n Indonesians, lots of good money.
Thinking ahead a bit and using the little grey cells doesn’t represent such a huge cost until it has to be done afterwards. Wonder why that is such a hard lesson to learn.
Yet another architectural issue I read nothing at all about.
Maybe you are indeed a duct tape architect after all.
Per,
The reason the company opted for an SSD was that it was a quick fix for them. It was cheap and gave them some more runway until they would inevitably run into the same bottleneck again, of course.
I am not sure I agree that the fact that their system was phased out when they merged is irrelevant. The system would never have survived that deal anyway, and the bottom line is that the selected solution proved the most cost-efficient given the outcome. Changing the strategy would have had zero effect on this, at least in this case.
Note that I did not work or consult at the companies as an architect, but was rather doing what you may call a due diligence of their systems. This is why I cannot provide any information about the process that led to their current implementation, as interesting as it may be though!