Building Back-End in 2015 vs now: Chaos Monkey - Part 1
- Buck Enkhbat
- Jul 4
- 3 min read
Back in early 2015, I built and ran a discussion forum website focused entirely on finance and stock trends. It was basically Reddit meets Wall Street, minus the memes (at first). To my surprise, within the first seven months, it gained a modest but active user base—way more people were into dissecting candlestick charts at 2 a.m. than I ever imagined.
The first big challenge? Scaling the backend to survive the flood of requests. This was my first real encounter with what I now fondly call “people using my crap website??”
At the time, I was using a classic combo: PHP and MySQL for the backend, with a sprinkling of JavaScript on the front end—just enough to make things slightly dynamic and mostly broken. I hosted everything on DigitalOcean, which offered flexible droplet plans that felt like LEGO blocks for developers. It was also my first time deploying to the cloud. Setting it up was a breeze compared to traditional hosting. Keeping it from turning into a tangled mess of SSH sessions and midnight panic? Not so much.
Below is my very first attempt at a live architecture.

It started out simple: just a single load balancer in front of individual app instances. But every time the data changed, keeping everything in sync became a nightmare. So I threw in a basic MySQL replication setup and added caching. Instant band-aid.
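For the curious, the band-aid looked roughly like the sketch below. It's written in Python for readability rather than the PHP I was actually using, and the hostnames, table names, and the Redis/pymysql choices are all stand-ins: reads try the cache first and then a random replica, while writes go to the primary and invalidate the cache.

```python
import json
import random

import pymysql  # assumed MySQL driver; the real stack was PHP talking to MySQL
import redis    # assumed cache client; the post only says "added caching"

cache = redis.Redis(host="10.0.0.5")      # hypothetical cache droplet
PRIMARY = "10.0.0.1"                      # writes always go here
REPLICAS = ["10.0.0.2", "10.0.0.3"]       # reads are spread across these


def connect(host):
    return pymysql.connect(host=host, user="app", password="secret",
                           database="forum",
                           cursorclass=pymysql.cursors.DictCursor)


def get_thread(thread_id):
    """Cache-aside read: try the cache, otherwise hit a random read replica."""
    key = f"thread:{thread_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    conn = connect(random.choice(REPLICAS))
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM threads WHERE id = %s", (thread_id,))
            row = cur.fetchone()
    finally:
        conn.close()

    if row is not None:
        cache.set(key, json.dumps(row, default=str), ex=60)  # short TTL = band-aid
    return row


def post_reply(thread_id, body):
    """Writes hit the primary, then invalidate whatever the cache is holding."""
    conn = connect(PRIMARY)
    try:
        with conn.cursor() as cur:
            cur.execute("INSERT INTO replies (thread_id, body) VALUES (%s, %s)",
                        (thread_id, body))
        conn.commit()
    finally:
        conn.close()
    cache.delete(f"thread:{thread_id}")
```

The obvious catch is the gap between a write landing on the primary and the replicas catching up, which is exactly the kind of thing a short cache TTL papers over rather than fixes.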
Surprisingly, it worked. Whenever traffic spiked, I spun up more droplets. When things were quiet, I scaled back. It felt like I was manually breathing life into the servers with each new user session.
But as traffic grew month after month, my one heroic load balancer started wheezing under pressure. Eventually, it choked and gave up altogether. Naturally, I slapped another load balancer in place and hoped for the best. And for a while, it worked. Then things settled into a comfortable rhythm. Growth plateaued, the architecture was holding up well, and I finally had a moment to catch my breath.
That’s when the curious, slightly evil part of my brain kicked in.
The site didn’t handle any sensitive information—no credit cards, no addresses, not even a birthday. Just raw, unfiltered opinions on stock trends, conspiracy theories about the Fed, and the occasional full-blown Tesla bull vs bear battle royale. It was chaos... but beautiful chaos.
Armed with an unhealthy obsession with uptime metrics, I had a thought: What if I broke my own system on purpose? Not out of malice, of course, but as an experiment. If I could simulate disaster, maybe I could find the weak points before they found me.
So I did.
I called it Domestic Terror—not the best naming in hindsight, but it captured the spirit: a rogue script that randomly killed servers, throttled services, and pulled the rug out from under my app... All while I watched, wide-eyed, like a mad scientist waiting to see if the monster would survive the lightning bolt. To my delight—and sometimes horror—it worked. Sometimes the site kept running. Sometimes it went up in flames. But every time, I learned something. It was crude, hacky, and utterly terrifying. But it taught me more about resilience than any blog post or tutorial ever could.
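The original script is long gone, but the spirit of it fit in a few dozen lines. Below is a rough reconstruction in Python; the hostnames, service names, and the plain-SSH approach are guesses at its shape, not the real thing:

```python
#!/usr/bin/env python3
"""A rough reconstruction of the "Domestic Terror" script.

Everything here is a stand-in: the hostnames, the service names, and the
plain-SSH approach are guesses at the shape of the thing, not the real code.
"""
import random
import subprocess
import time

# Hypothetical inventory: (droplet hostname, service it runs)
TARGETS = [
    ("app-01", "php5-fpm"),
    ("app-02", "php5-fpm"),
    ("db-replica-01", "mysql"),
    ("cache-01", "memcached"),
]


def ssh(host, command):
    """Run a command on a droplet over SSH (assumes key-based auth is set up)."""
    return subprocess.run(["ssh", host, command], check=False)


def stop_service():
    host, service = random.choice(TARGETS)
    print(f"[chaos] stopping {service} on {host}")
    ssh(host, f"sudo service {service} stop")


def reboot_droplet():
    host, _ = random.choice(TARGETS)
    print(f"[chaos] rebooting {host}")
    ssh(host, "sudo reboot")


if __name__ == "__main__":
    while True:
        random.choice([stop_service, reboot_droplet])()
        time.sleep(random.randint(300, 1800))  # strike again in 5 to 30 minutes
```

Notice there is nothing limiting the blast radius in that loop: it is perfectly happy to stop both app servers inside the same half hour. "Crude, hacky, and utterly terrifying" was not an exaggeration.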
Later, I discovered that Netflix had already shared its own chaos-testing tool, Chaos Monkey, with the world, and I remember reading about it and thinking, Well, obviously... I’m not the only one who thought this was a good idea. It was oddly validating. If Netflix was doing it, and doing it openly, then this wasn’t just some rogue experiment—it was a legitimate strategy. Intentional failure, controlled destruction. Break it before the world breaks it.
That’s when I started building out more structured chaos tools—baby versions of what we now call chaos engineering. I automated failovers, faked disk failures, and even introduced random network latency between services. Each test peeled back another layer of assumptions and laziness I didn’t know was lurking in my code.
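The latency injection is the easiest of those to show. Here's a minimal sketch built on Linux's tc/netem; the hostnames are placeholders, and it assumes passwordless sudo over SSH:

```python
#!/usr/bin/env python3
"""Inject (and later remove) artificial latency with Linux tc/netem.

Hostnames are placeholders; assumes passwordless sudo over SSH and that the
traffic you care about leaves the droplet on eth0.
"""
import random
import subprocess
import time

HOSTS = ["app-01", "app-02", "db-replica-01"]  # hypothetical droplets


def ssh(host, command):
    subprocess.run(["ssh", host, command], check=True)


def add_latency(host, mean_ms, jitter_ms):
    # netem delays every packet leaving eth0 by mean_ms, plus or minus jitter_ms
    ssh(host, f"sudo tc qdisc add dev eth0 root netem delay {mean_ms}ms {jitter_ms}ms")


def clear_latency(host):
    # remove the netem qdisc so the network goes back to normal
    ssh(host, "sudo tc qdisc del dev eth0 root netem")


if __name__ == "__main__":
    victim = random.choice(HOSTS)
    delay = random.choice([50, 200, 800])
    print(f"[chaos] {victim}: +{delay}ms latency for the next 10 minutes")
    add_latency(victim, delay, max(delay // 4, 1))
    try:
        time.sleep(600)
    finally:
        clear_latency(victim)
```

The try/finally is the important part; forgetting to clear the qdisc is how a ten-minute experiment quietly turns into a permanent, self-inflicted outage.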
The story continues in Part 2.
