A decade ago, microservices burst onto the scene, promising to revolutionize software development and streamline complex systems. Yet, as we look back on their impact, I believe the promised benefits, even when materialized, have led to a tremendous increase in complexity and costs.
In this blog post, we'll examine how the initial allure of simplicity and modularity has given way to added complexity and expense, and look at the factors that contributed to these unintended consequences. In the current economic climate, where we are asked to do "more with less," it is worth asking what the real impact of microservices has been.
I started hearing about microservices at the end of my tenure at Now TV (~2013), during a pitch from a cloud platform provider. Now TV was built with a service architecture from the start, and despite a relatively large team (the API teams were around 20 engineers), our services were chunky. As far as I can remember, there were three:
- Accounts: This was an API that would return all the information specific to a certain account, such as watch list, subscription details, and much more.
- Content Catalogue (Movies, TV shows, Sports events): This was an API that would return all the static content from the catalogue plus searching capabilities.
- Content Ingestor: This was the software that would allow content editing and push changes to the Catalogue.
The relatively simple architecture allowed us to refactor our code easily as our design improved. For example, we started with the classic Grails architecture (database-driven, with poor separation between domain logic and persistence logic, a bit of a classic for most Grails/Rails/Django applications) and incrementally converged toward a Hexagonal architecture (using the Ports and Adapters pattern to decouple domain logic from interactions with external systems).
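For readers who have not met the pattern, here is a minimal sketch of what Ports and Adapters can look like. It is not the Now TV code (which was Grails); the names `WatchlistRepository`, `WatchlistService`, and `InMemoryWatchlistRepository` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Protocol


class WatchlistRepository(Protocol):
    """Port: what the domain needs from persistence, stated in domain terms."""

    def load(self, account_id: str) -> list[str]: ...

    def save(self, account_id: str, titles: list[str]) -> None: ...


class WatchlistService:
    """Domain logic: depends only on the port, never on a concrete database."""

    def __init__(self, repository: WatchlistRepository) -> None:
        self._repository = repository

    def add_title(self, account_id: str, title: str) -> list[str]:
        titles = self._repository.load(account_id)
        if title not in titles:  # hypothetical domain rule: no duplicates
            titles = titles + [title]
            self._repository.save(account_id, titles)
        return titles


@dataclass
class InMemoryWatchlistRepository:
    """Adapter: one implementation of the port, convenient for tests."""

    _store: dict[str, list[str]] = field(default_factory=dict)

    def load(self, account_id: str) -> list[str]:
        return list(self._store.get(account_id, []))

    def save(self, account_id: str, titles: list[str]) -> None:
        self._store[account_id] = list(titles)
```

Because the service only knows about the port, swapping the in-memory adapter for a database-backed one would not touch the domain logic, and that is what makes this kind of incremental refactoring cheap inside a single service.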
We approached the service split in the way that is common for service-oriented architectures: we looked at the non-functional properties of the components involved and worked out which parts of the system required a separate deployable unit to meet our overall goals. Some of the system qualities you often take into consideration are: Correctness, Performance, Reliability, Observability, Security, and Scalability.
To give an example, let’s compare the properties of two of the services highlighted above:
- Accounts: All data returned is specific to user accounts. Accuracy of the data is more important than response time (performance). It contains PII and account/payment information (security). If the service is not available, system core functionality is impacted (reliability).
- Content catalogue: All data returned is the same across all customers, allowing for extensive usage of a cache layer (scalability). It requires fast response time (performance). Always returning the most updated information is not required (correctness).
As you can see from the simplified descriptions above, the two parts of the system display very different properties. We could probably have built everything as a single monolith and still been okay for the first few years of Now TV anyway. In the end, architecture is about tradeoffs, and without massive overhead, our architecture allowed us to reach hundreds of thousands of active customers in the first few years.
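To make the catalogue tradeoff concrete, here is a small sketch of a time-based cache: it accepts slightly stale data in exchange for fewer calls to the backing store. This is an illustration under my own assumptions, not what we actually ran; `TtlCache`, its `fetch` callback, and the `clock` parameter are hypothetical names.

```python
import time
from typing import Callable


class TtlCache:
    """Serve a cached value until it is older than ttl_seconds, then refetch."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, key: str) -> dict:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh enough: skip the backing store
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```

A cache like this suits data that is identical for every customer and tolerant of staleness. It would be the wrong default for account data, where accuracy matters more than response time.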
The Snake oil sales pitch and the Microservices Architecture 10 years ago
Back at the presentation where the concept of microservices was “sold” to me for the first time, I felt very confused and upset about what the speaker was trying to communicate. The meeting was aimed at managers, and I was one of the very few engineers in the room. The pitch was something like this:
Are you tired of long development times, big refactorings, and tech debt? Is your codebase hard to change and poorly designed? If you move to microservices, your code will be simpler, better modularized, and easier to change. Your team will be happy, and unicorns will fly down from the sky.(*)
(*) It's possible that I made up the unicorn part.
I honestly thought that unicorns flying down from the sky were the most likely event in that statement. The problem definition was correct, and most managers were nodding: most companies struggle with slow development times, hard-to-change codebases, and the need for large refactoring. It was the conclusion that didn’t make sense at all. Was he saying that having a system made up of a larger number of services (a more distributed system) would make your system simpler and easier to change?
Before I jump into what I was concerned about when I saw the architecture diagram with dozens of microservices sending messages to each other, let’s pause for a second and address the big elephant in the room. Why do most teams struggle with slow development times, hard-to-change codebases, and large refactoring?
Because as an industry, we have been neglecting software design for more than 15 years:
- We don’t study software design anymore, or not enough.
- When we interview, we don’t test software design skills (sometimes I’m embarrassed to ask questions about design. Do I look old school if I ask? LOL).
- We keep swinging between delegating all software design decisions to frameworks and overcomplicated clean architectures.
- We prioritise rewriting instead of learning how to refactor safely.
- We hope that our tests will whisper to us what the design should look like, even though we don’t know when it is most appropriate to use a strategy vs. an observer pattern.
Now that I’ve cleared the air and I feel a bit lighter, let’s go back to the architecture diagram with dozens of microservices I mentioned above. This is what was going through my mind and making me quite upset (I’m a bit of an Angry Bird with those things):
- If we struggle to build a loosely coupled design inside one service, what chance do we have of creating cohesive and loosely coupled services? What’s worse, it’s much harder to refactor across service boundaries once we find out our design doesn’t work.
- Why isn’t he mentioning that if the goal is to have loosely coupled services, the deployment pipeline is much more complex and you will need to start thinking about API versioning and backward compatibility for the communication among your services?
- What about testing? Are we expecting that all the services are going to be stable and up and running in all our testing environments? Or do we plan to set up service stubs or something similar?
- Why isn’t the gentleman talking about service-to-service communication and what it means in terms of security and authorization?
- Why isn’t he mentioning that it’s much more complex to triage and trace an issue in production when your system is distributed?
- Why doesn’t he mention that in a distributed system there are far more ways for things to fail (network partitions, timeouts, partial failures) and your approach to resilience needs to be drastically rethought?
- Why does he wear a strange red hat? (At that stage, everything was getting annoying. My apologies.)
Read the list above again. If those aspects don’t make you realize how much more complicated everything is going to be and how much work is required to maintain a similar system, I’d be surprised.
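To put a rough number on the resilience point, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that a request has to pass through several services in sequence and that each one fails independently; real systems are messier, and the 99.9% figure is a made-up input, not a measurement.

```python
def chain_availability(per_service: float, hops: int) -> float:
    """Probability a request succeeds when it needs `hops` services in a row.

    Assumes independent failures, so the availabilities simply multiply.
    """
    if not 0.0 <= per_service <= 1.0:
        raise ValueError("per_service must be between 0 and 1")
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return per_service ** hops


if __name__ == "__main__":
    for hops in (1, 3, 10, 30):
        print(f"{hops:>2} services in the call path: "
              f"{chain_availability(0.999, hops):.2%}")
```

Under these assumptions, a call path through ten services that are each up 99.9% of the time is available roughly 99.0% of the time, and thirty services bring it down to roughly 97%. Retries, timeouts, and fallbacks can claw some of that back, but each of them is more code to design, test, and operate.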
The management brainwash :)
I really didn’t see the point in overcomplicating the architecture in this way until a few years later, when I became a manager. Don’t laugh; I’m not saying that becoming a manager lobotomizes you. Hear me out. I realized that there was another non-functional property that was a compelling reason to create more services: the creation of fully autonomous teams.
The truth is that you can have multiple teams working autonomously on a single service. For example, see how Shopify deals with it with its modularised monolith.
However, having one or more services per team is also a possibility. There are pros and cons to both approaches, and the right choice depends on your context. My bias at the moment is to believe that, at a large organizational scale, a service-oriented architecture where services are split not only for the non-functional properties described above but also to allow teams to act independently is an important tool in an engineering leader’s arsenal.
Overall, it's clear that microservices are not the panacea that many once thought they were. If you're considering microservices solely as a solution to issues like slow development times and hard-to-change codebases, you may be barking up the wrong tree. Instead, I recommend prioritizing design and refactoring skills in your hiring and training efforts, while ensuring your team has the necessary capacity to improve your codebase.
However, if you're exploring microservices to foster team independence and encourage better DevOps practices, stay tuned for the second part of this article on Regret-free Service Migration. Keep an eye on this space for more insights!