Microservices don't make sense [part 2]
I need to make something very clear. Chances are you will interpret this as a mere comparison of microservices vs. monoliths. It is not; the comparison only serves as a tool. I know that just by providing such a comparison I could be legitimizing a bad idea further. Nevertheless, it is necessary in order to arrive at the titular conclusion.
Microservices don’t make sense from an engineering point of view. I propose that the term be frowned upon and ridiculed whenever it is brought up in conversation. Tech companies would be best off banning it from technical discussions.
As an industry we should never have agreed to use a term without a strict definition. You cannot disprove that which is not fully defined. Is a microservice a service that a two-pizza team maintains? Is it anything that is not a monolith? Is it a service that represents operations on a single domain object? Or is it the equivalent of a single lambda function? Apologists change these definitions to their liking. This sometimes reaches the level of absurdity where, according to certain definitions, a system can be both a monolith and microservices at the same time! I applaud some companies, like Airbnb, which purposefully decided not to use this term to describe their distributed architectures and used the well-defined term instead: Service-Oriented. I applaud companies like Shopify even more for not giving in to the idea in the first place.
For the purposes of this document, we will assume the principle of the monolithic style is to collapse the core functionality of a system into a single codebase, with a single CI process, as few runtime artifacts as possible, and minimal network communication. The microservice style, on the other hand, will be a highly distributed system attempting to express each atomic business or technical function as an independent unit of code, build, and deployment. When we say ‘system’ we mean a business domain, like payments, a market, e-commerce, portfolio management, or an exchange. We do not mean a bundle of all functions of a business, a company, or even an org. This way we eliminate a likely strawman from the discourse.
What has allowed, and will keep allowing, bad ideas to thrive is the cyclical nature of software development. People divert the blame for their failures toward bad tools or bad architectures, causing a period of undeserved loss of interest in those tools and architectures. For an example of such a cycle with far-reaching consequences, research the term “AI Winter”.
The monolith was once the whipping boy blamed for technical failures. Anything new was better, because there was no way to disprove a hypothetical future system. Bad actual monoliths were compared to good hypothetical microservices. The playing field wasn’t level until now, when some architectural failures and unfulfilled promises became spectacular enough to catch the eye of the court of technical public opinion. Before, the argument went like this:
Good A is better than bad B.
Do you see the problem, now that we have reduced the argument to an atomic statement? What we should really try to answer is this:
Is good A better than good B?
Is bad A better than bad B?
Shall we find out? What would we consider a good microservice architecture? Take everything done by the book: perfectly defined service boundaries aligned with domains and subdomains, hygienic APIs, observability, airtight CI/CD pipelines, ephemeral environments, and an expressive end-to-end testing framework. A system of devs’ dreams (mine included).
And its counterpart in the monolith world, what would that be? Perfectly defined module boundaries aligned with domains and subdomains, hygienic public methods/functions, observability, yadda, yadda, yadda. To that I would add incremental builds, testing, and deployment to compensate for the size of the codebase and binaries.
Let us go through some of the important characteristics of software systems and see how the two ideal cases stack up.
Feature implementation
If a feature involves an internal change to one of the components, the complexity is similar in both cases: just code it. If a feature spans multiple services, separate changes need to be implemented, and their testing and launch will require cross-team coordination on the microservices side.
Build and deploy efficiency
A localized change is no different in either case. A distributed change that involves an update to the API will require following a predefined process which, as mentioned before, involves cross-team coordination. It would not be very hard, though, with a properly configured monorepo and client-library update procedures.
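To make that concrete, here is a minimal sketch of one such procedure, the “expand/contract” pattern, in Python. Everything in it (the QuoteRequest type, the currency field) is hypothetical; the point is that the server tolerates both the old and the new payload shape, so teams can build and deploy in any order.

```python
# A minimal sketch of "expand/contract" for a cross-service API change.
# All names here are invented for illustration.

from dataclasses import dataclass

@dataclass
class QuoteRequest:
    symbol: str
    currency: str = "USD"  # new optional field; old clients simply omit it

def parse_quote_request(payload: dict) -> QuoteRequest:
    # Expand: tolerate the old payload shape (no "currency") and the new one.
    return QuoteRequest(
        symbol=payload["symbol"],
        currency=payload.get("currency", "USD"),
    )

# The old client payload still works...
assert parse_quote_request({"symbol": "AAPL"}).currency == "USD"
# ...and so does the new one. Once every client sends "currency",
# the default can be removed (the "contract" step).
assert parse_quote_request({"symbol": "AAPL", "currency": "EUR"}).currency == "EUR"
```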
End-to-end test execution
In a monolith, you run the system and run the test suite. In microservices, you identify a minimum subset of services, spin up matching versions, and execute the tests. If our dev-tools team did a good job, that should not be much different from the single-service case. Perhaps slightly slower and more resource-hungry due to network communication.
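As a rough sketch of what such a dev-tools harness might do, assuming made-up service names, commands, and ports: start only the services the suite touches, wait for their health endpoints, then hand off to the tests.

```python
# Start a minimal subset of services, wait until each is healthy, run the
# e2e suite, then tear everything down. Commands, ports and the test path
# are placeholders.

import subprocess, time, urllib.request

SERVICES = {
    "payments": (["./payments", "--port", "8001"], "http://localhost:8001/health"),
    "ledger":   (["./ledger",   "--port", "8002"], "http://localhost:8002/health"),
}

def wait_healthy(url: str, timeout: float = 30.0) -> None:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if urllib.request.urlopen(url, timeout=1).status == 200:
                return
        except OSError:  # connection refused, 5xx, etc.
            time.sleep(0.5)
    raise TimeoutError(f"{url} never became healthy")

procs = [subprocess.Popen(cmd) for cmd, _ in SERVICES.values()]
try:
    for _, health_url in SERVICES.values():
        wait_healthy(health_url)
    subprocess.run(["pytest", "tests/e2e"], check=True)
finally:
    for p in procs:
        p.terminate()
```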
Performance of transactional calls
The network comes with a cost, so no matter how we slice and dice it, in the ideal world a transaction within a monolithic system will always have similar or better performance than its equivalent in the microservices world. The difference might be slight, and let’s remember, monolithic does not exclude distributed communication; the network might still be involved to some extent, but never as dominantly as in the microservices case.
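A quick back-of-envelope calculation illustrates the gap, using the classic “latency numbers every programmer should know” as rough inputs; the call counts below are assumptions, not measurements.

```python
# Rough latency arithmetic for one transaction touching ten components.

IN_PROCESS_CALL = 50e-9         # ~tens of nanoseconds per function call
DATACENTER_ROUND_TRIP = 500e-6  # ~0.5 ms round trip within one datacenter

SEQUENTIAL_CALLS = 10

monolith = SEQUENTIAL_CALLS * IN_PROCESS_CALL
microservices = SEQUENTIAL_CALLS * DATACENTER_ROUND_TRIP

print(f"monolith:      {monolith * 1e3:.4f} ms")       # 0.0005 ms
print(f"microservices: {microservices * 1e3:.4f} ms")  # 5.0000 ms
# Roughly four orders of magnitude, before serialization, TLS and
# retries are even accounted for.
```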
Data consistency and integrity
We have become so used to eventual consistency that nobody assumes ACID anymore. Patterns like event sourcing are proposed at the application level to compensate for that lack of consistency. But we are talking about ideal systems here. Ideal systems have domain models designed in such a way that you only need strong consistency within the domain boundary and can tolerate eventual consistency outside it. Properly implemented microservices and monoliths don’t differ much in that regard.
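Here is a minimal sketch of what “strong inside the boundary, eventual outside” can look like, using the transactional-outbox pattern with a throwaway SQLite schema; all table and event names are invented.

```python
# The domain write and the outgoing event commit in one local transaction;
# a separate relay publishes the event to other domains eventually.

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, event TEXT);
""")

# Inside the domain boundary: ACID. Both rows commit or neither does.
with db:
    db.execute("INSERT INTO orders VALUES (?, ?)", ("o-1", "PLACED"))
    db.execute("INSERT INTO outbox (event) VALUES (?)", ("OrderPlaced:o-1",))

# Outside the boundary: a relay drains the outbox whenever it can.
for (event,) in db.execute("SELECT event FROM outbox"):
    print("publish to other domains, eventually:", event)
```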
Disaster recovery
The complexity of recovery tasks is proportional to the complexity of the topology and the fragmentation of the systems of record, but that effect is tamed with helpful techniques such as rolling updates/restarts, cross-region failover, etc.
Cost of infrastructure
One of the most distinguishing qualities of well-built systems over bad ones is the cost of operating them. An ideal system would consume a minuscule fraction of what its terrible twin does, regardless of the architecture.
Ideal systems don’t exist, so don’t attach too much weight to the comparison above. Although, if you are into proving math theorems, you might find there is sufficient signal to postulate this:
For every microservices system, there exists an equivalent monolithic system that is more performant.
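If you insist on quasi-formal notation, one loose way to write the postulate down might be as follows; behavior and perf are deliberately left underspecified here (read perf as any latency- or cost-based ordering):

```latex
\forall\, S_{\mu} \in \text{Microservices}\;\; \exists\, S_{m} \in \text{Monoliths}:
\quad \mathrm{behavior}(S_{m}) = \mathrm{behavior}(S_{\mu})
\;\wedge\; \mathrm{perf}(S_{m}) \geq \mathrm{perf}(S_{\mu})
```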
Now it’s time to come down to earth. Gutter level. There is less need for imagination here, as there are plenty of real-life abominable systems on both sides of the aisle. Here is a little comparison of the worst cases.
Feature implementation
Terribly slow and brittle in both cases. A monolith will require you to follow gargantuan call stacks to figure out where to place the new code, and the implementation will always carry the risk of breaking things, because certain “action at a distance” effects are simply not testable. Microservices will suffer a different class of problems: either we investigate who owns the feature’s domain, track down the teams, and try to get the work onto their backlogs, or we go rogue and build yet another microservice with no regard for duplication of responsibilities in the system, because at least that lets us move forward. The implementation will suffer compatibility problems and, to a certain degree, consistency violations that will be discovered in production and will have to be reconciled manually.
Build and deploy efficiency
The monolith will build painfully slowly, taking multiple hours. We will “only have one shot” a day to build and run the test suite, and a narrow deployment slot in a monthly or quarterly release cycle. The deployment will reveal problems, and the engineers will rush to fix them with zero regard for the fact that they are degrading quality even further. While microservice builds will be faster per codebase, the gain will be offset by the need to manually manage the rollout of a feature across multiple services. Any end-to-end tests in non-production environments will be incomplete and inaccurate, letting bugs through to production to be discovered by users.
End-to-end test execution
Let’s be honest, at a certain level of chaos there is no way to have reliable end-to-end testing for any architecture. Whatever we are left with will be flaky, slow and unreliable.
Performance of transactional calls
Microservices will yield high network-call amplification. For every call into the system, there will be tens of network hops, some of them redundant, leading to massive inefficiencies. The monolithic system won’t be much better: the amplification of reads and writes to the DB might be as bad, if not worse, than in the distributed design. Monolithic systems will also reach the limits of vertical scaling, and at some point there will be no way to buy performance back by scaling.
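Some illustrative arithmetic shows why amplification hurts; the hop count and per-service availability below are assumptions, not measurements.

```python
# Latency and availability cost of call amplification.

HOPS_PER_REQUEST = 30             # internal network hops per external call
ROUND_TRIP_MS = 0.5               # same-datacenter round trip
PER_SERVICE_AVAILABILITY = 0.999  # each dependency is "three nines"

added_latency = HOPS_PER_REQUEST * ROUND_TRIP_MS
compound_availability = PER_SERVICE_AVAILABILITY ** HOPS_PER_REQUEST

print(f"network latency per request:    {added_latency:.1f} ms")          # 15.0 ms
print(f"availability of the whole path: {compound_availability:.3f}")     # ~0.970
# Thirty sequential three-nines hops already behave like a ~97% system:
# over twenty hours of unavailability a month, before any real failures.
```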
Data consistency and integrity
The sheer time it takes to execute requests will sooner or later force engineers to go asynchronous, further weakening whatever remains of the system’s consistency guarantees. Requests will be accepted; sometimes they will pass, sometimes they will fail. There will be a multitude of failure configurations, many of them leading to data inconsistencies or outright integrity violations. One module will assume an object exists while another expects the same object to have been deleted. Neither case is better.
Disaster recovery
The monolithic system will be one super-massive failure domain: any error can lead to catastrophe. The microservices system will have its own de facto failure modes, undocumented and perhaps not yet discovered. There will be failures a microservices system can withstand that a monolith couldn’t. But root-cause analysis in a microservices system will be a nightmare, oftentimes abandoned after days of unsuccessful attempts.
Cost of infrastructure
In order to pull its own weight, the monolithic system will have to be massively over-provisioned, but it will not come near what the microservices equivalent needs, where hundreds or even thousands of services will each require their own high-availability clusters, databases, log aggregation, and so on. There will, however, be a limit, defined by the largest available virtual machine, beyond which the monolith (the badly written one) will not be able to scale.
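Some illustrative arithmetic on that structural floor; every number below is an assumption, and the point is the multiplier, not the dollars.

```python
# Hypothetical infrastructure floor for each worst case.

SERVICES = 300
REPLICAS_FOR_HA = 3         # a common per-service minimum for availability
NODE_COST_PER_MONTH = 200   # made-up per-instance price

microservices_floor = SERVICES * REPLICAS_FOR_HA * NODE_COST_PER_MONTH
monolith_overprovisioned = 40 * NODE_COST_PER_MONTH  # one big, wasteful cluster

print(f"microservices floor: ${microservices_floor:,}/month")       # $180,000
print(f"bloated monolith:    ${monolith_overprovisioned:,}/month")  # $8,000
# Even a badly over-provisioned monolith has a lower structural floor,
# until it hits the ceiling of the largest available machine.
```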
Sorry if you hoped for a clear winner here. It makes sense if you think about it: there is no theoretical limit to how badly a system’s evolution can go, but in practice the buck stops when the cost of maintenance substantially exceeds the cost of building a new system from scratch. Or when, unable to innovate, the company goes out of business.
So it all comes down to how easy or hard it is to steer that evolution toward a better version of the system. And that, my friends, is exactly what we gave up by going down the microservices path. We took the power away from engineers. We diverted attention away from the code, and we made trivial matters complicated. Diagramming replaced coding, and architects are running the show.
We need to talk about the other direction, though: how easy is it to prevent the system from degrading toward its worst case? There is an argument I’ve seen on this point: that the degradation of microservices is contained within service boundaries, while the degradation of a monolith has no limits. It bears merit. A degraded monolith is spaghetti code in which every component talks to every other component without constraints; its structure collapses. Degraded microservices are not immune to this effect, but because of the rigidity of APIs, even a degraded architecture will maintain some level of structure. This collapse used to be more prevalent in monolithic systems before the ideas of functional programming and modularity were widely adopted in mainstream languages like Java and C#. Mutable global state and violated public module boundaries are, to this day, the largest contributors to codebase complexity. A question arises:
Why would you be more concerned with stopping the degradation of your system than with your ability to improve it?
At the end of the day, we are not talking about natural processes like oxidation or rotting. This is work done by humans with functioning brains and hands on keyboards. If you trusted the skills of your teams, if you knew they would do the right thing, you would worry instead about unblocking them to do the right thing, ASAP.
Behold. For we have just defined the true purpose of microservice architecture.
It is a solution that lets engineering organizations with low average engineering skill, and low trust from leadership, keep chugging along!
I do not dispute this purpose. It is legitimate, and it delivers as advertised. Companies pay a handsome tax, hiring more people, maintaining more code, and paying for more tools, but they can keep going and keep scaling.
If you are an engineer in such an organization, consider what it means for you personally:
You are not trusted. Your bosses are more interested in preventing you from screwing up than in enabling you to fix things. You will not be able to do your best, just as you will not be able to do your worst. You will forever do average.
Are you satisfied with forever average?
Thank you kindly for reading.