It is embarrassing how long it took me to formulate this conclusion. The evidence has always been there. A decade of hysteric enthusiasm didn’t help. Critical views popping here and there suffered the same problem as the original proposition - they were based on individuals opinions, anecdotal evidence and loose definitions. I’m not aspiring to turn this discourse into a formal argument, but I will at least attempt to avoid the logical fallacies. To achieve that we have to reduce our definitions to an absolute minimum. Let’s go people!
Let’s start with an example of a simple local function call. I chose Rust notation for its clarity, but all the examples are transferable to any mainstream language.
fn module1(){
let result = module2(Type1::new(), Type2::new());
// do something with the result
}
// somewhere else in the codebase:
fn module2(arg1: Type1, arg2: Type2): ReturnType {
let value = ReturnType::new()
return value;
}
A function module1 calls a function module2. Module1 passes two arguments to module2. Types of these arguments are Type1 and Type2 respectively. Module1 waits until module2 returns, assigns its return value to the result value and continues its execution. There are some language-specific operations on the stack as it happens, but these details are not important.
You are about to close this and read something else, aren’t you? Please don’t. Awkward as it is, these basics have to be stated to illustrate the problem. Sorry.
Let’s make some assumptions:
none of these functions perform remote calls
both functions are executed in the same runtime (be it an interpreter, virtual machine or an operating system)
the language and runtime we use have basic productivity tools, like compiler/static analyzer, debugger and IDE capable of navigation and basic refactoring
Now we will review a bunch of tasks one might undertake with the above code.
Task: Navigate from module1 to module2 code
Use ‘jump to definition’ feature
Task: Change the body of module1 or module2
Code, save, compile, test, release
Task: Remove a field from module2 input arg
Use ‘safe delete’ to remove the field and all its uses throughout the codebase
Task: Change module2 return type
Update the type, and how it is used for all references to module2 with safe refactoring
Task: Break module2 into two separate functions
Rearrange module2 code and its clients. Save, compile, test, release.
Task: Move module2 to be a member of a different class
Use ‘move to’ feature.
Task: Test interaction between module1 and module2
Write a unit test that runs module1 and validates that effects of module2 are as expected.
Task: Debug the execution flow of module1 and into module2
Use either a local or remote debugger with breakpoints and step features.
Those actions - what do they have in common? They are unremarkable. Something a single dev or a pairing duo would execute multiple times a day. Fast, local, atomic, forgiving. Just cheap. With those tenets you can achieve a state of flow. You can do rapid experimentation. You can have confidence to continuously refactor towards the most optimal structure.
You know what comes next, don’t you? There is a reason why we called these functions module1 and module2. Let us then transfer this little trivial system into the world of microservices.
Here Module1 and Module2 are separate services communicating by exchanging messages over the network using some protocol. The protocol defines how messages are packaged, serialized, secured, deduplicated and retried. All that in order to assure some level or reliability.
We will now quickly list steps required for this synchronous exchange of messages to happen. We will assume the common practice of upstream services providing client libraries to their downstream counterparts.
Module1 builds a Request object and calls a method of Module2 client library
Module2 client serializes the Request object
Module2 client uses protocol of choice to send the message over the network
Message is broken into a series of IP packets as it travels over the network
Module2 runtime receives the packets and assembles the Request message
Module2 code deserializes the message into an object
Module2 performs necessary business logic
Module2 builds a Response object
Module2 serializes the response
Module2 sends the response over the network
Message is broken into a series of IP packets as it travels over the network
Module1 runtime receives packets and assembles the Response message
Module2 client library deserializes the message into a Response object
Module1 code receives the response and proceeds to further business logic
This is a simplified model that purposefully skips concerns such as encryption, load balancing and various layers of routing. Out of the 14 steps listed, items 1, 7, 8 and 14 have corresponding steps in the non-microservice solution.
We will now map the activities listed previously into concrete actions for this architecture.
Task: Navigate from module1 to module2 code
Check Module2 client what is the name of the underlying API, then look that API endpoint up in the Module2 codebase either by full text search or with assistance of some custom tools. Might require some application level routing.
Task: Change the body of module1 or module2
Assuming no API changes, it involves change, save, compile/test and release.
Task: Remove a field from module2 input arg
Mark the field as deprecated and release a new version of Module2 client.
Find all downstream services using this API.
Modify Module1 code not to use deprecated fields. Deploy Module1 change.
Change Module2 client and service to remove the field completely. Release a new version.
Modify Module1 to use the new Module2 client and deploy.
(this is one of a few ways to do it. The level of complication of other methods is similar)
Task: Change module2 return type
This could be accomplished by a similar process to that of removing a field. Although in practice we would release a whole new API with the new return type and have it running parallel to the old one. Then incorporate a switch in all downstream services and once the old API is not used anymore, shut it down.
Task: Break module2 into two separate functions
This would require defining two new APIs and following a similar process as the one in ‘change return type’ action.
Task: Move module2 to be a member of a different class
This change is irrelevant in the context of microservices unless the remote access layer follows CORBA-style, or when we’re moving the API to another service. A rare case, but if necessary, we do it by rewriting the code into the codebase of a different service and following client migration procedure.
Task: Test interaction between module1 and module2
Spin up instances of both modules either as separate processes or separate containers.
Alternatively, use a testing framework for one module assuming the availability of the other module under a given url. There are multiple techniques of integration testing of microservices.
Task: Debug the execution flow of module1 and into module2
While it is possible to connect two remote debuggers to two different services, in practice it is rarely used. We mostly rely on debug-level logging and provide enough request metadata to be able to trace the path of a request across network boundaries. With help of some log aggregation tools we can then recreate a scenario. If debug info is not granular enough, we might need to add more logging to code.
A great deal of detail was omitted when describing the actions above. There are plenty of patterns, techniques and tools for achieving those goals for microservices. These are all moderate-to-highly complicated tasks taking any time between hours to weeks to complete. No way we can be confident about anyone’s ability to perform these tasks routinely, multiple times a day.
Is that a problem? - we ask. Is it not just a matter of efficiency? Yes - some refactoring is harder - having to spend more time to restructure the code is an acceptable compromise given all the benefits we get in return. We will talk about those benefits in a moment. If we think that is indeed an acceptable tradeoff, it means we don’t really understand what we are trading off.
Ability to restructure the internals of the system quickly enables experimentation. We can evaluate several approaches within a couple of days, run tests, select the winner based on empirical evidence. If we lose that ability, we are left with two alternatives:
For each decision, consult experts, perhaps do a comparative analysis based on what we know and trust our collective intelligence.
Go with the first solution that comes to mind and accept the risk that comes with it.
Let’s say we want to make the best decision based on what we know. What do we know? Hard to generalize, but one thing can be said with absolute certainty: Of all the moments in the future where we could be making such a decision, this is the moment we know the least. There is no basis then to claim the decision will most probably be right. What if we don’t care? What if we accept the risk of the choice being wrong and roll with it? What if we decide to ‘move fast and break things’? What happens is exactly that. A moment arrives where we have sufficient context to say ‘that decision was not right, here is what we should do’. Except by that time we might have a system architecture established and built. Perhaps used in production. At this point changes like breaking down a single API call into multiple, changing its parameters or consolidating many APIs into one are all architectural changes. They require cross-team agreement, approvals, release coordination, deprecation plan, etc. All the while in the world of monolithic systems this would be a few minutes of work of a single engineer without bothering anyone else. To blow up simple decisions into complex and difficult processes is nonsensical, is ridiculous and frankly speaking is stupid. The only reason we don’t see it clearly is the layers of complexity one needs to shave off to establish the ground facts.
And the more distributed our system is (i.e. more strongly exhibiting traits of microservices ‘architecture’) the more prevalent the problem is. We rob the engineers of an ability to ever get in the state of true flow or being able to see the ‘big picture’ just by looking at the code. The big picture now requires us to look at infrastructure topology and collate it with documentation of a multitude of APIs. There will be tools (expensive tools) that make it manageable for us.
The state of flow is precious to individuals. People change jobs for that reason. What keeps them hooked on that flow-less situation? As engineers we take pride in solving problems and microservice architecture is very compelling because it introduces a lot of micro-problems that we can quickly solve and receive the mental gratification from such accomplishment. Problems like creating a spec for a service container cluster and seeing the cluster operational after a day of hard work. Writing a complicated distributed logging query and finding out where the request got stuck. Coding an application-level equivalent of a three-way join of data from three other services. These problems are hard enough and interesting enough to keep us happy-busy. We do not mind that in the grand scheme of things, the work we do is just toil. As a matter of fact, we don’t know it.
There is this recurring theme dating back to medieval times of a person selling their soul to the devil in exchange for unimaginable riches. Now that we see the tradeoff we make by selecting microservices is as serious as giving away a soul, let us have a closer look at the ‘riches’ we are promised.
I could have written a book just about debunking every single supposed benefit of microservices. But people don’t read books, so I need to be brief. The only point I want to show is that every single benefit is also achievable in monolithic systems, therefore it is not a benefit.
Benefit 1: Teams can work independently on components of the system.
Teams can work independently on components of a monolithic system as well. It is enough to define separate modules or namespaces to avoid name clashes and establish a basic contract in the form of public function or method signatures, which can easily evolve in the future.
Benefit 2: Scalability - each microservice can be independently scaled.
Monolithic systems can also be independently scaled. Ability to have worker pools, resource clusters and grid compute is not exclusive to microservices.
Benefit 3: Better fault isolation - the failure of one service is less likely to cause a catastrophic outage.
The first two benefits are at least true statements. This one is not. It seems we got ourselves two truths and a lie. If service A depends on service B and service B goes down, then service A goes down. Perhaps the idea is to have a high-availability cluster with failover? But then exactly the same can be done with a monolithic system - whip up a high availability cluster and a failover strategy.
We still might be tempted to argue the merit of an idea based on its popularity. Thousands of tech companies doing MS can’t be wrong. We will dismiss this argument as a typical example of ad populum fallacy.
How has such popularity been achieved? That is another interesting conundrum. There is an idea predating MS called Service Oriented Architecture. It is a very well documented, non-scientist-friendly and formally described set of patterns for creating distributed systems. The abuse of SOA by vendors and consultants during the era of “digital transformation” contributed to general skepticism towards it even though the framework itself is very well thought through and relevant to this day. It is possible that on the wave of that skepticism, the new approach has arrived. One that “felt” lightweight. Little did we know, “lightweight” really meant “vague”. It was wrong to admit a vague idea into technical conversations in the first place, but once it had been in, there was no coming back. People like vague ideas because they can act as experts. It is easier to market vague ideas, build tools and businesses around them. It is then important to strip those ideas off of their vagueness to the essential core and discuss its value proposition. As we have already seen, that proposition does not impress.
We will explore this topic further soon. Stay tuned.
I’ve seen this problem in every job I’ve had:
1. Company starts out with a monolith. The monolith is badly managed, the code isn’t modular, the tests are poorly written and take forever to run. Everyone thinks the solution is microservices.
2. Huge effort to break up the monolith that takes years and never really ends.
3. Everyone thinks, Well since it took so long to break up the previous monolith, next time we need to add new functionality, let’s just put it in a microservice to begin with to “save time.” Bonus points for promotions being way easier to achieve when you can say you built a new service.
4. Often the abstractions and boundaries between the new microservices are poor (because, as you pointed out, no one can predict the future)
4a. Also, sometimes the migrations from a monolith to a collection of services don’t fully complete because of priority shifts!
5. Small refactors that would be have been completely trivial in a monolith now take weeks or are just impossible. Teams move super slowly. Teams use complicated distributed systems patterns to achieve goals that would be trivial with simple, readable code in a monolith. Unit tests become integration tests. Instead of being able to manually test a change by spinning up a simple monolith in your laptop or in a cloud dev env, you now need to spin up dev/staging versions of multiple services to test changes. Staging becomes unstable.
6. As a bonus, the operational load becomes higher with every service, debugging becomes more difficult, etc. And God help you if you work at a company with multiple backend languages and you end up with functionality that should be all in one service spread across multiple microservices with poorly-defined domain models and boundaries, each of which is in a different language.
I’ve seen this happen at multiple jobs and it is really sad.
This DHH article from 2016 made me question the wisdom of microservices a long time ago. I wish I had pushed back harder against them in multiple situations: https://signalvnoise.com/svn3/the-majestic-monolith/