Tom Barber (00:00)
If I had a dollar for every time a startup told me they needed microservices, I'd have enough money to fund their inevitable rewrite back to a monolith two years later.
I have a severe dislike for the term microservices because it constructs a certain pattern in people's minds as to how they're supposed to structure things and where the interfaces actually belong.
So in today's episode, we're going to dig into it. We're going to have a look at what microservices actually are and how to best scope them for your organization. And so let's get into it.
Welcome back. This is episode five. And today we're getting uncomfortably specific about the architecture decisions that midsize companies actually need to make. And so we're not going to talk about some general principles. We're going to talk about exactly when you need microservices and when you don't and how to know the difference. Because
What nobody tells you is that when you're trying to make these decisions, the discourse around the architecture is completely dominated by companies that look nothing like yours. Google, Netflix, Amazon, they have thousands of engineers, dedicated platform teams and problems you literally cannot have at your scale. So when they say, this is how we solved it, they're not lying, but they're also not talking to you.
It's like taking marriage advice from someone in a polyamorous relationship with five partners. It might be working great for them, but the dynamics are fundamentally different from what you're dealing with. And so today, I'm gonna give you a framework, an actual decision framework for architecture choices. We're covering when you actually need microservices versus when you just want them,
why microservices is possibly the worst-named architectural pattern in history, and the spectrum of options in between a big ball of mud and a distributed systems nightmare. And most importantly, specific concrete signals that tell you when it's time to evolve. So let's get going.
So first of all, I want to talk about microservices in the context of the word itself and why it's probably sabotaging your architecture decisions.
The word microservices has damaged an entire generation of architectural thinking. And so like, here's why. When I started working in IT about 6,000 years ago, everything was just a big monolith: we built a single platform and everything was compiled into the same thing. Occasionally we'd make network calls to other services, SOAP. Everyone loves a bit of SOAP. But by and large, it was one big product platform, depending on what you were trying to build.
When you say the word micro, what does it actually make you think about? It makes you think small, minimal, tiny. The word literally suggests that you should be optimizing for smallness. So what do your developers do? They chase absolute minimization. It was a bit like when Scala, for anyone who comes from a programming background, first came onto the scene: one of my favorite things to do was to take a piece of Java code, turn it into Scala code and see how much smaller and more compact you could make it. Of course, the trade-off for that is readability. And chasing absolute minimization for microservices is very similar. And so you watch teams agonize over things like, should the user profile service handle authentication, or should that be separate? Or is this service doing too much if it handles both reading and writing?
And these are the wrong questions entirely. The right question isn't how small can I make this? The right question is, what's the appropriate boundary for this capability given my team structure, my deployment needs, my data model and my operational maturity? That's a mouthful, and it obviously considers multiple different aspects of where you are currently in your business, but it's probably why we ended up with the name microservices in the first place, and that name has the branding consequences that come with it.
So like, for example, I worked with a team a little while ago that had created a microservice for CRUD operations on user addresses. And that's it. All it did was read, write, and update addresses. It had its own database, its own deployment pipeline, its own logging infrastructure and, effectively, its own on-call rotation. And I asked them, like, why is this service separate? And they said, well, it's a microservice, it should only do one thing. OK, but what
problem does this separation solve? And the sort of comment that came back was more like, isn't this just how you're supposed to do it? And the answer of course is no. The thing is, in more modern times everyone has Docker containers, Kubernetes, deployment platforms that allow for easy interconnection. But what this means is you then have a very
rigid network boundary that you also then have to work around. And this is what I meant about the naming problem. They were chasing the word micro for its own sake, not because it solved an actual problem. I have this conversation with a lot of the people I work with: microservices should have been called team-scale services, independently deployable capabilities, business-bounded services or self-sufficient subsystems. Any of those names would have pointed you
toward the actual principle, which is that services should be big enough to be owned by a team, small enough to be understood by that team, and bounded by business capabilities, not technical ones.
So let me give you a bit of a pattern that I see in various places; LinkedIn is a good one for this. A team splits their monolith into services based on technical layers and they end up with a user service, an order service, a payment service, a notification service, an email service, a PDF service and an analytics service. And so this looks pretty clean from a microservices perspective. Looks like a good separation of concerns, except watch what happens when you try and implement a feature. Let's say
a user completes a purchase, send them an email with a PDF receipt and log an analytics event. Now to implement that, you're going to need to hit the order service to create the order, which calls the payment service to process the payment, which then calls the user service to get the details, then triggers the PDF service to generate the receipt, which then sends that receipt to the email service,
which then needs the user email from the user service again, and oh right, don't forget to call the analytics service to log the analytics event. And you've created a distributed transaction that spans seven services. And if any of those steps fail, you need distributed rollback logic or eventual consistency with compensating transactions. And so you've turned what would be a 50-line function, you know, do this, this, this and this, call some other functions to do the thing, into like a distributed systems PhD thesis.
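For contrast, here's roughly what that same flow looks like inside the monolith: a minimal Python sketch, assuming a single SQLite database, where the charge, PDF, email and analytics helpers are hypothetical stand-ins for whatever you actually use.

```python
import sqlite3

# Hypothetical stand-ins for the real integrations (payment gateway,
# PDF generator, mailer, analytics) -- stubbed so the sketch runs.
def charge_card(token, order_id): ...
def render_receipt_pdf(order_id): return b"%PDF..."
def send_receipt_email(email, pdf): ...
def log_analytics_event(name, **fields): ...

def complete_purchase(conn: sqlite3.Connection, user_id: int, product_id: int, qty: int) -> int:
    """One purchase, one database, one transaction: if anything raises,
    everything rolls back and nothing is left half-finished."""
    with conn:  # BEGIN ... COMMIT, or ROLLBACK on exception
        email, token = conn.execute(
            "SELECT email, payment_token FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        (stock,) = conn.execute(
            "SELECT stock FROM products WHERE id = ?", (product_id,)
        ).fetchone()
        if stock < qty:
            raise ValueError("out of stock")
        conn.execute(
            "UPDATE products SET stock = stock - ? WHERE id = ?", (qty, product_id)
        )
        order_id = conn.execute(
            "INSERT INTO orders (user_id, product_id, qty) VALUES (?, ?, ?)",
            (user_id, product_id, qty),
        ).lastrowid

    # Side effects once the data is safely committed.
    charge_card(token, order_id)
    send_receipt_email(email, render_receipt_pdf(order_id))
    log_analytics_event("purchase_completed", order_id=order_id)
    return order_id
```

Boring, but every step either commits or rolls back together, and the failure story is a stack trace rather than a cleanup job.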
And the problem, of course, is that all those services are still coupled. You've just moved the coupling from code dependencies to network dependencies. That's my point about Kubernetes platforms: you're just moving that coupling around. You've built a distributed monolith. It has all the downsides of a monolith, things like tight coupling,
coordinated deploys and synchronized releases, plus all the downsides of actually having the distribution, things like network failures and service discovery and distributed debugging and eventual consistency. And this is what happens when you chase micro without asking why.
The fundamental insight that the microservices community refuses to state clearly enough is that services aren't about being small. And this is my point. They're about being independently deployable by independent teams. And this is why the micro part of the word is a distraction. It's the services part that matters. In fact, some of the most successful service-oriented architectures I've seen are what you might call macroservices. And these are big, meaty services.
Each one is a complete vertical slice with UI, business logic, data storage, everything, but each one is owned by a team. Each one can be deployed without coordinating with anyone else. And that's what we're actually optimizing for: independent deployability, team autonomy, if anyone listened to the previous podcast, and reduced coordination costs. So from here on out, when I say services,
I don't mean micro, I mean appropriately scoped for independent operation. Which might be micro, that's fine. Or it might not be, and that depends on your context.
All right, let's get practical. Here's a framework I use when advising companies on architectural decisions. Five factors. And if you evaluate these honestly, you'll know which architecture you need. Factor one: team size and structure. This is the most important factor when it comes to figuring out if you're gonna build microservices, because it's the one that everyone ignores when they believe
architecture is about technical elegance, when really it's about humans coordinating to ship software. In previous podcasts, we were talking about the MVP and MVP engineering teams, all that type of stuff. And this is exactly the same. What we care about is shipping the software, not necessarily how it's constructed.
Here's my rule of thumb. And I've seen this hold across probably 40 plus companies. If you've got one to five engineers, build a monolith. There is absolutely no discussion. It's not even a debate. One application, one database, one deploy. It keeps it simple. Everyone knows where the code is. Everyone knows how to get that thing shipped. Doesn't require an infinite amount of testing or crazy amount of setup. You just ship it. If you've got
five to 15 engineers, start thinking about a modular monolith where you've got strong internal boundaries, but it's still one deployable artifact. Maybe one or two truly independent services for specific reasons that we'll get into. But that gives you the sort of constructs and interfaces that you can start thinking about when it comes to splitting these things out. If you've got 15 to 30 engineers, then you can end up with multiple services, but we're talking like three to six,
not 30. Each service is owned by a team. Services map to team boundaries. Then when you've got 30-plus engineers, okay, now you can start thinking about a fuller service-oriented architecture, but you should still be measuring in services and teams, not functions. Why does this matter so much? Conway's law. Hands up if you've heard of Conway's law.
I'm sure you have: organizations ship their org chart. It's not just an observation, it's like gravity for where this thing goes. Your architecture will mirror your communication structure, whether you plan for it or not. If you have eight engineers and 15 services, every single deploy will require coordination across the entire team, because in an eight-person team everyone knows everything anyway. You're in the same Slack channel. You sit in the same room.
The services aren't giving you independence, they're creating artificial coordination overhead. So you've destroyed the one benefit of services, independent deployment, while keeping all the costs. Now let me give you a counter-example. I worked with a company that had a large number of engineers and a monolith, a big one, and they were in pain, but not for the reasons you might think. The pain wasn't that the monolith was a big ball of mud.
The code was actually pretty clean. The pain was that they had four distinct product teams, and every deploy required getting all four teams to coordinate because they all deployed together. So you'd have one team that wanted to ship a small tweak. They had to wait for another team to finish their feature, wait for a third team to fix their bug, and everything was serialized. And so your deploy frequency drops: instead of multiple times a day, you go to once a week if you're lucky, when you can get everyone aligned.
And this is where you split out those services, not because the code is messy, the code can be pristine, but because the teams need to move independently. And so they split into four services, one for each of those teams. And the deploy frequency can then go back up because your autonomy is increased. And it's not about code cleanliness, as I just mentioned; they were doing it for organizational throughput. And that is a very good reason to start splitting these things up, if you've got the scale and the
staff to do it.
So factor two in this list is deployment frequency and blast radius. And having just touched upon it: how often do different parts of your system need to change, and what's at risk when they do?
Are there parts of your system that change 10 times a day and other parts that only change once a month? Or does everything always change together? Depending on what you're building and how you built it, it's a very legitimate question. And here's a sign that you might not need the services: you deploy everything together anyway. I see this all the time, where a team's got a whole bunch of services and every release they deploy all 12 services together.
Same version number across the board, same release notes, everything ships together. And that's just a monolith with a bunch of extra steps. You've just added network hops and operational complexity without getting any benefit. And so you should seriously consider merging them back together. But on the other hand, let's say you have a payment processing system that changes once a quarter because it's compliance heavy and scary.
But then you have a recommendations engine that ships updates twice a day because you're constantly tweaking the algorithm. Now deploying those together means that you ship recommendations once a quarter, which probably kills your iteration speed, or you ship payments twice a day, which is terrifying and probably breaks compliance procedures. And that's when the separation makes sense. Different deployment cadences, different risk profiles, genuinely independent life cycles.
And it's the same thing with blast radius. If you have a UI experiment system where you're trying random stuff, sometimes it breaks. You don't want that breaking your payment processing, so isolate them. But if every change requires touching five services anyway, isolation isn't buying you anything.
Moving on to factor three, which is data ownership and consistency. This is where an awful lot of microservices architectures fall apart, and it's the one that kills teams. The hard question you have to answer honestly is: can you actually partition your data, or is everything related to everything? Because in a monolith, you've got ACID transactions. If something fails, you roll back. Everything's consistent, you can join across tables freely, and you can enforce foreign key constraints.
In services, each service owns its own data. You can't join across services. You can't have transactions across services. Or if you try, you're building distributed transactions, which is a special kind of hell that I do not recommend. So before you split services, you need to ask, can I partition this data cleanly?
So let's say you're building an e-commerce platform and you might think, well, users go to the user service, products go to the product service, orders go to the order service. Sounds good. Except when a user places an order, you need to check the product inventory in the product service, verify the user's payment method in the user service, create the order in the order service, decrement the inventory back in the product service and charge the card,
which might be in the user service or maybe a payment service. That's a distributed transaction. If any step fails, you need to compensate the previous steps. User got charged, but inventory didn't decrement? Well, now you need a background job to fix it. Or you accept eventual consistency and sometimes users get charged for products that are out of stock, which you clean up later.
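To make the compensation point concrete, here's a rough sketch of what that failure handling tends to look like once the flow is split across services. The inventory, payments and orders clients here are hypothetical stubs, not a real saga framework.

```python
class ServiceError(Exception):
    """Raised when a remote call fails (timeout, 5xx, and so on)."""

class _StubClient:
    """Hypothetical stand-in for an HTTP client to one of the services."""
    def __getattr__(self, name):
        return lambda *args, **kwargs: 0

inventory_client = _StubClient()
payments_client = _StubClient()
orders_client = _StubClient()

def place_order(user_id: int, product_id: int, qty: int) -> int:
    """The same purchase, spread across services: every step that succeeds
    needs a matching 'undo' in case a later step fails."""
    reservation = inventory_client.reserve(product_id, qty)                 # step 1
    try:
        charge = payments_client.charge(user_id, product_id, qty)          # step 2
    except ServiceError:
        inventory_client.release(reservation)                              # undo step 1
        raise
    try:
        order_id = orders_client.create(user_id, product_id, qty, charge)  # step 3
    except ServiceError:
        payments_client.refund(charge)                                     # undo step 2
        inventory_client.release(reservation)                              # undo step 1
        raise
    # The PDF, email and analytics calls still follow, each with its own
    # retry-or-ignore policy. None of this is atomic: a crash between
    # steps leaves work behind for a cleanup job to find.
    return order_id
```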
Maybe that's acceptable in your business, maybe it's not, but you need to know this going in. The alternative of course is to keep orders, users and products in one service with one database, and you get atomic transactions. Features take a couple of hours to build, not a couple of weeks. Now, can you ever partition the data cleanly? Sometimes yes. Think about image processing: users upload images, you resize them,
optimize them, serve them via a CDN. That's a great service boundary because the image data is independent. You don't need to join it with user profiles. It's asynchronous by nature. You can accept eventual consistency. It has different scaling characteristics: it's CPU-heavy and can run on different infrastructure. And the interface is clean, just here's an image, give me back a URL. And that's a legitimate service boundary. The data can be partitioned.
The consistency requirements are loose and the interface is narrow. But if you're trying to split services and you find yourself constantly needing to query across service boundaries or you're implementing distributed transactions or you're accepting consistency anomalies that break user experience, you're fighting against the grain of your data model. So listen to what the data is telling you.
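As a sketch of how narrow that boundary can be, here's the whole contract in a few lines of Python, assuming Pillow for the resizing; the storage client and the CDN domain are hypothetical.

```python
import io
import uuid
from PIL import Image  # assumes the Pillow library is installed

def process_image(image_bytes: bytes, storage, max_size=(1200, 1200)) -> str:
    """The entire interface: bytes in, URL out. No joins against user
    profiles, no shared tables, no distributed transaction."""
    img = Image.open(io.BytesIO(image_bytes))
    img.thumbnail(max_size)  # resize in place, preserving aspect ratio
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=85, optimize=True)
    key = f"images/{uuid.uuid4()}.jpg"
    storage.put(key, buf.getvalue())  # 'storage' is a hypothetical object-store client
    return f"https://cdn.example.com/{key}"  # hypothetical CDN domain
```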
Factor four: independent scalability needs. Do different parts of your system need to scale differently? This is often cited as a reason for microservices, but it's usually not as compelling as people think. Here's what doesn't count as a reason: well, service A gets more traffic than service B. Okay, scale your monolith. Run more instances of it. Scale horizontally, put a load balancer in front. You don't need services for that.
But here's what does count: fundamentally different scaling characteristics. Say you have an API that serves web requests; it's memory-intensive, needs, you know, four gigs of RAM per instance, and scales mostly with concurrent connections. But, to go back to the earlier example, you also have a video transcoding pipeline that's CPU-intensive; it needs, I don't know, 32 cores and barely any RAM, but scales with job queue depth.
Running those together means you're over-provisioning RAM for transcoding workers while under-provisioning CPU for the API instances. And that of course is wasteful. And more importantly, they have totally different scaling triggers. The API scales with user traffic, transcoding scales with video uploads, which might spike at totally different times. And so that's a legitimate reason to separate them: different infrastructure needs, different scaling patterns, different cost profiles.
But if your entire app scales together, which is true for most apps until they're pretty large, then splitting it for scalability is premature optimization.
Factor five: do you actually need different technology stacks? There are completely legitimate reasons. Like, you have a machine learning pipeline that really needs Python because that's what most data science stuff is done in, so that's fine. But then you've got your core APIs in Go because you need performance. You're integrating with a legacy system in Java that's not going anywhere. And you have a real-time component that benefits from Rust's performance. Those are real reasons.
But here's what's not a real reason, and I've heard it often enough: engineers want to learn new things. That's what side projects are for. Or, language X is theoretically better for use case Y. Unless the difference is massive, the cost isn't worth it.
Every additional technology stack means different debugging tools and practices, different deployment pipelines, different monitoring and observability approaches, a harder on-call rotation because your on-call engineer needs to know multiple stacks, longer ramp-up time for new hires, more security patches to track across more ecosystems, and more library vulnerabilities to monitor. The cost is real and it's ongoing. You're signing up for it forever, so
make sure you're getting something that's worth the cost.
Okay, so I told you we had some plans for how to move this forward. So you've evaluated those five factors and maybe you're thinking, all right, I see the case for a monolith, but our monolith is also kind of a mess. So what do I do? And this is where everyone gets stuck in false binary thinking. They think the choices are A, a big ball of mud monolith where everything is coupled to everything, or B, microservices where everything is distributed.
But those are not your only options. There's an entire spectrum in between, and frankly, the middle ground is where most successful companies live. Option one: the modular monolith. This is the sweet spot for most mid-sized teams. Here's what it is: one deployable artifact, one database, but strong internal module boundaries. Clear interfaces between modules; modules
don't share data structures directly, and you enforce these boundaries with tooling. So imagine you have modules for users, orders, inventory and notifications, like our standard microservices example, but we're dealing with a modular monolith. They're all in the same code base, same repo, same deploy, but the orders module calls the users module through a defined interface only. There's no reaching into another module's database tables,
no importing internal classes from other modules, and dependencies go in one direction only. It means you get a simple deployment with one artifact and one pipeline, easy local development where you just check out and run, normal transactions and data consistency, no distributed systems complexity, but you maintain good boundaries that could become services later if needed.
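To sketch what "through a defined interface only" means, here's a toy Python version. The module layout and names are made up for illustration; the same idea holds for Rails engines, Java packages or whatever you're running.

```python
# users/api.py -- the one module the rest of the codebase may import from.
from dataclasses import dataclass

@dataclass(frozen=True)
class UserSummary:
    id: int
    email: str

def get_user(user_id: int) -> UserSummary:
    """Public entry point. Internally it can hit the users module's own
    tables and caches; callers never see any of that."""
    raise NotImplementedError  # sketch only

# Elsewhere, in the orders module:
#   from users.api import get_user          # fine: the defined interface
#   from users.internal.models import User  # not fine: reaching into internals,
#                                            # and exactly what tooling should reject
```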
The key, of course, is enforcing those boundaries. And there are many different tools in different programming languages that can help you implement those boundaries and then wire them into your CI, so your CI starts to fail if someone violates module boundaries. I worked with a team recently that did this well. They had about 20 engineers and one Rails monolith, but organized into about seven modules using Rails engines.
Each module had its own bounded context, its own database schema namespace, its own API surface. And so when they eventually needed to extract the payment processing module into a service for compliance reasons, it took them a couple of weeks. Because the boundary was already clean, they just moved it to a different process and added an HTTP interface instead of in-memory method calls. And that's the path: build good boundaries in your monolith first,
then the extraction becomes straightforward if you need it.
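If you want a feel for that tooling, here's a deliberately tiny sketch of a check you could run in CI: it fails the build when one module imports another module's internals instead of its public api package. Real tools (import-linter in Python, ArchUnit in Java, packwerk for Rails) do this properly; the module names below are placeholders.

```python
#!/usr/bin/env python3
"""Tiny CI check: fail the build if any module imports another module's
internals. A sketch only; real boundary-checking tools cover far more."""
import ast
import pathlib
import sys

MODULES = {"users", "orders", "inventory", "notifications"}  # placeholder names

def violations(root: str = "src"):
    for path in pathlib.Path(root).rglob("*.py"):
        owner = path.relative_to(root).parts[0]  # which module this file lives in
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.ImportFrom) and node.module:
                target = node.module
            elif isinstance(node, ast.Import):
                target = node.names[0].name
            else:
                continue
            parts = target.split(".")
            # Cross-module import that doesn't go through the other module's public api
            if parts[0] in MODULES and parts[0] != owner and parts[1:2] != ["api"]:
                yield f"{path}: imports {target} (only {parts[0]}.api is allowed)"

if __name__ == "__main__":
    problems = list(violations())
    print("\n".join(problems))
    sys.exit(1 if problems else 0)
```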
Option two is evolution. You have a core monolith handling most of your features, but you extract two to five services for specific strategic reasons. Maybe payment processing because you need PCI compliance isolation, image processing because you want the scaling to be totally different, as we discussed earlier, third-party integrations because you want failure isolation, or real-time features because they need different infrastructure.
Now, each extraction is deliberate and solves a specific problem. I'll tell you when to extract a service: when you can check all of these boxes. There's a clear business boundary that makes sense to your team, independent deployment provides measurable value, team ownership is obvious and you know who owns it, the data can be partitioned cleanly, and you have the operational maturity to run it. And that last one is super critical.
Now here's when not to extract. You're just trying to make your code cleaner? Refactor it instead. You think it'll be easier to maintain? It won't; distribution is harder, and the infrastructure to support the distribution is harder still. You want to use a different tech stack? It's probably not worth it.
Option three, and this is my favorite middle ground that no one ever talks about: instead of thinking about microservices, think about self-contained systems. These are bigger, much bigger. But the idea is that each service owns a complete vertical slice, not just an API. The whole thing: the UI, the business logic, the data storage. It's a complete application. So instead of services like user service, order service, email service, you build systems like
a customer portal, which is a complete web app for customer self-service, the checkout system, which is the entire purchase flow, UI and everything, and a merchant dashboard, a full admin interface. Each one can be owned by a team, deployed independently and understood as a complete unit. They communicate over web boundaries or events, with each one still being self-sufficient. And this maps really well to how users think about the product. Users don't think,
I'm going to use the user service, they think I'm going to check my orders or I'm going to update my payment info. And each one of these is a self-contained system.
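If that sounds abstract, here's a toy sketch of one of those slices: a checkout system that carries its own UI, its own logic and its own storage in one deployable app. Flask and SQLite here are assumptions purely for illustration, not a recommendation.

```python
# checkout_system/app.py -- a toy "self-contained system": its own UI,
# its own logic, its own storage, deployable on its own.
import sqlite3
from flask import Flask, request, render_template_string  # assumes Flask is installed

app = Flask(__name__)
db = sqlite3.connect("checkout.db", check_same_thread=False)
db.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")

PAGE = """
<h1>Checkout</h1>
<form method="post">
  <input name="item" placeholder="item"> <input name="qty" value="1">
  <button>Buy</button>
</form>
{% if order_id %}<p>Order #{{ order_id }} placed.</p>{% endif %}
"""

@app.route("/", methods=["GET", "POST"])
def checkout():
    order_id = None
    if request.method == "POST":
        with db:  # business logic and storage, all local to this system
            order_id = db.execute(
                "INSERT INTO orders (item, qty) VALUES (?, ?)",
                (request.form["item"], int(request.form["qty"])),
            ).lastrowid
    return render_template_string(PAGE, order_id=order_id)

if __name__ == "__main__":
    app.run(port=5001)
```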
So, last section: let me give you some signals that it's time to evolve your architecture. Not just theoretical principles, but specific pain points. Signal one is deploy coordination hell. You know you need to split when you have a Google Doc or Slack thread coordinating deploys, teams ask whether it's safe to deploy now, failed deploys block other teams from shipping, you've created a release manager or deployment czar, and
your deployment process looks a little bit like air traffic control. And if this is you, it's time. Your teams need independent deployability.
Signal number two is team bottlenecks. Watch for teams that are constantly blocked waiting for other teams, every pull request touching multiple team areas, code review requiring people from three different teams, one person who understands how everything works and everyone waiting for them, and on-call requiring understanding of the entire system. This means your team boundaries and your code boundaries don't match, and you need to fix that.
Signal number three is scaling waste. If you're over-provisioning 90% of your system to handle 10% of the load, paying for 32 gigabytes of RAM on every instance when only one component needs it, unable to scale down because everything's coupled together, and seeing your AWS bill grow linearly with users, it's time to split the expensive part out and scale it independently.
The last one, of course, is blast radius: you need better isolation. Small changes regularly break unrelated features, you can't touch the payment code without breaking search, rollback is all or nothing and everyone's terrified, and you've stopped deploying on Fridays, or Thursdays, or you barely deploy at all. Here's what's not a signal, though. The code is messy? That's a refactoring problem, not an architecture problem.
The code base is getting big? How big? Shopify ran on a monolith until they were doing billions in gross revenue. GitHub's main app, last time I heard, is still a monolith. Big is fine. We want to try microservices? Want is not need. And someone senior said we should? Ask them to articulate the specific problem it solves.
So let's bring this home. The architecture that matters is the one that lets your team ship features at the speed your business needs. Not the one that looks impressive in a conference talk, not the one that Netflix uses. Here's my prescription. Start with a modular monolith. Enforce strong module boundaries, package by feature, not by layer, and use dependency analysis tools to make sure that your CI will fail if someone violates the boundaries.
That way everything stays in the same place. And stay there as long as you can, which might be forever. There are many billion-dollar companies running on monoliths. It is fine. When you hit specific pain points, deploy coordination hell, team bottlenecks, scaling waste, blast radius problems, consider extraction. Extract one service, see how it goes, and learn from it. Don't extract because you're bored,
or because you read a blog post or because you want to seem sophisticated. Extract because you have a specific problem and service extraction is the best solution.
And please, for the love of everything good in software, stop calling them microservices. They're just services, sized appropriately for your team and your needs. Not micro, not macro, just appropriate.
All right, that's episode five. We'll be back next week with some more insightful content to help your team navigate the wild world of architectural decisions and scaling, and how to operate more efficiently in this corner of the ecosystem. Thanks for listening. Go and evaluate those five factors and make an informed decision instead of just doing what's trendy.
If you like this, please like and subscribe, share this with another person in your network and I will see you next time.