Engineering

From Monolith to Modular: A Migration Strategy That Doesn't Kill Delivery

We grew from 8 to 35 engineers and watched daily deploys slow to weekly. Here's how we broke the monolith apart without freezing the roadmap or betting the company on a rewrite.

NevkaSystems Team · Engineering

June 18, 2026  ·  11 min read

TL;DR

Strangle the monolith one bounded context at a time with parallel old/new paths, feature-flagged ramps, and instant rollback. Stop at a modular monolith unless a service clearly pays back what it costs to run.

Key takeaways

1. Skip the big-bang rewrite: 52% get cancelled, 90% run long, and a botched deployment cost Knight Capital $440M in 45 minutes.

2. Use the strangler pattern: route a fraction of traffic to a new service, compare it against the old path, and ramp from 10% to 100% with instant rollback.

3. Extract at the edges first (notifications, search, reporting) and leave auth and core logic for last. Pick modules with low coupling and their own data.

4. Plan for ~5-7 weeks per service: shadow mode, dual-write, backfill, gradual reads then writes, cutover. Data is the hard part, not code.

5. A modular monolith is the right stopping point under ~20 engineers. We extracted six services and kept the rest.

Our monolith wasn't slow because of the code. It was slow because of the people around it. Eight engineers became thirty-five in eighteen months, and deploy frequency went the wrong way: from daily to weekly. One codebase, five teams, and every release turned into a negotiation.

You can feel the symptoms before you can name them. Team A is ready to ship and Team B just broke an unrelated module, so the deploy is blocked and everyone waits. Background jobs hammer the database, but you can't scale them on their own because the whole monolith scales as one lump. PRs sit for days because three teams have to sign off on a change that touches their corner. You're pinned to Node 14 because the upgrade breaks something nobody has time to find, and the team that wants Python for ML pipelines simply can't have it. CI takes 45 minutes, so people context-switch, and the context switch costs more than the wait. We hit every one of these.

What we didn't do is stop shipping. No six-month migration on the roadmap, no feature freeze, no rewrite. We pulled the monolith apart while still releasing features every week.

The rewrite is the trap

"Let's just rewrite it properly this time" is one of the most expensive sentences in software. Six months of rewrite is six months of zero new features while your competitors ship. Scope creeps the moment someone says "while we're in here." You can't add features to the old system because it's legacy, and you can't add them to the new one because it isn't done, so the product stalls. Then launch day arrives and everything has to work at once, with no gradual rollout and no escape hatch. Meanwhile the people doing months of invisible work start to leave.

The numbers back this up. Across 200 rewrite projects, 90% ran long (2.3x on average), 67% blew the budget, and 52% were cancelled before they finished. Of the ones that shipped, 31% performed worse than what they replaced. And when a big cutover goes wrong, it can go catastrophically wrong. In 2012 a deployment error left Knight Capital's old trading code running next to the new code for 45 minutes. The firm lost $440 million and very nearly the company.

So we used the strangler pattern instead. New work goes to new services, old code stays until something replaces it, and users never notice the seams. It's named after the fig that grows around a host tree and slowly takes its place. Netflix, Amazon, and Shopify migrated this way. So did we.

How the strangler pattern actually runs

Put a routing layer (proxy or API gateway) in front of the monolith so you can decide, per request, where traffic goes. Pick one bounded context, build it as a new service, and leave the monolith untouched. Then run both: send 10% of traffic to the new service, compare results against the old path, and walk it up only when the numbers hold.

· Gradual: 10% → 25% → 50% → 100%, watching metrics at every step.

· Reversible: you can roll back at any percentage, instantly.

· Measured: old and new run side by side so you compare, not guess.

· User-stable: a given user always gets the same path, so behavior is consistent.
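The routing rule behind those four properties fits in a few lines. What follows is a minimal Python sketch, not our gateway's code. `bucket`, `route`, and the salt string are hypothetical names, and the sketch assumes every request carries a stable user ID. Hashing the ID gives each user a fixed bucket from 0 to 99, which has three consequences:

- A given user always sees the same path.
- Raising the percentage only ever moves users from old to new.
- Dropping it to 0 is the instant rollback.

```python
import hashlib


def bucket(user_id: str, salt: str = "notifications-migration") -> int:
    """Map a user ID to a stable bucket in [0, 100).

    The salt (a hypothetical per-migration string) keeps different
    migrations from always picking the same users first.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(user_id: str, rollout_percent: int) -> str:
    """Return "new" or "old" for this user at the current rollout percentage."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return "new" if bucket(user_id) < rollout_percent else "old"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    share = sum(route(u, 10) == "new" for u in users) / len(users)
    print(f"share on the new path at 10%: {share:.1%}")  # near 10%, not exact
```

A user in bucket 7 joins the new path at 10% and stays there at 25%, 50%, and 100%. That monotonic property is what keeps behavior consistent as you ramp.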

Our notifications service is the textbook run. Week 1, new service built at 0% traffic. Week 2, 10% with monitoring. Week 3, we found a latency problem and fixed it before it mattered. Week 4, 50%. Week 5, 100%. Week 6, we deleted the old code. Six weeks, zero user disruption.
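A week-by-week ramp like that boils down to a small decision rule applied at each review. This is a hypothetical Python sketch, not our tooling, and it assumes three things:

- The schedule is the 10% → 25% → 50% → 100% list above.
- The seven-day hold per step matches a weekly cadence.
- `healthy` stands in for whatever old-versus-new comparison your dashboard produces.

```python
# Ramp schedule from the list above; the 7-day hold is an assumed cadence.
RAMP_STEPS = [0, 10, 25, 50, 100]


def next_percent(current: int, healthy: bool, days_at_step: int, min_days: int = 7) -> int:
    """Decide the rollout percentage for the next review.

    - Unhealthy comparison: roll back to 0% immediately.
    - Healthy but not watched long enough: hold at the current step.
    - Healthy and watched long enough: advance one step (100% stays at 100%).
    """
    if current not in RAMP_STEPS:
        raise ValueError(f"{current}% is not a ramp step")
    if not healthy:
        return 0
    if days_at_step < min_days:
        return current
    index = RAMP_STEPS.index(current)
    return RAMP_STEPS[min(index + 1, len(RAMP_STEPS) - 1)]
```

Rolling all the way back to 0% on any unhealthy reading is the conservative choice. A team might instead hold the current step while it fixes a minor regression.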

Pick the right module first

Not everything should be extracted, and the order matters more than people expect. A bounded context is just a cohesive chunk of the domain: user management, notifications, billing, reporting. Find them by looking for the seams your code already has, then draw a dependency graph. Modules with few incoming dependencies are easy to pull out. Notifications depends only on Core, so it goes early. Auth is depended on by everything, so it goes last, if ever. Your database tells the same story: tables joined by foreign keys usually belong to the same domain. And your product managers carry a mental model of the system that often maps cleanly onto these boundaries, so ask them.
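The dependency-graph step can be made concrete by counting fan-in, meaning how many modules depend on each one, and sorting by it. The graph below is hypothetical and only loosely modeled on the examples above. In practice you would generate it from your actual import statements.

```python
from collections import Counter

# Hypothetical graph: each module maps to the modules it depends on.
DEPENDS_ON = {
    "core": [],
    "auth": ["core"],
    "users": ["core", "auth"],
    "billing": ["core", "auth"],
    "search": ["core", "auth"],
    "reporting": ["core", "auth", "billing"],
    "notifications": ["core"],
}


def fan_in(graph: dict[str, list[str]]) -> Counter:
    """Count incoming dependencies: how many modules depend on each module."""
    counts = Counter({module: 0 for module in graph})
    for deps in graph.values():
        counts.update(deps)
    return counts


def extraction_order(graph: dict[str, list[str]]) -> list[str]:
    """Least-depended-on modules first; ties broken alphabetically."""
    counts = fan_in(graph)
    return sorted(graph, key=lambda module: (counts[module], module))


if __name__ == "__main__":
    counts = fan_in(DEPENDS_ON)
    for module in extraction_order(DEPENDS_ON):
        print(f"{module:<14} depended on by {counts[module]}")
```

Fan-in is only a first filter. In this graph, reporting has no dependents, but it still reads billing data, which is exactly the kind of coupling the database check surfaces.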

Start at the edges. Notifications, file uploads, reporting, and search are low-risk and often async. Authentication, core business logic, and shared utilities are high-risk and touch everything, so they wait. A good first candidate has low coupling, a clear in-or-out boundary, its own data, independent deployability, and a failure mode the rest of the app can survive. Notifications hit all five, which is exactly why we started there.
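One way to keep that five-point test honest is to record it per module, so the reason a module isn't ready is written down. A minimal sketch: the `Candidate` class and the assessments for notifications and auth are illustrative assumptions, not measurements.

```python
from dataclasses import dataclass

# The five criteria from the paragraph above, as attribute names.
CRITERIA = (
    "low_coupling",
    "clear_boundary",
    "owns_data",
    "independently_deployable",
    "survivable_failure",
)


@dataclass(frozen=True)
class Candidate:
    name: str
    low_coupling: bool
    clear_boundary: bool
    owns_data: bool
    independently_deployable: bool
    survivable_failure: bool

    def missing(self) -> list[str]:
        """Criteria this module does not meet yet."""
        return [c for c in CRITERIA if not getattr(self, c)]

    def is_good_first_candidate(self) -> bool:
        return not self.missing()


# Illustrative assessments only.
notifications = Candidate(
    "notifications", low_coupling=True, clear_boundary=True, owns_data=True,
    independently_deployable=True, survivable_failure=True,
)
auth = Candidate(
    "auth", low_coupling=False, clear_boundary=True, owns_data=True,
    independently_deployable=False, survivable_failure=False,
)
```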

The per-service playbook

For each service we run the same sequence. Start in shadow mode: the new service writes alongside the old one but serves no traffic, so you validate it against real data at zero risk. If it owns data, dual-write to both systems and backfill the history before you let anything read from the new side. Then route reads gradually, and route writes only once reads are solid. By then, full cutover is the easy part: stop the dual-writes, delete the old code, and drop the old tables once they're backed up. Budget roughly 1-2 weeks for shadow mode, a week for dual-write and backfill, 2-3 weeks for the rollout, and a day for cutover. That adds up to about 4-6 weeks on paper. In practice, call it 5-7 weeks per service.
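The data side of that sequence can be sketched as a repository wrapper. This is a minimal Python sketch under several assumptions:

- `MigratingStore` and `Phase` are hypothetical names.
- Plain dicts stand in for the two databases.
- The gradual read ramp is reduced to a single switch.
- The separate "route writes" step is folded into cutover.

It shows the ordering rules:

- Dual-write before anything reads the new side.
- Backfill without clobbering fresher rows.
- Diff the two stores.
- Keep the old store current until cutover, so rolling reads back loses nothing.

```python
from enum import Enum


class Phase(Enum):
    DUAL_WRITE = "dual_write"  # shadow + backfill: write both, read old
    READ_NEW = "read_new"      # write both, read new
    CUTOVER = "cutover"        # dual-writes stopped: write and read new only


class MigratingStore:
    """Wraps the old and new stores; plain dicts stand in for real databases."""

    def __init__(self, old: dict, new: dict) -> None:
        self.old = old
        self.new = new
        self.phase = Phase.DUAL_WRITE

    def write(self, key: str, value: str) -> None:
        if self.phase is not Phase.CUTOVER:
            # Keep the old store current so reads can still roll back to it.
            self.old[key] = value
        self.new[key] = value

    def read(self, key: str):
        store = self.old if self.phase is Phase.DUAL_WRITE else self.new
        return store.get(key)

    def backfill(self) -> None:
        """Copy rows that predate dual-writing without overwriting newer ones."""
        for key, value in self.old.items():
            self.new.setdefault(key, value)

    def mismatches(self) -> list[str]:
        """Keys whose values differ between the stores: the shadow-mode diff."""
        keys = self.old.keys() | self.new.keys()
        return sorted(k for k in keys if self.old.get(k) != self.new.get(k))
```

In a real system, each write touches two databases and either can fail. The mismatch check is what catches the drift that dual-writing inevitably produces.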

Data is where it gets hard

Splitting code is easy next to splitting data, and there's no single right answer. Keeping a shared database is the cheapest start: no migration, ACID transactions still work, and it buys you time to plan real separation. The price is that the services aren't truly independent, and every schema change ripples across all of them. Separate databases that talk over events give you real independence and per-service scaling. You pay for that in eventual consistency, retry logic, and distributed transactions, which are hard to get right. For most services, a database per service is the target, with a shared database as the bridge to get there.
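Most of the "retry logic" cost lands on consumers. Brokers typically deliver at least once, so every handler has to tolerate seeing the same event twice. The following is a minimal Python sketch of an idempotent consumer plus a retry wrapper. `EmailChanged`, `EmailReplica`, and `deliver_with_retry` are hypothetical names. Treating `ConnectionError` as the transient failure and using an in-memory dedup set are both simplifications. In production, the processed-ID record should be committed in the same transaction as the data it guards.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EmailChanged:
    event_id: str  # unique per event; used as the dedup key
    user_id: str
    email: str


class EmailReplica:
    """Consumer-side copy of another service's data, fed by events."""

    def __init__(self) -> None:
        self.emails: dict[str, str] = {}
        self.processed: set[str] = set()  # real systems persist this with the data

    def handle(self, event: EmailChanged) -> None:
        if event.event_id in self.processed:
            return  # redelivered event: already applied, so do nothing
        self.emails[event.user_id] = event.email
        self.processed.add(event.event_id)


def deliver_with_retry(handler: Callable[[EmailChanged], None], event: EmailChanged,
                       attempts: int = 3, base_delay: float = 0.01) -> None:
    """Retry a failing handler with exponential backoff, then give up loudly."""
    for attempt in range(attempts):
        try:
            handler(event)
            return
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface it, e.g. to a dead-letter queue
            time.sleep(base_delay * 2 ** attempt)
```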

The moment you split databases you lose cross-database joins, and you have to decide how to live without them. Sometimes it's a synchronous API call to the owning service. Sometimes it's denormalizing the few fields you actually need. For read-heavy paths that genuinely need joins, build a separate read model with CQRS and feed it from events. Whatever you pick, map your foreign keys first, build and test the event publishers and consumers, rehearse the eventual-consistency edge cases, keep the old joins working as a rollback path, and watch sync lag like a hawk.
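The CQRS option can be shown as a small projection that rebuilds a former JOIN from two event streams. This is a hypothetical, in-memory Python sketch: the event types, the `OrderSummaryView` name, and the "(unknown)" placeholder are all assumptions. It illustrates two of the edge cases worth rehearsing: an order can arrive before the user's name, and a later name event has to patch existing rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    user_id: str
    total_cents: int
    occurred_at: float  # publisher's timestamp, in seconds


@dataclass(frozen=True)
class UserNameChanged:
    user_id: str
    name: str
    occurred_at: float


class OrderSummaryView:
    """Denormalized read model standing in for an orders JOIN users query."""

    def __init__(self) -> None:
        self.user_names: dict[str, str] = {}
        self.rows: dict[str, dict] = {}
        self.last_event_at = 0.0

    def apply(self, event) -> None:
        if isinstance(event, UserNameChanged):
            self.user_names[event.user_id] = event.name
            # Patch rows that were built before the name was known or changed.
            for row in self.rows.values():
                if row["user_id"] == event.user_id:
                    row["user_name"] = event.name
        elif isinstance(event, OrderPlaced):
            self.rows[event.order_id] = {
                "user_id": event.user_id,
                # Events can arrive out of order, so the name may not be known yet.
                "user_name": self.user_names.get(event.user_id, "(unknown)"),
                "total_cents": event.total_cents,
            }
        self.last_event_at = max(self.last_event_at, event.occurred_at)

    def lag_seconds(self, now: float) -> float:
        """Crude sync-lag signal: age of the newest event applied so far."""
        return now - self.last_event_at
```

The lag measure here keeps growing when the upstream service is simply quiet. A sturdier signal compares the consumer's position against the newest event the publisher has emitted.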

You're flying blind without observability

You can't run a gradual migration you can't see. We stood up a migration dashboard before moving any real traffic: current rollout percentage, the old-versus-new traffic split, side-by-side latency and error-rate charts, cost comparison, and any discrepancies shadow mode turned up. Alerts fire on the comparisons, not absolute thresholds, so a regression in the new path is loud immediately. And every service has a one-click rollback that's documented in the runbook and tested before we needed it. A rollback you've never run is a hope, not a plan.
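Alerting on comparisons rather than absolute thresholds means each check asks whether the new path is worse than the old one over the same window. A minimal Python sketch follows. `PathStats`, `regressions`, and the tolerances (10% on p95 latency, 1.5x on error rate) are hypothetical choices, not our production values.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class PathStats:
    """Metrics for one path (old or new) over the same time window."""
    latencies_ms: list[float]
    errors: int
    requests: int

    @property
    def p95_ms(self) -> float:
        return quantiles(self.latencies_ms, n=100)[94]  # 95th percentile cut point

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests


def regressions(old: PathStats, new: PathStats,
                latency_tolerance: float = 1.10,
                error_tolerance: float = 1.5) -> list[str]:
    """Return the names of any regressions of the new path relative to the old.

    With the default (hypothetical) tolerances, the new p95 may be up to 10%
    slower and the new error rate up to 1.5x the old one before this fires.
    """
    problems = []
    if new.p95_ms > old.p95_ms * latency_tolerance:
        problems.append("latency")
    if new.error_rate > old.error_rate * error_tolerance:
        problems.append("error_rate")
    return problems
```

Because both sides share a window, a traffic spike that slows everything down doesn't page anyone. A new path that is slower than the old one under the same load does.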

Know when to stop

Microservices were never the goal. Developer velocity and system reliability were. A modular monolith, one codebase with hard internal boundaries, gets you most of the way there without the operational tax: modules import only from each other's public api/ directory, no module reaches into another's tables, and any module could become a service later if it has to. If your team is under 20 engineers, deploys are frequent enough, the stack is consistent, and nothing is screaming about scaling, that's probably where you should stay.
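The "import only from another module's api/ directory" rule is easy to erode one shortcut at a time, so it helps to check it mechanically in CI. Below is a language-agnostic Python sketch that works on resolved import edges. It assumes your toolchain can already emit (importer, imported) file pairs relative to the source root, with the top-level directory naming the module; the file paths are hypothetical. It cannot catch one module querying another's tables, which needs database permissions or query review.

```python
from pathlib import PurePosixPath

# Hypothetical resolved import edges: (importing file, imported file).
EDGES = [
    ("notifications/send.ts", "billing/api/index.ts"),
    ("notifications/send.ts", "billing/internal/invoices.ts"),
    ("reporting/export.ts", "reporting/queries.ts"),
    ("search/indexer.ts", "users/models/user.ts"),
]


def crosses_boundary(importer: str, imported: str) -> bool:
    """True if `importer` reaches into another module outside its api/ directory."""
    source_module = PurePosixPath(importer).parts[0]
    target = PurePosixPath(imported).parts
    if target[0] == source_module:
        return False  # imports inside your own module are unrestricted
    return len(target) < 2 or target[1] != "api"


def violations(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Filter (importer, imported) pairs down to boundary violations."""
    return [(a, b) for a, b in edges if crosses_boundary(a, b)]


if __name__ == "__main__":
    for importer, imported in violations(EDGES):
        print(f"boundary violation: {importer} -> {imported}")
```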

You pull a service out when the costs are clearly worth it, not when it feels modern. Each extraction is 5-7 weeks of work, one more thing to monitor and deploy, 10-50ms of added latency per call, and eventual consistency where you used to have immediate consistency. In return you get independent scaling, independent deploys, freedom to choose a different stack, and failure isolation. We extracted six services this way (notifications, file processing, reporting, search, webhooks, and background jobs) and deliberately left the rest in the modular monolith. That balance was the right call. Before you extract the next one, make sure your team can actually carry the operational weight, because you build it, you run it.

