From Monolith to Modular: A Migration Strategy That Doesn't Kill Delivery
Migrating from a monolith to modular architecture is risky. Learn how to decompose gradually using the strangler pattern, feature flags, and incremental rollout.
Don't rewrite your monolith from scratch—that's how projects fail. Use the strangler pattern to gradually extract modules while keeping features shipping. Run old and new code in parallel with feature flags, use dual-write for data consistency, and measure everything. Extract one bounded context at a time, starting with the edges. Most teams don't need full microservices—a modular monolith is often enough.
Why We Migrated (And Why You Might Too)
Our monolith was slowing us down. Not the code, but the coordination.

Signs Your Monolith Needs Decomposition:

Deploy Conflicts: Team A wants to deploy their feature. Team B broke something in an unrelated module. Deploy is blocked. Everyone waits. This happens daily.

Scaling Bottlenecks: Your API traffic is fine, but background jobs are hammering the database. You can't scale them independently. The entire monolith must scale together.

Team Coordination Tax: Five teams share one codebase. Every change requires review from multiple teams. PRs sit for days. Meetings to coordinate releases. Velocity drops.

Technology Lock-in: You're stuck on Node.js 14 because upgrading breaks something. One team wants to use Python for ML pipelines. Can't happen in the monolith.

Build and Test Times: CI takes 45 minutes. Developers wait. Context switching kills productivity. Tests for unrelated modules must pass before you can deploy.

We Hit All Five: Our team grew from 8 to 35 engineers in 18 months. Deploy frequency dropped from daily to weekly. Developer satisfaction tanked. We needed to change.

But Here's What We Didn't Do: We didn't stop feature work. We didn't schedule a "6-month migration." We didn't rewrite from scratch.

We used the strangler pattern to migrate gradually while shipping features every week.
The Big Bang Rewrite: Don't Do It
"Let's rewrite it properly this time."

Famous last words.

Why Big Bang Rewrites Fail:

Opportunity Cost: Six months of rewrite = six months of zero new features. Competitors ship. You fall behind. Users churn.

Scope Creep: "While we're rewriting, let's add that feature we always wanted." Scope doubles. Timeline doubles. Budget triples.

Feature Freeze: Can't ship new features to old system (it's "legacy"). Can't ship to new system (not done). Product stalls.

Big Bang Risk: Launch day: everything must work. One bug breaks everything. No gradual rollout. No learning. No escape hatch.

Staff Burnout: Months of work with no user-visible progress. Morale drops. Key people leave. Project collapses.

The Statistics Are Brutal:

A study of 200 rewrite projects found:
• 90% take longer than estimated (average: 2.3x)
• 67% go significantly over budget
• 52% are cancelled before completion
• Of those completed, 31% had worse performance than the original

Real Example - Knight Capital: In 2012, Knight Capital pushed new trading code to production in one all-at-once deployment. One server was missed, so old, long-dormant code ran alongside the new code. In about 45 minutes, the firm lost roughly $440 million and nearly collapsed. It wasn't a full rewrite, but it shows what a launch with no gradual rollout and no escape hatch can cost.

What Works Instead:

The strangler pattern. Named after strangler figs that gradually grow around and replace host trees.

New features go to new services.
Old code stays until you replace it.
Users see no disruption.

This is how Netflix, Amazon, and Shopify migrated. It's how we migrated. It works.
The Strangler Pattern
The strangler pattern lets you migrate incrementally without stopping feature work.

How It Works:

Step 1: Intercept Requests

Add a routing layer (proxy, API gateway) between users and your monolith.

    User Request
         ↓
    [API Gateway/Proxy]
         ↓
    Route to:
      ├─> New Service (if migrated)
      └─> Monolith (if not)

Step 2: Extract One Module

Pick one bounded context. Build it as a new service. Don't touch the monolith yet.

Step 3: Run in Parallel

Route 10% of traffic to the new service. Both old and new code run. Compare results.

Step 4: Gradual Shift

If new service works well:
• 10% → 25% → 50% → 100%
• Monitor metrics at each step
• Roll back if issues appear

Step 5: Deprecate Old Code

Once 100% of traffic goes to new service, delete old code from monolith.

Implementation:
    // API Gateway routing logic
    // (extractFeature, isMigrated, getMigrationPercentage, getUserId, hashUserId
    // and the two route* helpers are assumed to be defined elsewhere)
    async function routeRequest(req: Request): Promise<Response> {
      const feature = extractFeature(req.path);

      // Check if feature is migrated
      if (isMigrated(feature)) {
        const percentage = getMigrationPercentage(feature);
        const userId = getUserId(req);

        // Deterministic hash bucketing for user assignment
        if (shouldRouteToNewService(userId, percentage)) {
          return await routeToNewService(req);
        }
      }

      // Default: route to monolith
      return await routeToMonolith(req);
    }

    function shouldRouteToNewService(
      userId: string,
      percentage: number
    ): boolean {
      // Hashing the user ID into a fixed bucket (0-99) ensures the same user
      // always gets the same routing
      const hash = hashUserId(userId);
      return (hash % 100) < percentage;
    }

Key Principles:

Gradual: Shift traffic slowly (10% → 25% → 50% → 100%)
Reversible: Can roll back at any time
Measured: Compare metrics continuously
User-Stable: Same user always gets same experience

Real Example:

We extracted our notifications service:
• Week 1: Built new service, 0% traffic
• Week 2: 10% traffic, monitoring
• Week 3: Found latency issue, fixed it
• Week 4: 50% traffic
• Week 5: 100% traffic
• Week 6: Deleted old code

Total time: 6 weeks. Zero user disruption.
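The routing code above calls `hashUserId` without showing it. Here's a minimal sketch of one way to write it, assuming string user IDs and using a 32-bit FNV-1a hash. The hash choice and the `rolloutShare` helper are our illustration, not part of any gateway product.

```typescript
// Hypothetical helper: 32-bit FNV-1a over the UTF-16 code units of the user ID.
function hashUserId(userId: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned so the modulo below is never negative
}

// Same bucketing rule as in the routing code: bucket 0-99 compared to the percentage.
function shouldRouteToNewService(userId: string, percentage: number): boolean {
  return hashUserId(userId) % 100 < percentage;
}

// Fraction of a synthetic user population that lands on the new service.
function rolloutShare(userCount: number, percentage: number): number {
  let routed = 0;
  for (let i = 0; i < userCount; i++) {
    if (shouldRouteToNewService(`user-${i}`, percentage)) routed++;
  }
  return routed / userCount;
}
```

Because each user's bucket never changes, raising the percentage only adds users to the new service. Nobody who was routed there at 10% gets bounced back to the monolith at 25%, which is what keeps the experience stable during a ramp.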
Identifying Bounded Contexts
Not all code should be extracted. Pick the right modules first.

What's a Bounded Context?

A bounded context is a logical boundary where a specific domain model applies. In plain English: a cohesive chunk of functionality.

Examples:
• User Management - auth, profiles, permissions
• Notifications - email, SMS, push notifications
• Billing - invoices, payments, subscriptions
• Reporting - analytics, dashboards, exports

How to Find Them:

1. Look for Natural Seams

Where does your code already separate concerns?

    monolith/
    ├── src/
    │   ├── auth/            # ← Bounded context
    │   ├── billing/         # ← Bounded context
    │   ├── notifications/   # ← Bounded context
    │   └── core/            # ← Not a bounded context (shared by all)

2. Analyze Dependencies

Draw a dependency graph. Look for modules with few incoming dependencies.

              Core
            /  |   \
           /   |    \
       Auth Billing Notifications

Notifications depends only on Core. Easy to extract.
Auth is depended on by everything. Hard to extract.

3. Check Database Tables

Which tables belong together?

    -- User domain
    users
    user_profiles
    user_settings

    -- Billing domain
    invoices
    payments
    subscriptions

If tables have foreign keys between them, they're probably in the same domain.

4. Ask Domain Experts

Talk to product managers. How do they think about the product? Their mental model often reveals natural boundaries.

Start at the Edges:

Easy to Extract (Do First):
• Notifications (send emails, SMS)
• File uploads (image processing)
• Reporting (read-only analytics)
• Search (can be async)

Hard to Extract (Do Last):
• Authentication (everyone depends on it)
• Core business logic (high risk)
• Shared utilities (used everywhere)

Checklist for a Good Candidate:
- [ ] Low coupling (few dependencies on other modules)
- [ ] Clear boundary (obvious what's in/out)
- [ ] Independently deployable (doesn't break other features)
- [ ] Self-contained data (own database tables)
- [ ] Optional failure mode (can fail without breaking app)

Our first extraction: Notifications. It met all five criteria.
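The "few incoming dependencies" heuristic from this section is easy to automate once you have a dependency graph. Here's a rough sketch, assuming you can export the graph as a plain object. The module names echo the examples above, but the edges are made up for illustration.

```typescript
// Each key depends on the modules in its list. Edges are illustrative only.
type DependencyGraph = Record<string, string[]>;

const exampleGraph: DependencyGraph = {
  core: [],
  auth: ["core"],
  billing: ["core", "auth"],
  notifications: ["core"],
  reporting: ["core", "auth", "billing"],
};

// Count how many other modules depend on each module (incoming edges).
function countIncoming(graph: DependencyGraph): Map<string, number> {
  const incoming = new Map<string, number>();
  for (const moduleName of Object.keys(graph)) incoming.set(moduleName, 0);
  for (const deps of Object.values(graph)) {
    for (const dep of deps) {
      incoming.set(dep, (incoming.get(dep) ?? 0) + 1);
    }
  }
  return incoming;
}

// Rank modules from easiest to hardest to extract: fewest dependents first,
// ties broken by fewest outgoing dependencies, then by name.
function rankExtractionCandidates(graph: DependencyGraph): string[] {
  const incoming = countIncoming(graph);
  return Object.keys(graph).sort((a, b) => {
    const byIncoming = (incoming.get(a) ?? 0) - (incoming.get(b) ?? 0);
    if (byIncoming !== 0) return byIncoming;
    const byOutgoing = (graph[a]?.length ?? 0) - (graph[b]?.length ?? 0);
    if (byOutgoing !== 0) return byOutgoing;
    return a.localeCompare(b);
  });
}
```

For the example graph, `rankExtractionCandidates(exampleGraph)` returns `["notifications", "reporting", "billing", "auth", "core"]`: the edge modules come first and the shared core comes last. Treat the ranking as a starting point for discussion. It says nothing about data ownership or failure modes, which the checklist above still has to cover.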
Migration Step-by-Step
Here's our detailed playbook for extracting one module.

Step 1: Shadow Mode

Build new service. Don't route traffic yet. Instead, write to both old and new systems. Compare results.

    async function sendNotification(userId: string, message: string) {
      // Always send via old system (production)
      const oldResult = await oldNotificationSystem.send(userId, message);

      // Also send via new system (shadow).
      // Make sure the shadow call cannot reach real users (for example, point the
      // new service at a sandbox provider), or every notification is delivered twice.
      try {
        const newResult = await newNotificationService.send(userId, message);

        // Compare results
        if (!resultsMatch(oldResult, newResult)) {
          logDiscrepancy({
            userId,
            old: oldResult,
            new: newResult
          });
        }
      } catch (error) {
        // New system failed - log but don't affect users
        logError(error);
      }

      return oldResult; // Users see old system only
    }

Shadow mode lets you validate the new service with zero risk.

Step 2: Dual-Write for Data Consistency

If your service has a database, write to both:
    async function createUser(data: UserData) {
      // Write to the old database inside a transaction
      const user = await db.transaction(async (tx) => {
        return await tx.users.create(data);
      });

      // Publish only after the commit, so a rolled-back write never emits an event.
      // A crash between commit and publish would still drop the event;
      // a transactional outbox closes that gap.
      await eventBus.publish('user.created', {
        userId: user.id,
        data: data
      });

      return user;
    }

    // New service listens to events
    eventBus.on('user.created', async (event) => {
      await newUserService.createUser(event.data);
    });

Step 3: Backfill Historical Data

Before routing reads to new service, backfill old data:
    async function backfillUsers() {
      const batchSize = 1000;
      let offset = 0;

      while (true) {
        const users = await oldDatabase.users.findMany({
          orderBy: { id: 'asc' }, // stable order so pages don't skip or repeat rows
          skip: offset,
          take: batchSize
        });

        if (users.length === 0) break;

        // createUser must be idempotent (an upsert): dual-writes may already
        // have copied some of these users
        await Promise.all(
          users.map(user =>
            newUserService.createUser(user)
          )
        );

        offset += users.length; // count rows actually read, not the batch size
        console.log(`Backfilled ${offset} users`);
      }
    }

Step 4: Route Reads Gradually

Start routing read traffic to new service:
    async function getUser(userId: string) {
      const rolloutPercentage = getFeatureFlag('new-user-service');

      if (shouldUseNewService(userId, rolloutPercentage)) {
        return await newUserService.getUser(userId);
      }

      return await oldDatabase.users.findById(userId);
    }

Step 5: Route Writes

Once reads work, route writes:
    async function updateUser(userId: string, data: UserData) {
      // Separate flag (the name is illustrative) so writes can ramp after reads
      const rolloutPercentage = getFeatureFlag('new-user-service-writes');

      if (shouldUseNewService(userId, rolloutPercentage)) {
        return await newUserService.updateUser(userId, data);
      }

      return await oldDatabase.users.update(userId, data);
    }

Step 6: Full Cutover

Once 100% of traffic goes to new service:
1. Stop dual-writes
2. Delete old code
3. Drop old database tables (after backup)

Timeline:
• Shadow mode: 1-2 weeks
• Dual-write + backfill: 1 week
• Gradual rollout: 2-3 weeks
• Full cutover: 1 day

Total: roughly 4-6 weeks per service
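One weak spot in the dual-write from Step 2 is that the database write and the event publish are two separate operations, so one can succeed while the other fails. A common way to close that gap is a transactional outbox: store the event in the same transaction as the row, then have a relay publish it afterwards. The sketch below uses in-memory arrays as stand-ins for real tables, and every name in it (`InMemoryDb`, `createUserWithOutbox`, `relayOutbox`) is hypothetical.

```typescript
// In-memory stand-ins for a users table and an outbox table that live in the
// same database and therefore commit together.
interface UserRow { id: string; email: string }
interface OutboxRow { id: number; topic: string; payload: UserRow; published: boolean }
interface TxState { users: UserRow[]; outbox: OutboxRow[] }

class InMemoryDb {
  users: UserRow[] = [];
  outbox: OutboxRow[] = [];

  // Simulated transaction: work on copies, commit only if fn returns normally.
  transaction<T>(fn: (tx: TxState) => T): T {
    const tx: TxState = { users: [...this.users], outbox: [...this.outbox] };
    const result = fn(tx); // a throw here leaves the committed state untouched
    this.users = tx.users;
    this.outbox = tx.outbox;
    return result;
  }
}

// The user row and its event are written atomically: both or neither.
function createUserWithOutbox(db: InMemoryDb, user: UserRow): UserRow {
  return db.transaction((tx) => {
    tx.users.push(user);
    tx.outbox.push({ id: tx.outbox.length + 1, topic: "user.created", payload: user, published: false });
    return user;
  });
}

// Relay: publish pending rows and mark each one only after its publish succeeds.
// Returns how many events were sent in this pass.
function relayOutbox(db: InMemoryDb, publish: (topic: string, payload: UserRow) => void): number {
  let sent = 0;
  for (const row of db.outbox) {
    if (row.published) continue;
    publish(row.topic, row.payload); // if this throws, the row stays pending for the next pass
    row.published = true;
    sent++;
  }
  return sent;
}
```

The relay gives at-least-once delivery: if it crashes after publishing but before marking the row, the event goes out again on the next pass. That's why the consumer in the new service should treat a repeated `user.created` as a no-op.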
Data Migration Strategy
Data is the hard part of migration. Here's how to handle it.

Three Approaches:

1. Shared Database (Easiest)

Keep one database. Services access their own tables.

    [User Service]    ──→ users table
    [Billing Service] ──→ invoices table
             ↓                 ↓
             [Shared Database]

Pros:
• No data migration needed
• ACID transactions still work
• Easy to implement

Cons:
• Not true service independence
• Schema changes affect multiple services
• Can't scale databases independently

When to use: Early in migration. Buys time to plan proper separation.

2. Separate Databases with Events

Each service has its own database. Communicate via events.

    [User Service] → User DB
          ↓
      Event Bus
          ↓
    [Billing Service] → Billing DB
    // User service publishes event
    const user = await userDB.createUser(userData);
    await eventBus.publish('user.created', {
      userId: user.id,
      email: user.email
    });

    // Billing service subscribes
    eventBus.on('user.created', async (event) => {
      await billingDB.createCustomer({
        userId: event.userId,
        email: event.email
      });
    });

Pros:
• True service independence
• Can scale databases separately
• Clear boundaries

Cons:
• Eventual consistency (not immediate)
• More complex (event handling, retries)
• Distributed transactions are hard

When to use: Target architecture for most services.

3. Database-per-Service with Sync

Separate databases, but keep some data synchronized:
    // Keep user email in both databases
    await userDB.updateEmail(userId, newEmail);
    await syncService.syncToAllServices('user.email.changed', {
      userId,
      newEmail
    });

Handling Joins:

You lose cross-database joins. Solutions:

Option A: API Calls
    // Get user from user service
    const user = await userService.getUser(userId);

    // Get orders from order service
    const orders = await orderService.getUserOrders(userId);

    // Join in application layer
    return { user, orders };

Option B: Denormalization

    // Order service stores user email (denormalized)
    {
      orderId: "123",
      userId: "456",
      userEmail: "[email protected]", // ← Denormalized
      items: [...]
    }

Option C: CQRS (Read Models)

Separate database for queries that need joins:
    Write Models (Normalized):
      User DB
      Order DB

    Read Model (Denormalized):
      Analytics DB (has joined data)
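To make the read-model idea concrete, here's a small sketch of a projection that folds events from both services into one denormalized row per user. The event shapes and the `UserOrdersReadModel` class are assumptions for illustration, and a `Map` stands in for the analytics database.

```typescript
// Hypothetical event shapes; field names are illustrative. Totals are in cents.
type DomainEvent =
  | { type: "user.created"; userId: string; email: string }
  | { type: "user.email.changed"; userId: string; newEmail: string }
  | { type: "order.placed"; orderId: string; userId: string; total: number };

// One denormalized row per user: the "joined" data that queries read directly.
interface UserOrdersView { userId: string; email: string; orderCount: number; totalSpent: number }

class UserOrdersReadModel {
  private rows = new Map<string, UserOrdersView>();

  // Fold one event into the read model.
  apply(event: DomainEvent): void {
    const row = this.rowFor(event.userId);
    switch (event.type) {
      case "user.created":
        row.email = event.email;
        break;
      case "user.email.changed":
        row.email = event.newEmail;
        break;
      case "order.placed":
        row.orderCount += 1;
        row.totalSpent += event.total;
        break;
    }
  }

  get(userId: string): UserOrdersView | undefined {
    return this.rows.get(userId);
  }

  // Events from different services can arrive in any order, so the row is
  // created on first sight of a user rather than only on user.created.
  private rowFor(userId: string): UserOrdersView {
    let row = this.rows.get(userId);
    if (!row) {
      row = { userId, email: "", orderCount: 0, totalSpent: 0 };
      this.rows.set(userId, row);
    }
    return row;
  }
}
```

Queries that used to join users and orders now read a single row from the projection. This sketch does not deduplicate events, so a redelivered `order.placed` would be counted twice. A production projection would track processed event IDs as well.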
Migration Checklist:
- [ ] Identify foreign key relationships
- [ ] Decide which data to denormalize
- [ ] Build event publishing
- [ ] Build event consumers
- [ ] Test eventual consistency scenarios
- [ ] Plan for rollback (keep old joins working)
- [ ] Monitor sync lag
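"Build event consumers" and "test eventual consistency scenarios" mostly come down to one property: a consumer must survive retries and duplicate deliveries. Here's a minimal sketch of that, assuming every event carries a unique `eventId` (an assumption on our part, since the earlier examples don't show one). An in-memory `Set` stands in for a durable processed-events table.

```typescript
// Hypothetical event envelope; eventId is assumed to be unique per event.
interface BusEvent { eventId: string; type: string; userId: string; email: string }

class IdempotentConsumer {
  private processed = new Set<string>();
  private handler: (event: BusEvent) => void;

  constructor(handler: (event: BusEvent) => void) {
    this.handler = handler;
  }

  // Returns true if the event was handled now, false if it was a duplicate.
  handle(event: BusEvent): boolean {
    if (this.processed.has(event.eventId)) return false;
    this.handler(event); // if this throws, the ID is not recorded, so a retry runs it again
    this.processed.add(event.eventId);
    return true;
  }
}

// Deliver with a bounded number of attempts; returns the attempts used.
function deliverWithRetry(consumer: IdempotentConsumer, event: BusEvent, maxAttempts: number): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      consumer.handle(event);
      return attempt;
    } catch (error) {
      if (attempt === maxAttempts) throw error; // out of attempts: surface the failure
    }
  }
  throw new Error("maxAttempts must be at least 1");
}
```

In a real service the processed-ID check and the handler's database write would share one transaction, so a crash can't leave them out of step. Retries would also back off between attempts instead of looping immediately.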
Observability During Migration
Migration without observability is blind. You need to see what's happening.

Metrics to Track:

1. Traffic Distribution

    Old System: 75% of requests
    New System: 25% of requests

2. Latency

    Old System: p50=120ms, p95=450ms, p99=1200ms
    New System: p50=80ms, p95=350ms, p99=900ms

3. Error Rate

    Old System: 0.1% errors
    New System: 0.3% errors ← ⚠️ Investigate

4. Shadow Mode Discrepancies

    Shadow Mode Discrepancies: 23 in last 24h

5. Cost

    Old System: $1,200/month
    New System: $800/month
    Savings: 33%
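The latency figures above are percentiles, and the savings figure is a relative change. If your metrics backend doesn't compute these for you, here's a sketch of the arithmetic using the nearest-rank percentile definition. The function names are ours, and other percentile definitions (with interpolation) will give slightly different numbers.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so whole-number percentiles avoid float drift
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// The three latency percentiles shown in the comparison above.
function summarize(samples: number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}

// Relative change of the new value against the old one.
// Negative means the new system is lower (faster or cheaper).
function relativeChange(oldValue: number, newValue: number): number {
  return (newValue - oldValue) / oldValue;
}
```

With the cost numbers above, `relativeChange(1200, 800)` comes out at about -0.33, which is the 33% savings. Compare percentiles over the same time window and traffic mix, otherwise a quiet hour on the new system will flatter it.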
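Implementation:

    type Target = 'old' | 'new';

    // Metrics sink used below; back it with your metrics client of choice
    interface MigrationMetrics {
      recordLatency(target: Target, ms: number): void;
      incrementSuccess(target: Target): void;
      incrementError(target: Target): void;
      recordErrorType(target: Target, type: string): void;
    }

    abstract class MigrationObservability {
      constructor(protected readonly metrics: MigrationMetrics) {}

      // Routing, execution, and the fallback policy are supplied by the subclass
      protected abstract selectTarget(req: Request): Target;
      protected abstract execute(target: Target, req: Request): Promise<Response>;
      protected abstract canFallback(): boolean;

      async routeRequest(req: Request): Promise<Response> {
        const startTime = Date.now();
        const target = this.selectTarget(req);

        try {
          const response = await this.execute(target, req);

          // Record success metrics
          this.metrics.recordLatency(target, Date.now() - startTime);
          this.metrics.incrementSuccess(target);

          return response;
        } catch (error) {
          // Record error metrics (`error` is `unknown` in strict TypeScript, so narrow it first)
          this.metrics.incrementError(target);
          this.metrics.recordErrorType(target, error instanceof Error ? error.name : 'unknown');

          // If new system fails, try old system.
          // Only safe when replaying the request cannot apply a write twice.
          if (target === 'new' && this.canFallback()) {
            return await this.execute('old', req);
          }

          throw error;
        }
      }
    }

Dashboard:

Build a migration dashboard showing:
• Current rollout percentage
• Traffic split (old vs new)
• Latency comparison chart
• Error rate comparison
• Cost comparison
• Shadow mode discrepancies

Alerts: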
    const alerts = {
      newSystemErrorRate: {
        condition: 'new_errors > old_errors * 2',
        action: 'rollback',
        severity: 'critical'
      },
      newSystemLatency: {
        condition: 'new_p95 > old_p95 * 1.5',
        action: 'investigate',
        severity: 'warning'
      },
      dataDiscrepancies: {
        condition: 'shadow_discrepancies > 100',
        action: 'pause_rollout',
        severity: 'warning'
      }
    };

Rollback Plan:

Always have a one-click rollback:
    // Feature flag controls routing
    await featureFlags.set('new-user-service', 0); // ← Instant rollback to 0%

Document rollback steps in runbook. Test rollback before you need it.
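The alert conditions above are written as strings, which something still has to evaluate. One option is to express them as plain functions over a metrics snapshot, so the thresholds are type-checked and testable. This is a sketch under our own assumptions: the `MigrationSnapshot` shape and the function names are hypothetical, while the thresholds mirror the alert config above.

```typescript
interface SystemStats { errorRate: number; p95Ms: number } // errorRate in percent
interface MigrationSnapshot { oldSystem: SystemStats; newSystem: SystemStats; shadowDiscrepancies: number }
type AlertAction = "rollback" | "investigate" | "pause_rollout";

interface AlertRule {
  name: string;
  action: AlertAction;
  severity: "critical" | "warning";
  triggered: (s: MigrationSnapshot) => boolean;
}

// Same three rules as the string-based config, with the conditions as predicates.
const alertRules: AlertRule[] = [
  { name: "newSystemErrorRate", action: "rollback", severity: "critical",
    triggered: (s) => s.newSystem.errorRate > s.oldSystem.errorRate * 2 },
  { name: "newSystemLatency", action: "investigate", severity: "warning",
    triggered: (s) => s.newSystem.p95Ms > s.oldSystem.p95Ms * 1.5 },
  { name: "dataDiscrepancies", action: "pause_rollout", severity: "warning",
    triggered: (s) => s.shadowDiscrepancies > 100 },
];

// Return every rule whose condition holds for this snapshot.
function evaluateAlerts(snapshot: MigrationSnapshot, rules: AlertRule[] = alertRules): AlertRule[] {
  return rules.filter((rule) => rule.triggered(snapshot));
}

// Rollout percentage to apply after evaluation: 0 if any rule demands a
// rollback, otherwise the current value is left alone.
function nextRolloutPercentage(current: number, fired: AlertRule[]): number {
  return fired.some((rule) => rule.action === "rollback") ? 0 : current;
}
```

Fed the example numbers from the metrics list (0.1% vs 0.3% errors, 450ms vs 350ms p95, 23 discrepancies), only the error-rate rule fires, and `nextRolloutPercentage` returns 0. That result is what you would pass to the feature flag to trigger the rollback.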
When to Stop
Not everything needs to be a microservice.

The Modular Monolith:

You can have modularity without services. A well-structured monolith with clear boundaries is often better than poorly-designed microservices.

Characteristics of a Good Modular Monolith:

    monolith/
    ├── modules/
    │   ├── auth/
    │   │   ├── api/      # Public interface
    │   │   ├── domain/   # Business logic
    │   │   └── data/     # Database access
    │   ├── billing/
    │   │   ├── api/
    │   │   ├── domain/
    │   │   └── data/
Rules:• Modules only import from other modules' `api/` directory• No direct database access across modules• Each module could become a service laterWhen a Modular Monolith is Enough:✅ Team size < 20 engineers✅ Deploy frequency acceptable (daily+)✅ No severe scaling bottlenecks✅ Tech stack is consistent✅ No need for independent deploysWhen You Need Services:❌ Deploy conflicts blocking teams❌ Different scaling needs per module❌ Different tech stacks needed❌ Team wants autonomy❌ Module has different SLA requirementsCost-Benefit Analysis:Before Extracting a Service, Calculate:Costs:• Development time: 4-6 weeks per service• Operational complexity: +1 service to monitor/deploy• Network latency: +10-50ms per service call• Data consistency: Eventual instead of immediateBenefits:• Independent scaling: Can scale just this service• Independent deploys: Team can ship without coordination• Tech choice: Can use different language/framework• Failure isolation: If it fails, doesn't bring down everythingOnly extract if benefits >> costs.Our Final Architecture:We extracted 6 services from our monolith:1. Notifications2. File Processing3. Reporting4. Search5. Webhooks6. Background JobsThe rest stayed in the modular monolith. That was the right balance.Key Lesson:Microservices are not the goal. The goal is developer velocity and system reliability. If a modular monolith achieves that, stop migrating.As Amazon's famous quote says: "You build it, you run it." Make sure your team can handle the operational burden before extracting another service.