The Problem It Solves
In a monolithic application, a database transaction spanning multiple tables is atomic: either all changes commit or none do. In a microservices architecture, each service owns its own database, and a local transaction in one service cannot include writes to another service's database. A business operation that must update Orders, Inventory, and Payments therefore becomes three separate local transactions in three separate databases.
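Without a coordinating mechanism, a failure after updating Orders but before updating Inventory leaves the system in an inconsistent state with no automated recovery path. The Saga pattern addresses this by modelling the operation as a sequence of local transactions. Each local transaction is paired with a compensating transaction that semantically undoes it if a later step fails.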
Pattern Structure
Two styles implement the saga: choreography and orchestration. The right choice depends on the complexity of the workflow and the team's operational preferences.
Choreography — Services React to Events
Each service publishes an event after completing its local transaction. Other services listen and react. No central coordinator exists. The workflow emerges from the chain of events.
%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'IBM Plex Sans, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%%
flowchart TD
    START([Customer Places Order])
    START --> OS[Order Service\nCreate order — PENDING\nPublish OrderCreated event]
    OS --> IS[Inventory Service\nReserve stock\nPublish StockReserved event]
    IS --> PS[Payment Service\nCharge customer\nPublish PaymentProcessed event]
    PS --> OS2[Order Service\nUpdate order — CONFIRMED\nPublish OrderConfirmed event]
    OS2 --> DONE([Order Complete])
    IS --> FAIL1{Stock\nAvailable?}
    FAIL1 -->|No| COMP1[Publish StockUnavailable\nOrder Service cancels order\nCustomer notified]
    PS --> FAIL2{Payment\nSucceeded?}
    FAIL2 -->|No| COMP2[Publish PaymentFailed\nInventory releases reservation\nOrder Service cancels]
    style START fill:#4f8ef7,color:#fff
    style DONE fill:#10b981,color:#fff
    style COMP1 fill:#fef3c7
    style COMP2 fill:#fef3c7
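The choreography flow above can be sketched in Python as a minimal, in-memory illustration under stated assumptions:

- `EventBus` is a hypothetical stand-in for a real broker and dispatches synchronously.
- The service classes, event payload fields, and the `limit` rule used to force a payment failure are invented for the example.

Each service owns its own state and reacts only to events. The compensations (cancelling the order, releasing the reservation) are triggered by failure events rather than by a coordinator.

```python
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical in-memory stand-in for a message broker; dispatches synchronously."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}
        self.log: list[str] = []  # every published event type, in order

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type: str, event: dict) -> None:
        self.log.append(event_type)
        for handler in self._handlers.get(event_type, []):
            handler(event)


class OrderService:
    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.orders: dict[str, str] = {}
        bus.subscribe("PaymentProcessed", self.on_payment_processed)
        bus.subscribe("StockUnavailable", self.on_saga_failed)
        bus.subscribe("PaymentFailed", self.on_saga_failed)

    def place_order(self, order_id: str, sku: str, qty: int, amount: int) -> None:
        self.orders[order_id] = "PENDING"  # local transaction in the Order database
        event = {"order_id": order_id, "sku": sku, "qty": qty, "amount": amount}
        self.bus.publish("OrderCreated", event)

    def on_payment_processed(self, event: dict) -> None:
        self.orders[event["order_id"]] = "CONFIRMED"
        self.bus.publish("OrderConfirmed", event)

    def on_saga_failed(self, event: dict) -> None:
        self.orders[event["order_id"]] = "CANCELLED"  # compensation for order creation


class InventoryService:
    def __init__(self, bus: EventBus, stock: dict[str, int]) -> None:
        self.bus = bus
        self.stock = stock
        self.reservations: dict[str, tuple[str, int]] = {}
        bus.subscribe("OrderCreated", self.on_order_created)
        bus.subscribe("PaymentFailed", self.on_payment_failed)

    def on_order_created(self, event: dict) -> None:
        if self.stock.get(event["sku"], 0) >= event["qty"]:
            self.stock[event["sku"]] -= event["qty"]
            self.reservations[event["order_id"]] = (event["sku"], event["qty"])
            self.bus.publish("StockReserved", event)
        else:
            self.bus.publish("StockUnavailable", event)

    def on_payment_failed(self, event: dict) -> None:
        # Compensation: pop() means a repeated PaymentFailed releases nothing twice.
        reservation = self.reservations.pop(event["order_id"], None)
        if reservation is not None:
            sku, qty = reservation
            self.stock[sku] += qty


class PaymentService:
    def __init__(self, bus: EventBus, limit: int) -> None:
        self.bus = bus
        self.limit = limit  # hypothetical rule: amounts above the limit are declined
        bus.subscribe("StockReserved", self.on_stock_reserved)

    def on_stock_reserved(self, event: dict) -> None:
        if event["amount"] <= self.limit:
            self.bus.publish("PaymentProcessed", event)
        else:
            self.bus.publish("PaymentFailed", event)
```

With a real broker, each handler would run in its own service process and events would be delivered asynchronously. The synchronous dispatch here only keeps the sketch short.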
Orchestration — Central Coordinator Controls the Workflow
An orchestrator service sends commands to each participant and waits for their replies. The orchestrator holds the saga state and decides what happens next, including which compensating transactions to run when a step fails.
%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'IBM Plex Sans, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%%
flowchart TD
    START([Order Request Received])
    ORCH[Saga Orchestrator\nHolds saga state\nCoordinates all steps]
    START --> ORCH
    ORCH -->|Command: ReserveStock| IS2[Inventory Service]
    IS2 -->|Reply: StockReserved| ORCH
    ORCH -->|Command: ProcessPayment| PS2[Payment Service]
    PS2 -->|Reply: PaymentProcessed| ORCH
    ORCH -->|Command: ConfirmOrder| OS3[Order Service]
    OS3 -->|Reply: OrderConfirmed| ORCH
    ORCH --> DONE2([Saga Complete])
    IS2 -->|Reply: StockUnavailable| ORCH
    PS2 -->|Reply: PaymentFailed| ORCH
    ORCH -->|Compensate: CancelOrder| COMP_O[Order Service\nCancel and notify]
    style START fill:#4f8ef7,color:#fff
    style DONE2 fill:#10b981,color:#fff
    style ORCH fill:#e0f2fe
    style COMP_O fill:#fef3c7
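A minimal orchestrator sketch in Python, assuming in-process function calls in place of command and reply messaging. `SagaStep`, `SagaOrchestrator`, `StepFailed`, and the participant functions are hypothetical names. The sketch departs from the diagram in two ways:

- Order creation is an explicit first step, so that `CancelOrder` has something to undo.
- A `ReleaseStock` compensation is added.

Completed steps are compensated in reverse order. Each step is recorded in `state_log`, which stands in for a durable saga store, before it runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Action = Callable[[dict], None]


class StepFailed(Exception):
    """Raised by a participant to report a business failure such as PaymentFailed."""


@dataclass
class SagaStep:
    name: str
    action: Action
    compensation: Optional[Action] = None


@dataclass
class SagaOrchestrator:
    steps: list[SagaStep]
    # Stands in for a durable saga store (a database table in a real system).
    state_log: list[tuple[str, str]] = field(default_factory=list)

    def run(self, ctx: dict) -> str:
        completed: list[SagaStep] = []
        for step in self.steps:
            self.state_log.append((step.name, "STARTED"))  # record before issuing the command
            try:
                step.action(ctx)
            except StepFailed:
                self.state_log.append((step.name, "FAILED"))
                for done in reversed(completed):  # undo completed steps, newest first
                    if done.compensation is not None:
                        done.compensation(ctx)
                        self.state_log.append((done.name, "COMPENSATED"))
                return "COMPENSATED"
            self.state_log.append((step.name, "DONE"))
            completed.append(step)
        return "COMPLETED"


# Hypothetical participants: in a real system each call sends a command and awaits a reply.
orders: dict[str, str] = {}
inventory: dict[str, int] = {"sku-1": 5}


def create_order(ctx: dict) -> None:
    orders[ctx["order_id"]] = "PENDING"


def cancel_order(ctx: dict) -> None:
    orders[ctx["order_id"]] = "CANCELLED"


def reserve_stock(ctx: dict) -> None:
    if inventory[ctx["sku"]] < ctx["qty"]:
        raise StepFailed("StockUnavailable")
    inventory[ctx["sku"]] -= ctx["qty"]


def release_stock(ctx: dict) -> None:
    inventory[ctx["sku"]] += ctx["qty"]


def process_payment(ctx: dict) -> None:
    if ctx["amount"] > 100:  # hypothetical rule to force a PaymentFailed reply
        raise StepFailed("PaymentFailed")


def confirm_order(ctx: dict) -> None:
    orders[ctx["order_id"]] = "CONFIRMED"


def order_saga() -> SagaOrchestrator:
    return SagaOrchestrator([
        SagaStep("CreateOrder", create_order, cancel_order),
        SagaStep("ReserveStock", reserve_stock, release_stock),
        SagaStep("ProcessPayment", process_payment),  # no later step can fail in this sketch
        SagaStep("ConfirmOrder", confirm_order),
    ])
```

Running `order_saga().run({"order_id": "o2", "sku": "sku-1", "qty": 2, "amount": 500})` fails at `ProcessPayment`, releases the stock, then cancels the order. `ProcessPayment` lacks a compensation here only because nothing after it can fail. A real saga would pair it with a refund.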
When to Use
- Business operations that span multiple services and require all-or-nothing semantics
- Workflows where partial completion leaves the system in a visible inconsistent state
- Long-running processes where a two-phase commit would hold locks for an unacceptable duration
- Systems where each service must remain independently deployable and scalable
When Not to Use
- Operations that can be made idempotent and retried without compensation — prefer simpler retry logic
- High-frequency, low-latency transactions where the extra messaging round trips and saga state persistence add unacceptable latency
- Systems where you control all the services and could share a single database — prefer a database transaction
- Simple two-service interactions — consider the Outbox Pattern instead
Trade-offs
| Benefit | Cost |
|---|---|
| No distributed locking — services remain available | Compensating transactions must be designed and tested for every step |
| Each service uses its own database technology | Eventual consistency means brief windows of observable inconsistency |
| Independent deployability preserved | Debugging failed sagas requires distributed tracing across services |
| Scales horizontally with each service | Orchestrator becomes a single point of failure in the orchestration style |
Implementation Approach
Choreography is preferred when the workflow is simple (three to four steps), the team is comfortable with event-driven debugging, and you want to avoid a central coordinator.
Orchestration is preferred when the workflow has branching logic, compensations are complex, or you need a clear audit trail of saga state. AWS Step Functions, Temporal, and Apache Camel implement the orchestrator role.
Three principles apply to both styles:
- Compensating transactions must be idempotent. Retries can deliver the same compensation more than once, so releasing a stock reservation twice must be safe.
- Saga state must be persisted before sending commands. If the orchestrator crashes after recording a step but before it knows whether the command was delivered, it re-sends the command on recovery, so every participant must handle duplicate commands.
- Use a correlation ID. Every message in a saga carries the same correlation ID, so the full saga can be reconstructed from distributed logs for observability and debugging.
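The first and third principles can be illustrated together on the participant side. This Python sketch assumes a hypothetical `Message` envelope with two identifiers:

- `correlation_id` identifies the saga instance.
- `message_id` identifies the individual delivery.

Duplicate deliveries are dropped by `message_id`. The `ReleaseStock` compensation is also idempotent on its own, because it pops the reservation keyed by correlation ID. A compensation re-sent under a new message ID after an orchestrator restart is therefore a no-op. In a real service, the `processed` set and the stock change would need to be committed in the same local transaction; here they are plain in-memory structures.

```python
import logging
import uuid
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("saga")


@dataclass(frozen=True)
class Message:
    correlation_id: str  # shared by every message in one saga instance
    message_id: str      # unique per message; used to drop redeliveries
    type: str
    payload: dict = field(default_factory=dict)


class InventoryParticipant:
    """Hypothetical participant that tolerates duplicate commands and compensations."""

    def __init__(self, stock: dict[str, int]) -> None:
        self.stock = stock
        self.reservations: dict[str, tuple[str, int]] = {}  # keyed by correlation ID
        self.processed: set[str] = set()  # message IDs already handled

    def handle(self, msg: Message) -> None:
        if msg.message_id in self.processed:
            log.info("[%s] duplicate %s %s ignored", msg.correlation_id, msg.type, msg.message_id)
            return
        if msg.type == "ReserveStock":
            sku, qty = msg.payload["sku"], msg.payload["qty"]
            self.stock[sku] -= qty
            self.reservations[msg.correlation_id] = (sku, qty)
        elif msg.type == "ReleaseStock":
            # pop() keeps the release idempotent even under a new message ID.
            reservation = self.reservations.pop(msg.correlation_id, None)
            if reservation is not None:
                sku, qty = reservation
                self.stock[sku] += qty
        self.processed.add(msg.message_id)
        log.info("[%s] handled %s %s", msg.correlation_id, msg.type, msg.message_id)


if __name__ == "__main__":
    saga_id = str(uuid.uuid4())
    participant = InventoryParticipant({"sku-1": 5})
    reserve = Message(saga_id, "m1", "ReserveStock", {"sku": "sku-1", "qty": 2})
    participant.handle(reserve)
    participant.handle(reserve)  # broker redelivery: ignored
    participant.handle(Message(saga_id, "m2", "ReleaseStock"))
    participant.handle(Message(saga_id, "m3", "ReleaseStock"))  # re-sent after a crash: no-op
    print(participant.stock)  # {'sku-1': 5}
```

Every log line starts with the saga's correlation ID, so filtering logs by that value reconstructs one saga's history across services.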
Anti-Patterns to Avoid
Designing the happy path of the saga but not implementing compensations. The gap goes unnoticed in development, where failures are rare, and surfaces in production when a payment fails after inventory has already been reserved.
Design compensations alongside forward steps. Every step that changes state must have a compensation. Test failure scenarios explicitly in integration tests.
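One way to make failure testing mechanical is to inject a failure at every step in turn. The test then asserts that the compensations leave no net effect. The Python `unittest` sketch below rests on these assumptions:

- `Step` is a hypothetical type whose `compensation` field has no default, so a step cannot be declared without one.
- `run_saga` is a toy helper that operates on a dict instead of real services.
- The step names and amounts are illustrative.

```python
import unittest
from dataclasses import dataclass
from typing import Callable, Optional

Action = Callable[[dict], None]


@dataclass
class Step:
    name: str
    action: Action
    compensation: Action  # required field: a step cannot be declared without its undo


def run_saga(steps: list[Step], state: dict, fail_at: Optional[int] = None) -> bool:
    """Run steps in order; if the step at index fail_at fails, compensate completed steps."""
    completed: list[Step] = []
    for index, step in enumerate(steps):
        if index == fail_at:  # injected failure before the step takes effect
            for done in reversed(completed):
                done.compensation(state)
            return False
        step.action(state)
        completed.append(step)
    return True


# Hypothetical order saga operating on a plain dict instead of three databases.
STEPS = [
    Step("CreateOrder",
         lambda s: s.update(order="PENDING"),
         lambda s: s.update(order="CANCELLED")),
    Step("ReserveStock",
         lambda s: s.update(stock=s["stock"] - 2),
         lambda s: s.update(stock=s["stock"] + 2)),
    Step("ProcessPayment",
         lambda s: s.update(charged=s["charged"] + 50),
         lambda s: s.update(charged=s["charged"] - 50)),
    Step("ConfirmOrder",
         lambda s: s.update(order="CONFIRMED"),
         lambda s: s.update(order="CANCELLED")),
]


def initial_state() -> dict:
    return {"order": None, "stock": 5, "charged": 0}


class SagaFailureTests(unittest.TestCase):
    def test_happy_path(self) -> None:
        state = initial_state()
        self.assertTrue(run_saga(STEPS, state))
        self.assertEqual(state, {"order": "CONFIRMED", "stock": 3, "charged": 50})

    def test_every_failure_point_is_compensated(self) -> None:
        for fail_at, step in enumerate(STEPS):
            with self.subTest(failing_step=step.name):
                state = initial_state()
                self.assertFalse(run_saga(STEPS, state, fail_at))
                self.assertEqual(state["stock"], 5)    # reservation released
                self.assertEqual(state["charged"], 0)  # payment refunded
                self.assertIn(state["order"], (None, "CANCELLED"))
```

Because the loop covers every index, adding a step to `STEPS` automatically adds a failure scenario to the test. Run it with any unittest runner, for example `python -m unittest` pointed at the module.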
Implementing saga steps as synchronous HTTP calls between services, waiting for each response before proceeding. A slow or unavailable downstream service blocks the entire saga and holds the caller's thread.
Use asynchronous messaging between saga steps. Each service processes its command from a queue, publishes its result to a topic, and returns immediately. The saga progresses asynchronously.
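The asynchronous shape described above can be sketched with `asyncio.Queue` objects standing in for broker queues and a reply topic. Everything here is hypothetical: the `participant` and `coordinator` coroutines, the message fields, and the delays. Starting a saga is a non-blocking `put_nowait`, so the caller never waits on a downstream service. A slow payment participant only lets its own queue grow.

```python
import asyncio


async def participant(commands: asyncio.Queue, replies: asyncio.Queue,
                      reply_type: str, delay: float) -> None:
    """Consume commands, run a simulated local transaction, publish a reply."""
    while True:
        command = await commands.get()
        await asyncio.sleep(delay)  # simulated local transaction
        await replies.put({"saga_id": command["saga_id"], "type": reply_type})
        commands.task_done()


async def coordinator(replies: asyncio.Queue, payment_q: asyncio.Queue,
                      status: dict[str, str], all_done: asyncio.Event) -> None:
    """React to replies and enqueue the next command; nothing blocks on a downstream call."""
    while True:
        reply = await replies.get()
        saga_id = reply["saga_id"]
        if reply["type"] == "StockReserved":
            status[saga_id] = "STOCK_RESERVED"
            await payment_q.put({"saga_id": saga_id, "type": "ProcessPayment"})
        elif reply["type"] == "PaymentProcessed":
            status[saga_id] = "COMPLETED"
            if all(s == "COMPLETED" for s in status.values()):
                all_done.set()
        replies.task_done()


async def main(n_sagas: int = 3) -> dict[str, str]:
    inventory_q: asyncio.Queue = asyncio.Queue()
    payment_q: asyncio.Queue = asyncio.Queue()
    replies: asyncio.Queue = asyncio.Queue()
    status: dict[str, str] = {}
    all_done = asyncio.Event()

    workers = [
        asyncio.create_task(participant(inventory_q, replies, "StockReserved", 0.01)),
        asyncio.create_task(participant(payment_q, replies, "PaymentProcessed", 0.05)),
        asyncio.create_task(coordinator(replies, payment_q, status, all_done)),
    ]
    for i in range(n_sagas):
        saga_id = f"saga-{i}"
        status[saga_id] = "STARTED"
        inventory_q.put_nowait({"saga_id": saga_id, "type": "ReserveStock"})  # returns at once

    await asyncio.wait_for(all_done.wait(), timeout=5)
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return status


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The coordinator here plays the orchestrator role. The same non-blocking consume-and-publish loop applies to each participant in the choreography style.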
Integration Platform Implementations
- MuleSoft: Implement saga orchestration using MuleSoft flows with Until Successful scopes for retries and error handlers as compensating transaction triggers. See MuleSoft Platform.
- OIC: OIC Process Automation implements choreography-style sagas where each step is a human task or system integration. See OIC Platform.
Cloud-Specific Implementations
- AWS: Implement the orchestration style using AWS Step Functions with Lambda functions as saga participants. The Express Workflow type suits short-lived sagas. For choreography, use EventBridge + SQS.
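For the AWS variant, a Step Functions orchestrator is expressed in Amazon States Language (ASL). The Python sketch below builds such a definition as a dict and prints it as JSON. It uses the standard ASL `Task`, `Catch`, `Retry`, `Succeed`, and `Fail` fields. The Lambda function names, region, and account ID in the ARNs are placeholders, not real resources.

Each forward step catches any error and jumps into a compensation chain that runs in reverse order. Compensation tasks are retried, which is safe only if the Lambda functions behind them are idempotent. Idempotency matters even more with Express Workflows, which do not provide the exactly-once workflow execution that Standard Workflows do.

```python
import json

# Placeholder ARN prefix: hypothetical region, account, and function names.
LAMBDA_ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"

RETRY = [{"ErrorEquals": ["States.ALL"], "IntervalSeconds": 2,
          "MaxAttempts": 3, "BackoffRate": 2.0}]


def forward(fn_name: str, next_state: str, compensate_from: str) -> dict:
    """Forward step: on any error, keep the input, record the error, start compensating."""
    return {
        "Type": "Task",
        "Resource": LAMBDA_ARN_PREFIX + fn_name,
        "Catch": [{"ErrorEquals": ["States.ALL"], "ResultPath": "$.error",
                   "Next": compensate_from}],
        "Next": next_state,
    }


def compensation(fn_name: str, next_state: str) -> dict:
    """Compensating step: retried, so the function behind it must be idempotent."""
    return {"Type": "Task", "Resource": LAMBDA_ARN_PREFIX + fn_name,
            "Retry": RETRY, "Next": next_state}


definition = {
    "Comment": "Order saga (sketch)",
    "StartAt": "ReserveStock",
    "States": {
        "ReserveStock": forward("reserve-stock", "ProcessPayment", "CancelOrder"),
        "ProcessPayment": forward("process-payment", "ConfirmOrder", "ReleaseStock"),
        "ConfirmOrder": forward("confirm-order", "OrderConfirmed", "RefundPayment"),
        "RefundPayment": compensation("refund-payment", "ReleaseStock"),
        "ReleaseStock": compensation("release-stock", "CancelOrder"),
        "CancelOrder": compensation("cancel-order", "SagaFailed"),
        "OrderConfirmed": {"Type": "Succeed"},
        "SagaFailed": {"Type": "Fail", "Error": "SagaCompensated",
                       "Cause": "A saga step failed and compensations ran."},
    },
}

if __name__ == "__main__":
    print(json.dumps(definition, indent=2))
```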
References
- Richardson, Chris — Microservices Patterns. Manning, 2018. Pattern: Saga
- Garcia-Molina, H. and Salem, K. — Sagas. ACM SIGMOD, 1987.
- AWS — Saga orchestration with Step Functions. docs.aws.amazon.com/step-functions/latest/dg/concepts-saga-pattern
- Temporal — Workflow orchestration. temporal.io/docs