Integration Patterns
Proven blueprints for connecting services without creating hidden dependencies
Imagine a post office: you drop a letter in the box (producer) and the recipient picks it up when ready (consumer). Neither needs to know when the other is available. Integration patterns apply this same idea to software — services communicate reliably without requiring simultaneous availability.
Poor integration is among the leading causes of microservice project failure. Applied well, integration patterns shrink the blast radius of system-wide failures, enable independent team deployment, and form the foundation of high-availability architectures at companies like Uber and Airbnb, and at many large banks.
📖 Detailed Explanation
Integration is where most distributed system failures are born. A synchronous call chain — Service A calls B, which calls C, which calls D — means that per-hop latency accumulates: 100ms at each of four hops becomes 400ms at A, and a failure in D cascades to take down A, B, and C simultaneously. This is the core problem that integration patterns exist to solve.
Synchronous vs. Asynchronous is the fundamental integration choice. Synchronous (REST, gRPC) is appropriate when the caller needs an immediate answer — a payment authorization, a balance inquiry. Asynchronous (Kafka, SQS, RabbitMQ) is appropriate when the caller does not need an immediate answer — sending a confirmation email, updating a search index, notifying a downstream analytics system. The mistake most teams make is defaulting to synchronous everywhere because it feels simpler. It is only simpler until you need to trace why your p99 latency is 8 seconds.
The Saga Pattern solves the hardest integration problem: distributed transactions. In a monolith, you use a database transaction — either everything commits or nothing does. In a microservices architecture, you have multiple independent databases. If Order Service creates an order and Payment Service charges the card and Inventory Service reserves stock — what happens when Inventory fails after Payment succeeds? Sagas address this via compensating transactions. Each step publishes an event; if a later step fails, compensating events reverse the earlier steps. Uber uses saga orchestration for its trip lifecycle across 20+ services.
The Outbox Pattern solves a subtle but critical problem: how do you guarantee that when you save a record to your database AND publish an event to Kafka, both happen atomically? If the Kafka publish fails after the database commit, your event is lost. The Outbox Pattern writes the event to an outbox table in the same database transaction as the business record, then a separate process (Debezium, a polling publisher) reads the outbox and publishes to Kafka. This yields at-least-once delivery without distributed transactions; paired with idempotent consumers, it gives effectively-once processing.
Event-Carried State Transfer is a pattern where events carry the full data payload needed by consumers — not just an ID. When an Order event carries customer name, shipping address, and line items, consumers can act without calling back to the Order service. This eliminates synchronous dependency chains between services and enables consumers to operate independently even when the Order service is down.
The Claim Check Pattern solves message bus performance: if you're streaming large payloads (PDFs, images, large JSON blobs) through a message bus, you'll saturate it quickly. Instead, store the payload in object storage (S3, Blob Storage) and put a reference token (the "claim check") in the message. Consumers redeem the claim check to fetch the actual payload. The message bus stays lean and fast.
Idempotency is the invisible glue that makes all of these patterns safe. At-least-once delivery means consumers will occasionally receive the same message twice. Every consumer must be designed to process duplicate messages without side effects — by checking a processed-event-ID table, using database upserts with deterministic keys, or by making the operation naturally idempotent (setting a status to 'confirmed' is idempotent; incrementing a counter is not).
📈 Architecture Diagram
```mermaid
sequenceDiagram
    participant O as Order Service
    participant DB as Order DB + Outbox
    participant D as Debezium CDC
    participant K as Kafka
    participant P as Payment Service
    participant I as Inventory Service
    O->>DB: BEGIN TX: INSERT order + INSERT outbox_event
    DB-->>O: COMMIT
    D->>DB: Read outbox via CDC (log tailing)
    D->>K: Publish OrderCreated event
    K-->>P: OrderCreated
    P->>P: Charge card
    P->>K: PaymentSucceeded
    K-->>I: PaymentSucceeded
    I->>I: Reserve stock
    I->>K: StockReserved
    K-->>O: StockReserved
    O->>DB: UPDATE order status = CONFIRMED
```
Outbox Pattern + Saga choreography: order creation flows through distributed services with guaranteed event delivery and compensating transaction support.
🌎 Real-World Examples
Netflix's Hystrix library (now in maintenance mode, with Resilience4j as its recommended successor) pioneered the Bulkhead and Circuit Breaker patterns in microservices. Each service dependency gets its own thread pool (bulkhead) — a slow recommendation service cannot exhaust threads needed for video playback. Circuit breakers open after a 50% error rate in a 10-second window, returning cached or degraded responses instantly. Netflix runs 'Game Days' quarterly to verify these patterns hold under real failure scenarios.
✓ Result: Streaming availability maintained at 99.97%+ even during AWS regional outages; cascading failures eliminated across 1,000+ microservices
Uber's trip lifecycle (request → match → dispatch → ride → payment → rating) is a six-step orchestrated Saga on their Temporal workflow engine. Each step has a compensating transaction: driver re-dispatch on match failure, automatic refund if payment succeeds but driver confirmation fails. The Outbox Pattern with Cassandra + Kafka guarantees zero trip events are lost even during datacenter failover.
✓ Result: Zero trip event loss across 25M+ daily trips; full saga state observable per trip for customer dispute resolution
Shopify's Black Friday 2023 ($9.3B sales) ran on choreography-based sagas. Bulkheads isolate each merchant's order pipeline — a viral product launch for one merchant cannot degrade others. Dead Letter Queues per merchant tier with real-time alerting ensure no order is silently dropped. Circuit breakers per payment processor handle processor outages without checkout disruption.
✓ Result: $9.3B on a single day with 99.999% checkout availability; two payment processor outages handled transparently
LinkedIn's activity feed uses pure Choreography: every event (PostPublished, ConnectionMade, ProfileUpdated) fans out via Kafka to 12+ consumers. All consumers are idempotent using composite deduplication keys — Kafka's at-least-once delivery never produces duplicate feed items. Schema Registry with Avro enforces backward-compatible schema evolution across all consumers.
✓ Result: Feed P99 latency 85ms for 900M members; zero duplicate posts; schema registry blocks breaking changes from reaching consumers
🌟 Core Principles
Services communicate via messages wherever the response is not time-critical. This eliminates runtime dependency — the consumer does not need to be running when the producer sends.
Message consumers must produce the same result if the same message is received twice. Use idempotency keys, processed-event tables, or naturally idempotent operations.
Define the message schema or API contract before building either side. Use schema registries and API design tools. This prevents incompatible changes from reaching production.
A failure in one integration path must not propagate to other services. Circuit breakers stop cascading synchronous failures. Dead letter queues contain asynchronous failure storms.
Every message published, consumed, delayed, or failed must be observable. Integration points are where most distributed system mysteries live — instrument them exhaustively.
⚙️ Implementation Steps
Map Your Integration Topology
Draw every service-to-service communication in your system: direction, protocol, synchronous or async, schema format. Color-code synchronous calls red. Review: are any critical paths all-red? Those are your availability risks.
Classify by Communication Need
For each integration: does the caller need an immediate response? If yes — synchronous REST or gRPC. If no — async messaging (Kafka, SQS). If bulk — batch file. If real-time stream — Kafka or Kinesis.
Design Idempotency into Every Consumer
Before writing consumer code, answer: 'What happens if this message is delivered twice?' If the answer is 'bad things,' implement idempotency: generate a deterministic ID from the event payload and use it as an upsert key.
Implement Dead Letter Queues Everywhere
Every queue or Kafka topic must have a DLQ. Configure alert thresholds: any message in the DLQ within 5 minutes triggers a PagerDuty alert. DLQs without alerts are debugging black holes.
Register All Schemas
Every event schema goes into a schema registry (Confluent, AWS Glue, Azure Schema Registry). Set compatibility rules: BACKWARD for consumer-initiated changes, FORWARD for producer-initiated changes. Block deployments that break schema compatibility.
✅ Governance Checkpoints
| Checkpoint | Owner | Gate Criteria | Status |
|---|---|---|---|
| Integration Topology Documented | Solution Architect | Synchronous/async classification for all integrations complete | Required |
| Schema Registry Populated | Integration Architect | All event schemas registered with compatibility rules set | Required |
| DLQ + Alerting Configured | Platform / DevOps | DLQ alert firing within 5 minutes of first failed message | Required |
| Idempotency Tests in CI | QA Lead | Duplicate-message delivery tests passing in CI pipeline | Required |
| Contract Tests for All Consumers | QA Lead | Pact or Spring Cloud Contract tests registered for all consumers | Required |
◈ Recommended Patterns
✦ Saga Pattern (Choreography)
Each service listens for events and reacts independently. No central coordinator. Order Service publishes OrderCreated → Payment Service listens and publishes PaymentProcessed → Inventory Service listens. Failure triggers compensating events. Highly decoupled but complex to trace.
✦ Saga Pattern (Orchestration)
A dedicated Saga Orchestrator (Step Functions, Temporal, Camunda) drives the workflow step by step, calling each service and handling failures with compensating calls. More visible and controllable than choreography.
✦ Outbox Pattern
Write the business record AND the event to your own database in a single ACID transaction. A CDC connector (Debezium, AWS DMS) reads the outbox table and publishes to the message bus. Guarantees at-least-once delivery without distributed transactions.
✦ Event-Carried State Transfer
Events carry the full data payload consumers need. Order events carry the full order: customer ID, name, items, address. Consumers never need to call back to the source. Eliminates synchronous lookup dependencies.
✦ Claim Check
Large payloads (PDFs, images, large JSON) are stored in object storage. The message carries a reference token. Consumers redeem the token to fetch the payload. Keeps the message bus fast and lean.
✦ Consumer-Driven Contract Testing
Consumers define the contract they need from a provider. Tools like Pact verify that the provider satisfies every registered consumer contract before deployment. Prevents integration regressions in CI/CD.
⛔ Anti-Patterns to Avoid
⛔ Synchronous Chain of Death
Service A → B → C → D all synchronous. Each call adds latency. Any failure propagates up the entire chain. The system's availability is the product of each service's availability: 99.9%^4 = 99.6%. Resolution: identify the longest synchronous chains and introduce async messaging at appropriate boundaries.
⛔ Integration via Shared Database
Multiple services reading and writing to the same database tables as the integration mechanism. Schema changes break all consumers simultaneously. Deployment of one service can corrupt another's data. This is the most dangerous integration anti-pattern and the hardest to remediate.
⛔ Phantom Events
Events that trigger notifications without meaningful payload — just an entity ID. Every consumer must then call back to the source service to get the data they need. Creates synchronous coupling inside async workflows. Use Event-Carried State Transfer instead.
🤖 AI Augmentation Extensions
LLM agents analyze all registered consumer contracts when a new event schema is proposed. They generate a full impact report: which consumers are affected, what migration steps are required, and whether backward compatibility is maintained.
An AI agent monitors DLQ message patterns, classifies failures by root cause (schema mismatch, network timeout, business rule violation, poison pill), and generates remediation playbooks for each category.
📚 References & Further Reading
- Enterprise Integration Patterns — Gregor Hohpe & Bobby Woolf (enterpriseintegrationpatterns.com)
- Building Event-Driven Microservices — Adam Bellemare (O'Reilly)
- Designing Data-Intensive Applications — Martin Kleppmann (dataintensive.net)
- Kafka: The Definitive Guide — Gwen Shapira et al. (Confluent)