Architecture · 8 min read

Microservices Patterns in Fintech: Lessons from 1M+ Monthly Transactions

When we set out to build the QR payment system for Mercado Pago in Buenos Aires, we had nine engineers, a two-month deadline, and the expectation that the system would process hundreds of thousands of transactions monthly from day one. Here is what we learned.

The Problem with Simple REST in Payments

The temptation in any new service is to reach for synchronous REST calls. They are easy to reason about, easy to test, and easy to debug. However, in a payment system, synchronous coupling between services is a reliability tax that compounds quickly.

If Service A calls Service B to validate a QR code, and Service B calls Service C to check merchant status, you have a cascading dependency chain. A 200ms latency spike in C becomes 200ms plus B's own processing and network overhead in B, and A sees all of that plus its own overhead, so the end-to-end response can easily pass 400ms. Under load, that is how you breach your SLA.
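To make the arithmetic concrete, here is a small sketch of how latency adds up and availability multiplies along a synchronous A → B → C chain. The per-service latencies and the 99.9% availability figure are illustrative assumptions, not measurements from our system.

```python
from math import prod

# Illustrative numbers only; none of these are production measurements.
CHAIN = ["A", "B", "C"]                      # A calls B, B calls C, all synchronously
OWN_MS = {"A": 80, "B": 70, "C": 60}         # each service's own work per request
AVAILABILITY = {"A": 0.999, "B": 0.999, "C": 0.999}


def latency_seen_at(service, spike_in_c_ms=0):
    """Response time observed by whoever calls `service`.

    A caller waits for the service and everything downstream of it,
    so a spike in C is paid in full by B and by A.
    """
    downstream = CHAIN[CHAIN.index(service):]
    return sum(OWN_MS[s] for s in downstream) + spike_in_c_ms


def chain_availability():
    """A request succeeds only if every service in the chain succeeds."""
    return prod(AVAILABILITY.values())


for name in CHAIN:
    print(name, latency_seen_at(name), latency_seen_at(name, spike_in_c_ms=200))
print("end-to-end availability:", round(chain_availability(), 6))
```

With these assumed numbers, the caller of A goes from 210ms to 410ms during the spike, and three services at 99.9% each yield roughly 99.7% end to end.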

💡 Key Insight

Every synchronous dependency in a payment flow is a potential SLA breach. Design for failure first.

Pattern 1: Event-Driven Choreography

For our QR payment flow, we decomposed the transaction lifecycle into events: QRScanned, MerchantValidated, AmountAuthorized, TransactionSettled. Each service subscribes to the events it cares about and publishes its own events downstream.

The result: the QR scanning service does not need to know about the settlement service. They are loosely coupled through a message broker (we used Kafka). When the settlement service went down for a deployment, QR scanning kept working, unaffected.

// Simplified event structure
{
  "event_type": "QR_SCANNED",
  "transaction_id": "txn_abc123",
  "qr_hash": "sha256:...",
  "merchant_id": "merch_456",
  "amount_cents": 2500,
  "timestamp": "2024-03-15T14:32:11Z",
  "idempotency_key": "client_789_1710510731"
}
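The choreography can be sketched with a toy in-memory bus standing in for the broker. This is not Kafka client code: `InMemoryBus` and the one-line handlers are hypothetical, and the event names follow the lifecycle names from the prose rather than the wire format shown in the structure above.

```python
from collections import defaultdict, deque


class InMemoryBus:
    """Toy stand-in for a message broker; handlers are keyed by event type."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self._queue.append((event_type, payload))

    def run(self):
        """Deliver queued events until none remain; return event types in delivery order."""
        delivered = []
        while self._queue:
            event_type, payload = self._queue.popleft()
            delivered.append(event_type)
            for handler in self._handlers[event_type]:
                handler(payload)
        return delivered


bus = InMemoryBus()

# Each "service" knows only the event it consumes and the event it emits.
bus.subscribe("QRScanned", lambda e: bus.publish("MerchantValidated", e))
bus.subscribe("MerchantValidated", lambda e: bus.publish("AmountAuthorized", e))
bus.subscribe("AmountAuthorized", lambda e: bus.publish("TransactionSettled", e))

bus.publish("QRScanned", {"transaction_id": "txn_abc123", "amount_cents": 2500})
trace = bus.run()
print(trace)
```

Note that the scanning handler never references settlement. Removing the `AmountAuthorized` subscriber would stop the flow at that step without touching the earlier services.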

Pattern 2: Saga Pattern for Distributed Transactions

In a monolith, you have database transactions. In microservices, no single transaction spans services. When a payment involves debiting a user wallet, crediting a merchant account, and updating a ledger, all in separate services, you need a way to undo partial changes if a step fails.

We implemented a choreography-based saga: each service publishes a success or failure event, and compensating transactions are triggered automatically. If the merchant credit fails, a WalletDebitReversed event is published and the user's balance is restored.
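A minimal in-memory sketch of that compensation flow follows. Only WalletDebitReversed is named in this article; `PaymentRequested`, `WalletDebited`, `MerchantCredited`, and `MerchantCreditFailed` are assumed names, the ledger step is left out for brevity, and a real system would run each handler in its own service behind a broker.

```python
from collections import deque


class SagaDemo:
    """Each handler reacts to one event and publishes the next; there is no central coordinator."""

    def __init__(self, wallet_cents, merchant_credit_fails=False):
        self.wallet_cents = wallet_cents
        self.merchant_cents = 0
        self.merchant_credit_fails = merchant_credit_fails
        self.log = []                      # event types in publication order
        self._queue = deque()
        self._handlers = {
            "PaymentRequested": self._wallet_debit,         # wallet service
            "WalletDebited": self._merchant_credit,         # merchant service
            "MerchantCreditFailed": self._wallet_reverse,   # wallet service (compensation)
        }

    def publish(self, event_type, amount_cents):
        self.log.append(event_type)
        self._queue.append((event_type, amount_cents))

    def _wallet_debit(self, amount_cents):
        self.wallet_cents -= amount_cents
        self.publish("WalletDebited", amount_cents)

    def _merchant_credit(self, amount_cents):
        if self.merchant_credit_fails:
            self.publish("MerchantCreditFailed", amount_cents)
        else:
            self.merchant_cents += amount_cents
            self.publish("MerchantCredited", amount_cents)

    def _wallet_reverse(self, amount_cents):
        # Compensating transaction: undo the earlier debit.
        self.wallet_cents += amount_cents
        self.publish("WalletDebitReversed", amount_cents)

    def run(self, amount_cents):
        self.publish("PaymentRequested", amount_cents)
        while self._queue:
            event_type, amount = self._queue.popleft()
            handler = self._handlers.get(event_type)
            if handler is not None:
                handler(amount)
        return self.log


failed = SagaDemo(wallet_cents=10_000, merchant_credit_fails=True)
print(failed.run(2500), failed.wallet_cents, failed.merchant_cents)
```

In the failing run the wallet ends where it started, because the reversal is an ordinary event handled by the same service that made the debit.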

Pattern 3: Idempotency Keys — Non-Negotiable

In mobile payments, network timeouts are a daily reality. A user's phone submits a payment, times out, and the app retries. Without idempotency keys, that retry can charge the user twice.

Every payment request carries a client-generated idempotency_key. The server stores the result keyed by this value. If the same key arrives twice, the second call returns the stored response immediately without re-executing the payment. This pattern eliminated duplicate charge incidents entirely.
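The server-side check can be sketched as follows. This is a simplified, single-process illustration: `IdempotentPaymentHandler` and `charge` are hypothetical names, and the in-memory dictionary stands in for whatever durable, shared store a real deployment would need.

```python
import threading


class IdempotentPaymentHandler:
    """Executes a charge at most once per idempotency key and replays the stored result."""

    def __init__(self, charge_fn):
        self._charge_fn = charge_fn
        self._results = {}               # idempotency_key -> stored response
        self._lock = threading.Lock()    # keeps check-then-store atomic within this process

    def handle(self, idempotency_key, request):
        with self._lock:
            if idempotency_key in self._results:
                # Retry of a request we already processed: return the stored response.
                return self._results[idempotency_key]
            result = self._charge_fn(request)
            self._results[idempotency_key] = result
            return result


charges = []


def charge(request):
    """Hypothetical charge function; records each real execution."""
    charges.append(request["amount_cents"])
    return {"status": "approved", "transaction_id": request["transaction_id"]}


handler = IdempotentPaymentHandler(charge)
request = {"transaction_id": "txn_abc123", "amount_cents": 2500}

first = handler.handle("client_789_1710510731", request)
retry = handler.handle("client_789_1710510731", request)   # the app retried after a timeout
print(first == retry, len(charges))
```

Holding the lock for the duration of the charge keeps the sketch short but serializes every payment; it is there only to show that the check and the store must be atomic.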

Operating at 1M+ Transactions/Month

Getting to scale was an iterative process. In our first week after launching in Buenos Aires, we processed around 10K transactions. A month later, 300K. By month six, we crossed 1M and held there for 12+ consecutive months with 100% SLA compliance. Practices that kept us there:

  • Observability first — Datadog dashboards for each event type, with alerts for event lag, consumer group offsets, and DLQ growth
  • Circuit breakers — graceful degradation when downstream validation services degraded
  • Blue-green deployments — zero-downtime deploys kept SLA clean
  • Chaos engineering — we intentionally killed services in staging every sprint
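As an illustration of the circuit breaker item, here is a minimal sketch. The threshold, the reset window, and the fallback value are assumptions chosen for the example; it is not the breaker we ran in production.

```python
import time


class CircuitBreaker:
    """Fails fast to a fallback after repeated errors, then retries after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None           # None means the circuit is closed

    def call(self, fn, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                return fallback()        # open: skip the downstream call entirely
            # Cool-down elapsed (half-open): let this one call through as a trial.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()   # open, or re-open after a failed trial
            return fallback()
        self._failures = 0
        self._opened_at = None           # success closes the circuit
        return result


# Usage with a fake clock so the example runs instantly.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_after_s=10.0, clock=lambda: now[0])
attempts = []


def validate_merchant():
    """Hypothetical downstream call that keeps timing out."""
    attempts.append(now[0])
    raise TimeoutError("merchant validation timed out")


def degraded():
    """Hypothetical degraded response returned while the circuit is open."""
    return "queued_for_later_validation"


for _ in range(3):
    outcome = breaker.call(validate_merchant, degraded)
print(outcome, len(attempts))
```

The third call never reaches the downstream service because the breaker opened after the second failure, which is the behavior that protects a struggling dependency from retry storms.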

What I Would Do Differently

With hindsight: I would invest earlier in contract testing between services using tools like Pact. We relied heavily on integration tests in staging, which were slow and occasionally flaky.

I would also establish a single, canonical event schema registry from day one. We evolved our event schemas organically and paid a migration tax later.
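The core idea of a registry can be shown in a few lines: one agreed definition per event type and version, checked before an event is published. The `SCHEMAS` table, the version number, and `validate_event` below are hypothetical; a real registry would be a shared service rather than a dictionary in one codebase.

```python
# Hypothetical registry: (event_type, version) -> required fields and their types.
SCHEMAS = {
    ("QR_SCANNED", 1): {
        "transaction_id": str,
        "qr_hash": str,
        "merchant_id": str,
        "amount_cents": int,
        "timestamp": str,
        "idempotency_key": str,
    },
}


def validate_event(event, version=1):
    """Return a list of problems; an empty list means the event matches its schema."""
    event_type = event.get("event_type")
    schema = SCHEMAS.get((event_type, version))
    if schema is None:
        return [f"no schema registered for {event_type!r} v{version}"]
    problems = []
    for field, expected_type in schema.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems


event = {
    "event_type": "QR_SCANNED",
    "transaction_id": "txn_abc123",
    "qr_hash": "sha256:...",
    "merchant_id": "merch_456",
    "amount_cents": 2500,
    "timestamp": "2024-03-15T14:32:11Z",
    "idempotency_key": "client_789_1710510731",
}
print(validate_event(event))
print(validate_event({**event, "amount_cents": "2500"}))
```

Rejecting a malformed event at publish time is far cheaper than discovering it in a consumer, which is where the migration tax tends to accumulate.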


José Alejandro Berrío Marín · Lead Software Engineer · 9+ years in fintech across LATAM · ex-Mercado Libre, Rappi, Leal