Elevated API Errors

Incident Report for Adapty

Postmortem

Postmortem — SDK API errors & elevated latency (2026-06-19)

Summary

On 2026-06-19 between 13:27 and 13:50 UTC, the Adapty SDK API (api.adapty.io) experienced elevated latency and HTTP 504 errors for approximately 23 minutes. A surge in request volume saturated an internal coordination store that the request path depends on, causing SDK requests to queue and time out. The affected components recovered automatically as the surge subsided.

Impact

  • Duration: Approximately 23 minutes (13:27 → 13:50 UTC).
  • Surfaces affected: SDK endpoints (profile updates, attribution, receipt/purchase validation) on api.adapty.io, including the China edge. Delivery of third-party integration events (analytics and attribution destinations) was also reduced during the window.
  • Visible signal: HTTP 504 errors reached roughly half of SDK traffic at peak, with request latency well above normal.
  • Recovery: SDK clients that automatically retry recovered most requests on retry. Some third-party integration events that do not retry may be missing for this window and could require backfill — if your integration is missing data, see the note below.

Timeline (UTC)

Time Event
~13:23 An internal coordination store the request path depends on begins approaching its connection limit under rising load.
13:27 SDK API latency crosses alerting thresholds; impact begins.
13:34 HTTP 504 error rate peaks at roughly half of SDK traffic.
13:40 Reduced delivery of third-party integration events observed.
13:42–13:50 Errors clear, latency returns to normal, and integration delivery recovers.

Root cause and contributing factors

Root cause: A burst in SDK request volume drove an internal coordination store to its connection limit. When that store could no longer accept new connections, dependent SDK requests stalled and timed out, which in turn put memory pressure on the affected service instances and amplified the impact until the burst subsided.

Contributing factors:

  1. The request path could open an effectively unbounded number of connections to the coordination store, so a request surge translated directly into hitting the store's connection ceiling.
  2. The coordination workload concentrated on a single store with no sharding to spread load.
  3. There was no early-warning alert on the underlying connection metric — the first signal was already a customer-facing error rate.

What went well

  • Automated monitoring detected the problem within seconds of the threshold being crossed.
  • The system recovered on its own once the surge passed, without prolonged manual intervention.
  • Built-in connection limits kept the store's behavior bounded rather than failing unpredictably.

What we will do

Prevent

  • Bound the number of connections the request path can open to the coordination store, so a traffic surge cannot exhaust it.
  • Spread coordination load across multiple stores rather than a single one.

Detect

  • Add early-warning alerts on the coordination store's connection usage and wait time, well before customer-facing thresholds are reached.

Mitigate

  • Increase memory headroom and add graceful load-shedding to the affected SDK service so a surge degrades gracefully instead of restarting instances.

We apologize for the disruption to your applications during this window. If your integration is missing data from this period, please reach out to your usual Adapty contact and reference the 2026-06-19 incident.

Posted Jun 19, 2026 - 15:21 UTC

Resolved

This incident has been resolved.
Posted Jun 19, 2026 - 13:55 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jun 19, 2026 - 13:46 UTC

Identified

We're experiencing an elevated level of API errors and are currently looking into the issue.
Posted Jun 19, 2026 - 13:30 UTC
This incident affected: API, Dashboard, and Events.