Postmortem — Adapty SDK API and Dashboard errors (2026-06-18)
Summary
On 2026-06-18 between 06:20 and 08:34 UTC (about 2 hours 15 minutes), the Adapty SDK API and Dashboard returned elevated 5xx errors. The cause was a change to an internal coordination service, which serializes concurrent operations so they run safely. After the change, the service could not handle production load and began rejecting connections. Requests that depend on this service failed until the change was reverted, after which traffic recovered.
Impact
- Duration: About 2 hours 15 minutes (06:20 → 08:34 UTC); the most acute window was ~07:10 → 08:17 UTC.
- Surfaces affected: SDK endpoints (purchase and receipt validation for the App Store, Play Store, and Paddle), the Dashboard (publishing Flows), and some third-party integration deliveries.
- Visible signal: Elevated 5xx error rates and increased latency on SDK endpoints; Dashboard Flow publishing returned errors.
- Recovery: SDK clients that automatically retry recovered most requests. Some integration deliveries or operations that do not retry may need to be re-run; if you believe data from this window is missing, contact your Adapty representative.
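For integrations that call the API directly and do not yet retry, the retry behavior mentioned under Recovery can be sketched roughly as follows. This is an illustration only, not the Adapty SDK's actual implementation: `call_with_retry`, its parameters, and the `send` callable are hypothetical names, and the sketch assumes the wrapped request is safe to repeat.

```python
import random
import time
from typing import Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-5xx status or attempts run out.

    send is a hypothetical callable returning (http_status, body).
    Only use this for requests that are safe to repeat.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    status, body = send()
    for attempt in range(1, max_attempts):
        if status < 500:
            break  # success or a client error: retrying will not help
        # Full jitter: wait a random time up to the capped exponential backoff,
        # so many clients do not retry in lockstep against a struggling service.
        backoff = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, backoff))
        status, body = send()
    return status, body
```

The jittered backoff matters during an incident like this one: immediate, synchronized retries from many clients would add load to a dependency that is already saturated.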
Timeline (UTC)
| Time | Event |
|---|---|
| ~06:20 | Automated monitoring begins reporting elevated SDK latency and 5xx errors. |
| ~07:11 | Impact on Dashboard Flow publishing confirmed; investigation underway. |
| ~07:21 | Root cause identified; remediation begins. |
| ~08:00 | Change reverted; error rates begin falling. |
| ~08:17 | Traffic stabilized. |
| ~08:34 | Recovery confirmed; incident closed. |
Root cause and contributing factors
Root cause: An internal coordination service that serializes concurrent operations was switched to a new backing store that was not provisioned for production load. Under real traffic it exhausted its connection and memory limits and began resetting connections, so any request needing to coordinate an operation failed.
Contributing factors:
- The change reached full production traffic without a gradual/canary rollout or an instant rollback control to bound its blast radius.
- The new backing store was under-provisioned for production load.
- A failure in this coordination service caused requests to fail outright rather than falling back gracefully, so a single dependency issue affected many surfaces at once.
What went well
- Once engaged, the team identified the root cause in roughly 10 minutes (investigation underway at ~07:11, root cause identified at ~07:21 UTC).
- The fix was straightforward and fast to apply (revert the change).
- The coordination service failed safely: affected requests were rejected rather than processed without coordination, and there is no evidence of inconsistent or duplicated data.
- SDK clients with automatic retry recovered most requests.
What we will do
Prevent
- Roll out changes to shared coordination infrastructure gradually (canary), with an instant rollback control.
- Require a kill-switch for any change exercised against shared production infrastructure.
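One common way to combine a gradual rollout with a kill-switch is a percentage gate keyed on a stable hash. The sketch below is a generic illustration under that assumption, not a description of Adapty's tooling; `RolloutConfig`, `bucket_for`, and `use_new_store` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class RolloutConfig:
    percent: int = 0           # share of traffic (0-100) routed to the new store
    kill_switch: bool = False  # when True, all traffic uses the old store


def bucket_for(key: str) -> int:
    """Map a key to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_new_store(key: str, config: RolloutConfig) -> bool:
    """Decide whether this key's operations go to the new backing store."""
    if config.kill_switch:
        return False  # instant rollback: overrides any rollout percentage
    return bucket_for(key) < config.percent
```

Because the bucket depends only on the key, a given key stays on the same side of the gate as `percent` grows, and flipping `kill_switch` returns all traffic to the old path without a deploy.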
Detect
- Add targeted early-warning alerts on the coordination service's saturation (connections/memory) so issues are caught before they affect customer requests.
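A saturation alert of this kind can be as simple as comparing usage against limits at two thresholds. The following is a minimal sketch; the `StoreStats` fields, the `saturation_alerts` function, and the 70% / 90% thresholds are illustrative assumptions, not the actual alert configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoreStats:
    connections_in_use: int
    connections_limit: int   # assumed to be greater than zero
    memory_used_bytes: int
    memory_limit_bytes: int  # assumed to be greater than zero


def saturation_alerts(
    stats: StoreStats, warn_at: float = 0.7, page_at: float = 0.9
) -> List[str]:
    """Return alert messages for each resource that is nearing its limit."""
    ratios = {
        "connections": stats.connections_in_use / stats.connections_limit,
        "memory": stats.memory_used_bytes / stats.memory_limit_bytes,
    }
    alerts = []
    for name, ratio in ratios.items():
        if ratio >= page_at:
            alerts.append(f"PAGE: {name} at {ratio:.0%} of limit")
        elif ratio >= warn_at:
            alerts.append(f"WARN: {name} at {ratio:.0%} of limit")
    return alerts
```

Alerting on the ratio to the limit, rather than on error rates, is what lets the warning fire before connections start being reset.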
Mitigate
- Make the coordination service fail gracefully (fall back / degrade) instead of failing requests outright.
- Right-size the new backing store before any further production exposure.
We apologize for the disruption to your applications during this window. If your integration is missing data from this period, please reach out to your usual Adapty contact and reference the 2026-06-18 incident.