Elevated API Errors

Incident Report for Adapty

Postmortem

Summary

On December 5th, Adapty experienced an API outage caused by an internal configuration change during deployment. The update made our application unable to connect to the database, and an attempted rollback became stuck due to a deployment deadlock. After manually clearing the stuck deployments and restoring the previous version, all services returned to normal. No data was lost.

What happened

  • We deployed a change that switched our API to a new database user.
  • When the servers restarted, they could not connect to the database because the connection pooler became overloaded.
  • Attempts to cancel the deployment and roll back caused two deployments to hang simultaneously, creating a deadlock.
  • Our infrastructure engineer manually cleared the deadlock and redeployed the previous version.
  • The system recovered fully.

Root cause

  1. API servers failed to reconnect to the database after restarting.
  2. They were restarting because of a database user configuration change.
  3. The connection pooler stopped accepting new connections because it exceeded its file descriptor (“max open files”) limit.
  4. During the switchover, connections from both the old and new database users were open at the same time, quickly exhausting those descriptors.
  5. We had not accounted for this short-term connection spike when setting the pooler’s limits.
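
To make the failure mode concrete, here is the arithmetic in miniature. The numbers are purely illustrative; the actual pool sizes and limits involved in the incident are not published here.

```python
# Illustrative numbers only; real pool sizes and limits differ.
FD_LIMIT = 16_384       # pooler's "max open files" soft limit
FDS_PER_USER = 10_000   # client + server sockets held for one DB user

# While the rollout switched database users, connections for the old and
# new users coexisted, so the pooler briefly needed roughly twice its
# steady-state descriptor count.
peak = 2 * FDS_PER_USER
print(f"peak={peak}, limit={FD_LIMIT}, exhausted={peak > FD_LIMIT}")
```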

What we’ll do to prevent this in the future

  • Add alerts for high TCP connection counts on the connection pooler (a monitoring sketch follows this list).
  • Adjust system limits (“max open files”) based on real connection spikes observed during restarts.
  • Update the release guide with a clear procedure for identifying and resolving deployment deadlocks.
  • Add an alternative connection pooler to allow quick failover.
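
As a rough illustration of the first two items, the check below compares a pooler process’s open file descriptors and TCP connection count against its soft limit. This is a minimal sketch, assuming a Linux host and the third-party psutil library; the PID, threshold, and print-based alert are placeholders, not our actual monitoring setup.

```python
import psutil  # third-party; the calls below are Linux-specific

ALERT_RATIO = 0.8  # placeholder: alert at 80% of the soft FD limit

def fd_pressure(pid: int) -> tuple[int, int, int]:
    """Return (open_fds, tcp_connections, soft_fd_limit) for a process."""
    proc = psutil.Process(pid)
    open_fds = proc.num_fds()                      # every socket is an FD
    tcp_count = len(proc.connections(kind="tcp"))  # TCP sockets only
    soft_limit, _hard = proc.rlimit(psutil.RLIMIT_NOFILE)
    return open_fds, tcp_count, soft_limit

if __name__ == "__main__":
    pooler_pid = 1234  # placeholder for the connection pooler's PID
    fds, tcp, limit = fd_pressure(pooler_pid)
    if fds > ALERT_RATIO * limit:
        print(f"ALERT: {fds} FDs open ({tcp} TCP) vs soft limit {limit}")
```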

We’re sorry for the disruption this incident caused. Stability is very important to us, and we’re taking several steps to ensure this doesn’t happen again.

It’s also worth noting that fallback paywalls remained accessible throughout the outage. We encourage all customers to update to the latest SDK version to benefit from local access levels, which provide additional resilience during rare service interruptions.

Posted Dec 05, 2025 - 18:52 UTC

Resolved

This incident has been resolved.
Posted Dec 05, 2025 - 12:29 UTC

Identified

The issue has been identified and a fix is being implemented.
Posted Dec 05, 2025 - 12:26 UTC

Update

We are continuing to investigate this issue.
Posted Dec 05, 2025 - 12:13 UTC

Investigating

We're experiencing an elevated level of API errors and are currently looking into the issue.
Posted Dec 05, 2025 - 12:08 UTC
This incident affected: API and Dashboard.