The root cause of this incident was a misconfigured NAT that limited the number of outgoing connections from worker pods to PgBouncer. This configuration had been applied several weeks before the incident, but because we had never reached the connection limit, the problem was not visible to us. This time we received more requests than usual and hit the limit. Requests started to pile up, and we could not quickly identify the issue because all internal metrics looked fine. Once we realized the problem was in the network, we checked the related configuration and found the root cause. After we increased the limits, everything went back to normal within a couple of minutes.
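
Why a connection cap can stay invisible for weeks and then suddenly cause a pile-up can be illustrated with a small simulation. This is only a toy model, not our actual NAT or PgBouncer setup: the function name `simulate_backlog`, the tick-based timing, and all numbers are assumptions chosen for illustration.

```python
from collections import deque


def simulate_backlog(arrivals_per_tick, hold_ticks, conn_limit, ticks):
    """Return the number of queued requests at the end of each tick.

    Hypothetical model: every request needs one outgoing connection
    for `hold_ticks` ticks, and at most `conn_limit` connections may
    be open at the same time.
    """
    active = deque()  # release tick of each open connection, oldest first
    queued = 0
    backlog = []
    for now in range(ticks):
        # Close connections whose requests have finished.
        while active and active[0] <= now:
            active.popleft()
        queued += arrivals_per_tick
        # Open as many new connections as the cap still allows.
        started = min(queued, conn_limit - len(active))
        active.extend([now + hold_ticks] * started)
        queued -= started
        backlog.append(queued)
    return backlog
```

In this model, with a cap of 10 connections and requests that hold a connection for 2 ticks, 4 arrivals per tick never queue, while 6 arrivals per tick leave a backlog that keeps growing until the cap is raised. Nothing inside the application changes between the two cases, which is consistent with internal metrics looking fine while requests pile up.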