Back to overview

Client Registration Disruption

May 14, 2026 at 10:12pm UTC
Affected services
Dashboard & API

Resolved
May 14, 2026 at 10:12pm UTC

Postmortem: Client Registration Disruption

Date: May 14, 2026

Incident Duration: ~3 hours

Impact: Intermittent failure of client registration and connection establishment across US-East regional Points of Presence (PoPs).


Executive Summary

On May 14, 2026, the US-East region experienced a degradation in connection processing. A surge in holepunch requests, stemming from suboptimal code execution paths, led to resource exhaustion within the supporting infrastructure. This created a bottleneck in source address verification, preventing clients from successfully completing the registration handshake. Service was restored following targeted code optimizations and an expedited infrastructure scaling event.

Incident Timeline

  • T-00:00: Observed sharp increase in holepunch request volume at US-East regional PoPs.
  • T+00:15: Cache and database infrastructure report elevated latency and resource utilization.
  • T+00:30: Inability to verify client source addresses leads to widespread registration failures.
  • T+01:45: Engineering identifies specific code inefficiencies contributing to the request volume.
  • T+02:30: Resolution: Optimized logic is deployed; supplementary infrastructure capacity is provisioned.
  • T+03:00: Connectivity metrics return to baseline; all regional systems confirmed operational.

Root Cause Analysis

The disruption was the result of a resource exhaustion cascade triggered by internal logic inefficiencies:

  1. Request Volume Anomalies: Suboptimal code paths generated a disproportionate volume of holepunch requests relative to standard site traffic.
  2. Downstream Pressure: The high frequency of these requests overwhelmed the regional database and caching layers, leading to delayed response times.
  3. Verification Bottleneck: The system requires successful holepunch completion to validate client source addresses for security and functionality. Due to infrastructure timeouts, this validation could not be completed.
  4. Registration Inhibition: Failure to verify source addresses prevented the system from authorizing new client connections.

Impact Assessment

  • Client Connectivity: Users in the US-East region were unable to register or establish stable connections.
  • Infrastructure: Database and cache utilization reached critical thresholds
  • Verification Logic: The source address validation mechanism was temporarily unable to process inbound requests, leading to a "fail-closed" state for new connections.

Corrective Actions & Preventative Measures

Completed

  • Logic Optimization: Refined the code responsible for holepunch request handeling to reduce unnecessary overhead.
  • Capacity Expansion: Scaled the regional database and cache clusters to better accommodate peak request volumes.