v0.64.3 stable

Release v0.64.3

May 02, 2026

Hardened the platform against fleet-wide reconnect storms and fixed a Windows agent crash loop.

Improved

Per-organization rate limit on agent endpoints so a single misbehaving fleet can no longer overwhelm the API. The default budget is generous enough that normal MSP traffic should never hit it, and it is tunable per environment.
Agent now honors the server's Retry-After header on 429 and 503 responses, so when the API tells the fleet to back off, the fleet backs off instead of running its own retry schedule.
Tighter limits on agent log shipments — smaller batch sizes and a hard cap on request body size keep one chatty agent from drowning the log ingest path.
Slower agent and watchdog restart cadence on Linux (30s and 15s, respectively) prevents a network blip from triggering a thundering herd of reconnects across the fleet.
Postgres connection pool raised to 30 connections so heartbeat storms no longer cascade into 504 errors.
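
To illustrate the Retry-After item above, here is a minimal sketch of how a client might turn that header into a wait time. It is not the agent's actual code: the function name `retry_delay` and its parameters are hypothetical, and the fallback behavior for missing or malformed headers is an assumption. The header itself may carry either a number of seconds or an HTTP date, and the sketch handles both.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(status, retry_after, fallback, now=None):
    """Return seconds to wait before the next attempt.

    status      -- HTTP status code of the failed response
    retry_after -- raw Retry-After header value, or None if absent
    fallback    -- delay from the client's own schedule (hypothetical)
    """
    # Only 429 and 503 are treated as "server asked us to back off".
    if status not in (429, 503) or retry_after is None:
        return fallback

    value = retry_after.strip()
    if value.isascii() and value.isdigit():
        # Form 1: delay in whole seconds, e.g. "120".
        return float(value)

    # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Assumption: a malformed header falls back to the local schedule.
        return fallback
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means "retry immediately", never a negative wait.
    return max((when - now).total_seconds(), 0.0)
```

A caller would sleep for the returned value before retrying, so the server's hint takes priority over the client's own schedule whenever it is present and parseable.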

Fixed

Windows user-helper Scheduled Task was crash-looping on multiple customer tenants with an auth rejection error. The helper now starts with the correct role, and a regression test prevents future drift.

A focused resilience release. The biggest piece is a three-part defense against the kind of correlated reconnect storms that can take down the API when a network blip or bad config push affects a large fleet at once: per-organization rate limits, Retry-After awareness on the agent side, and slower service restart cadence. None of this is visible during normal operation — it just means the worst case stays bounded.
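
For readers curious what a per-organization limit can look like, the following is a rough token-bucket sketch. It is an illustration under assumptions, not the platform's implementation: the class name `OrgRateLimiter`, the `rate` and `burst` parameters, and the choice of a token bucket are all hypothetical. Each organization gets its own bucket, so one fleet exhausting its budget does not affect another, and a rejected request comes with a wait time that a server could send back in a Retry-After header.

```python
import math
import time
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens: float   # requests currently available
    updated: float  # clock reading at the last refill


class OrgRateLimiter:
    """Token bucket keyed by organization ID (hypothetical sketch)."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate      # tokens added per second
        self.burst = burst    # maximum tokens a bucket can hold
        self.clock = clock    # injectable so tests can control time
        self.buckets = {}

    def check(self, org_id):
        """Return (allowed, retry_after_seconds) for one request."""
        now = self.clock()
        bucket = self.buckets.get(org_id)
        if bucket is None:
            # A new organization starts with a full bucket.
            bucket = self.buckets[org_id] = Bucket(float(self.burst), now)

        # Refill for the time elapsed since the last check, up to burst.
        elapsed = now - bucket.updated
        bucket.tokens = min(self.burst, bucket.tokens + elapsed * self.rate)
        bucket.updated = now

        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True, 0

        # Round the wait up to whole seconds, as Retry-After expects.
        wait = (1.0 - bucket.tokens) / self.rate
        return False, math.ceil(wait)
```

In this sketch a server would answer a rejected request with a 429 and the returned wait as its Retry-After value, which ties the server-side limit to the agent-side backoff described above.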

The Windows fix is more user-visible: a Scheduled Task running under the standard Users group was crash-looping with an auth rejection on tenants like nexusitsys and Revenant Global. That’s resolved, with a regression test in place so it stays resolved.