June 2025: Feature releases & highlights


Site-specific configuration for disconnected behavior, automatic certificate recovery after prolonged outages, optimized upgrade handling for sites reconnecting with pending updates, and new UI filters to easily identify and sort disconnected sites.

Enhanced support for disconnected sites

From day one, the Avassa Edge Platform has set itself apart with industry-leading support for edge sites that operate under unstable or no network connectivity. We’ve built the platform to ensure operational continuity and autonomy—even when sites are entirely disconnected from the Control Tower.

Key capabilities that enable this include:

  • Edge-local control plane execution: The Edge Enforcer independently manages placement, failover, and other orchestration decisions without relying on the central Control Tower.
  • Fully distributed artifacts: All required components—container images, secrets, and configurations—are distributed to edge sites and replicated across hosts.
  • Resilient communication model: Management communication is built on a loosely coupled pub/sub bus that caches messages, ensuring graceful recovery after outages.
  • Designed for intermittent connectivity: The Control Tower–Edge Enforcer communication is lightweight and resilient to low bandwidth or intermittent connections.
  • Eventually consistent operations: Tasks like configuration changes and deployments converge across sites over time—no manual reconciliation needed.
  • Full support for local site operations: All essential actions can be performed on-site without requiring Control Tower availability.
  • Local unseal support: Sites can restart and regain secure operation without central connectivity.

Read more in our datasheet, which covers disconnected scenarios.

What’s New This Month

This spring, we’ve taken disconnected support even further, with platform-wide improvements across backend, APIs, and frontend. Highlights include:

  • Site-specific connectivity configuration: You can now mark sites as “always connected” or “sometimes connected,” allowing for tailored behavior.
  • Improved certificate handling for disconnected sites to ensure secure and seamless trust management.
  • Optimized application upgrade flows for when sites come back online after being disconnected.
  • New UI filters and sorting based on site connectivity status, making it easier to manage large fleets.

We’ve also introduced several behind-the-scenes improvements—such as automated image housekeeping and smarter caching—that smooth out real-world disconnected operations.

Site connectivity configuration

Previously, Avassa treated all sites alike:

  • A disconnected site did not trigger an alert, since edge sites might come and go.
  • An application deployment waited for disconnected sites to come online before it finished.

With the June releases, you can configure one of three disconnect behaviors per site through the when-disconnected field. The three settings are:

treat-as-normal
  • Alert when disconnected: no
  • Application deployment status when disconnected sites match the target (for rolling upgrades): Deployment remains in the deploying state.
  • Note: Default and current behavior. Disconnected sites are not considered critical incidents, but deployments wait for them to come online.

treat-as-expected
  • Alert when disconnected: no
  • Application deployment status when disconnected sites match the target (for rolling upgrades): Deployment continues (except for canary releases) and reaches the deployed state.
  • Note: Sites being disconnected are not considered an incident that needs attention, and deployments continue without waiting for these sites to come online. The skipped sites are indicated in the deployment state. When the sites come online again, the deployment is automatically triggered for them.

treat-as-error
  • Alert when disconnected: yes
  • Application deployment status when disconnected sites match the target (for rolling upgrades): Deployment remains in the deploying state.
  • Note: Use this setting if your sites have very stable connections. Alerts are generated when they are disconnected, and deployments wait for them to come online before reaching the deployed state.
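
The three settings can be summarized as a small decision function. This is a minimal sketch for illustration only, not Avassa code: the names WhenDisconnected, Outcome, and outcome_for are hypothetical, and the reading that canary releases keep waiting under treat-as-expected is an assumption drawn from the "except for canary releases" remark.

```python
from dataclasses import dataclass
from enum import Enum


class WhenDisconnected(Enum):
    # The three values are the ones named in the release notes.
    TREAT_AS_NORMAL = "treat-as-normal"
    TREAT_AS_EXPECTED = "treat-as-expected"
    TREAT_AS_ERROR = "treat-as-error"


@dataclass(frozen=True)
class Outcome:
    raise_alert: bool       # is an alert generated while the site is disconnected?
    deployment_waits: bool  # does the rollout stay in "deploying" for this site?


def outcome_for(setting: WhenDisconnected, canary: bool = False) -> Outcome:
    """Hypothetical helper mapping a setting to the documented behavior."""
    if setting is WhenDisconnected.TREAT_AS_ERROR:
        return Outcome(raise_alert=True, deployment_waits=True)
    if setting is WhenDisconnected.TREAT_AS_EXPECTED:
        # Assumption: canary releases are the exception and still wait.
        return Outcome(raise_alert=False, deployment_waits=canary)
    # treat-as-normal: no alert, but the deployment waits for the site.
    return Outcome(raise_alert=False, deployment_waits=True)
```

For example, outcome_for(WhenDisconnected("treat-as-expected")) reports no alert and no waiting, which corresponds to a skipped site in a rolling upgrade.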

This can be configured per site as illustrated below:

Screenshot of Avassa's site configuration, highlighting the 'when disconnected' setting options: treat-as-normal, treat-as-expected, and treat-as-error.

If a site set to treat-as-expected is disconnected during a rolling deployment, you will see this state in the UI:

Avassa UI showing a deployment status with a site in 'treat-as-expected' mode, indicating it was skipped due to disconnection.

When the site robot01 comes online again, the deployment proceeds on that site with no manual action needed.

Certificate management

Each Avassa site uses site- and tenant-unique certificates for secure communication. In normal operation, all key management, including key rotation, is fully automated. By default, the system is designed to tolerate up to 90 days of disconnection, a limit tied to certificate expiry. If a site is offline longer than this period, its certificates may expire and the site will become inaccessible.

You can now configure this disconnected grace period to suit your operational requirements.

Screenshot of Avassa's system settings, showing the configuration option for 'disconnected grace period' for certificate management.

To help operators stay ahead of potential issues, we’ve added proactive alerts for certificates nearing expiration. These alerts appear on the system:alerts topic with the event type host-certificate-expires, and include key metadata such as hostname, site name, and expiration timestamp.
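
As an illustration of how such alerts might be triaged once consumed, here is a minimal sketch. Only the event type host-certificate-expires comes from the text above. The payload keys ("type", "site", "hostname", "expires"), the ISO 8601 timestamp format, and the 14-day window are assumptions, so check the actual alert schema before relying on them.

```python
from datetime import datetime, timedelta, timezone

EVENT_TYPE = "host-certificate-expires"  # event type named in the release notes


def expiring_soon(alerts, now, within=timedelta(days=14)):
    """Return (site, hostname, expires) tuples for certificates due within `within`.

    Assumed payload keys: "type", "site", "hostname", "expires" (ISO 8601).
    """
    hits = []
    for alert in alerts:
        if alert.get("type") != EVENT_TYPE:
            continue  # ignore unrelated alerts on the same topic
        expires = datetime.fromisoformat(alert["expires"])
        if expires - now <= within:
            hits.append((alert["site"], alert["hostname"], expires))
    # Most urgent first.
    return sorted(hits, key=lambda hit: hit[2])
```

Pass a timezone-aware `now`, for example datetime.now(timezone.utc), so it can be compared with timezone-aware expiry timestamps.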

Certificate status can also be monitored using the CLI:

supctl show system site-status sites
- name: my-site
  hosts-with-critical-certificates: 0
  cluster-ca-certificate-expires: 2y349d23h59m28s
  api-ca-certificate-expires: 1y39d23h59m28s

This displays the expiration status of cluster and API certificates for all sites. You can drill down to specific hosts with:

supctl show system site-status sites my-site hosts-summary
hosts:
  - host-id: 33245851-e8aa-4e9c-9007-20c23f129b64
    hostname: h05
    cluster-certificate-expires: 134d23h59m38s
    api-certificate-expires: 134d23h59m38s
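
If you want to post-process this output in a script, the remaining-lifetime values can be parsed into durations. The sketch below is based only on the values shown above (for example 134d23h59m38s). Treating "y" as 365 days and using a 30-day threshold are assumptions, and parse_remaining and is_critical are hypothetical helper names.

```python
import re
from datetime import timedelta

# Seconds per unit. Treating "y" as 365 days is an assumption.
_UNIT_SECONDS = {"y": 365 * 86400, "d": 86400, "h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([ydhms])")


def parse_remaining(text: str) -> timedelta:
    """Parse a value such as '134d23h59m38s' into a timedelta."""
    tokens = _TOKEN.findall(text)
    # Reject input that contains anything besides number/unit pairs.
    if not tokens or "".join(n + u for n, u in tokens) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in tokens))


def is_critical(text: str, threshold: timedelta = timedelta(days=30)) -> bool:
    """True if the remaining lifetime is below the (hypothetical) threshold."""
    return parse_remaining(text) < threshold
```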

The Control Tower UI also clearly highlights sites with certificates that are close to expiry—giving operators early warning in disconnected scenarios.

Avassa UI filter showing sites with critical certificates, used for monitoring certificate expiration in disconnected scenarios.

You can also drill down to a site and view its certificate status:

Avassa UI showing detailed certificate status for a specific site, including expiration dates and reauthorization options.

⚠️ In normal operations, no manual actions are required for certificate rotation. These alerts are only relevant for sites that have been disconnected for extended periods, where certificate expiration becomes a risk.

If a host detects that its certificates have expired due to prolonged offline operation or downtime, it will attempt to recover automatically using a secure recovery token. Upon receiving the request, the Control Tower generates a reauthorize-requested critical alert.

Administrators can approve the reauthorization either via the UI (see the “Allow re-auth” button on the host above) or by issuing the command:

supctl do system sites <site-name> hosts <host-id> reauthorize <host-name>

This reauthorization can be performed proactively or in response to an expired certificate recovery attempt. The screenshot below shows how to perform the operation in the hosts list:

Avassa UI showing the 'Allow re-auth' button on a host, used to proactively reauthorize hosts or after a certificate recovery attempt.

For complete guidance, refer to the updated documentation on configuring certificates for extended disconnected operation. Also read the reference documentation on site certificate status.

Application upgrades after re-establishment

With this release, we’ve improved how the Avassa platform handles application upgrades at sites that have been offline. Queued deployments were already resumed automatically once a disconnected site came back online, with the updates picked up by the Edge Enforcer at the site.

🆕 The Edge Enforcer now skips intermediate versions and deploys only the latest available version. This behavior dramatically speeds up recovery and minimizes resource usage after reconnection.

✅ This optimization is the new default behavior. In previous versions, all “missed” versions were deployed sequentially, even if later ones superseded them.

If you have strict version sequencing requirements, you can override this behavior using the upgrade-from field in your application spec. This allows you to define mandatory upgrade paths—for instance, enforcing that version 2.1 can only be deployed if version 2.0 is already present.
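
The difference between the new default and a mandatory upgrade path can be modeled in a few lines. This is a sketch of the described behavior, not the Edge Enforcer's implementation: plan_upgrades is a hypothetical function, and representing upgrade-from as a mapping from a version to the version that must already be present is an assumption made for illustration, not the application spec syntax.

```python
def plan_upgrades(installed, missed, upgrade_from=None):
    """Return the versions to deploy, in order, after a site reconnects.

    installed:    version currently running at the site
    missed:       versions released while the site was offline, oldest first
    upgrade_from: hypothetical mapping {version: version that must already be present}
    """
    upgrade_from = upgrade_from or {}
    if not missed:
        return []
    plan = [missed[-1]]  # default: skip intermediates, deploy only the latest
    # Walk backwards through mandatory predecessors until reaching a version
    # the site already runs or a version with no constraint.
    while plan[0] in upgrade_from and upgrade_from[plan[0]] != installed:
        required = upgrade_from[plan[0]]
        if required in plan:
            raise ValueError("cyclic upgrade-from constraints")
        plan.insert(0, required)
    return plan
```

With no constraints, a site on 1.0 that missed 1.1, 2.0, and 2.1 goes straight to 2.1. If 2.1 requires 2.0, the plan becomes 2.0 followed by 2.1.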

UI Connectivity status filter and sort

It is good practice to stay on top of sites that are disconnected for long periods. To help with that, the Control Tower UI now lets you filter and sort on relevant states and timestamps:

Find sites where certs are about to expire:

Avassa UI showing the ability to filter sites by 'hosts-with-critical-certificates' to find sites needing attention.

Filter on disconnected sites and sort for longest disconnect:

Avassa UI showing the ability to filter sites by 'disconnected' status and sort by 'Last disconnect' time for efficient management.

Other features

Immutable Container File System

Starting with this release, the Edge Enforcer can run on hosts with a read-only container root filesystem. This aligns with best practices for immutable infrastructure and simplifies deployment on read-only operating systems, such as those used in secure, automotive, and minimal edge environments.

It is still possible to enforce read-only root fs in applications by setting the container layer size to zero.

Init Containers with Execution Timeout

You can define init containers that run before your main application container starts. This is useful for setup tasks such as generating files, preparing environment-specific data, or initializing state.

🆕 To prevent init containers from stalling application startup, we’ve introduced a configurable execution-timeout. If the init container exceeds this timeout (default: 10 minutes), it is forcefully terminated to avoid blocking the deployment pipeline.
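
The timeout semantics can be illustrated with an ordinary process instead of a container. This is a sketch of the general pattern only, not how the Edge Enforcer runs init containers: run_with_timeout is a hypothetical helper, and the 600-second default simply mirrors the 10-minute default mentioned above.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds=600.0):
    """Run a setup command; return True only if it exits 0 within the timeout."""
    try:
        completed = subprocess.run(argv, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before raising,
        # so a stalled setup step cannot block the caller indefinitely.
        return False
    return completed.returncode == 0
```

For example, run_with_timeout([sys.executable, "-c", "import time; time.sleep(10)"], timeout_seconds=0.5) returns False because the step is terminated after half a second.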


Historical Parallel: Lessons from Mars

To put the complexity of disconnected edge operations in perspective, consider the Mars Pathfinder mission in 1997. The spacecraft operated millions of kilometers away from Earth—arguably the most extreme “disconnected edge site” imaginable.

During early operations on Mars, Pathfinder began to experience system resets. The cause? A subtle priority inversion bug in the real-time OS. A low-priority task held a shared resource (a mutex), while a high-priority task waiting on that resource was blocked. Meanwhile, a medium-priority task kept the low-priority one from finishing, causing a deadlock-like condition. Eventually, the system watchdog triggered a reset.

Since Pathfinder had no real-time connectivity to mission control, diagnosing and resolving the issue had to rely on asynchronous telemetry logs and pre-distributed diagnostic tools onboard the spacecraft. Engineers on Earth were able to reproduce the bug on the ground and deploy a fix during the next communication window, which saved the mission.

https://medium.com/delta-force/the-case-of-mysterious-system-resets-on-mars-pathfinder-b01eab813b69