Debunking the great myths around Kubernetes at the edge
Being the Product Lead for an Edge Platform includes the pleasure of reading articles and blogs to navigate the edge computing landscape. Recently, I sat down and read a blog post from a team member at Spectro Cloud, "Navigating the Digital Frontier: Edge Computing and Kubernetes".
I have to admit that some statements in that article really surprised me. Setting aside the fact that Avassa and Spectro Cloud offer competing solutions for edge orchestration and management, the article makes claims around using Kubernetes at the edge that, in my personal opinion, conflict with the very design principles of Kubernetes as an orchestrator, with the nature of the edge as a highly distributed environment, and with the requirements we hear from real edge deployments.
This had me jot down a few of the biggest and most common myths around using Kubernetes as the number one orchestrator at the edge.
Myth 1: Kubernetes is designed to solve the challenges of edge orchestration
Myth 2: Load balancing and auto-scaling are the primary requirements for the edge
Myth 3: Placement decisions between edge sites should take dynamic measurements into account
I will elaborate on each of them below.
Myth 1: Kubernetes is designed to solve the challenges of edge orchestration
The article claims: Enter Kubernetes: The Master Orchestrator.
What this statement fails to take into consideration is that each edge site needs to act as an autonomous cluster, with an inner edge control loop and an outer central control loop. The outer central control loop needs to drive desired changes, for example whenever a new application needs to be deployed across certain edge sites. If doing things the Kubernetes way, this can be achieved with smaller distributions like K3s at the edges and something entirely different, e.g. Rancher, acting as the master orchestrator across them.
Early attempts to stretch central Kubernetes clusters out to the edge tend to fall apart as soon as they hit slow networks and/or larger numbers of edge sites. Most K8s multi-cluster managers are good at managing clusters in a number of data centers, but far less successful when it comes to managing 10,000 edge sites, each running its own cluster. Even KubeEdge, a solution whose name evokes Kubernetes, does not run Kubernetes at the edge.
So, Kubernetes is not the Master Orchestrator across edge sites. In certain cases, with a small number of edge sites, it can act as an edge-local cluster. At Avassa, we took a different approach, designing from the container runtime up. Our lightweight container orchestrator, the Edge Enforcer, is purpose-built for the edge and operates together with a central, cloud-based “master orchestrator”, the Control Tower. This is of high importance when:
- You need strong offline capabilities at the edge.
- You need more than “just” Kubernetes at the edge. Edge operations will also require log management, multi-tenancy, metrics collection, DNS, image registry, user management, edge-local security, distributed secrets management, edge-local networking and more. This adds up to a complex software stack if you build it around Kubernetes.
- You need a self-service portal that application developers and operations teams can use to run and monitor their edge container applications. A bare-bones multi-cluster manager seldom solves the needs of the application and operations teams.
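To make the control-loop split concrete, below is a minimal sketch in Python of how an outer central loop and inner edge-local loops divide the work. The site names, application specs and data structures are hypothetical illustrations for this post, not Avassa's or Kubernetes' actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class EdgeSite:
    name: str
    online: bool = True                          # connectivity to the central cloud
    desired: dict = field(default_factory=dict)  # last desired state received
    running: dict = field(default_factory=dict)  # what is actually running locally

    def receive_desired_state(self, apps: dict) -> None:
        """Outer loop pushes desired state; only possible while the uplink is up."""
        if self.online:
            self.desired = dict(apps)

    def reconcile_locally(self) -> None:
        """Inner, edge-local loop: converge the running state toward the last
        known desired state. This keeps working even when the site is offline."""
        for app, spec in self.desired.items():
            if self.running.get(app) != spec:
                self.running[app] = spec         # (re)start or update the app
        for app in list(self.running):
            if app not in self.desired:
                del self.running[app]            # stop apps that are no longer desired

def outer_control_loop(sites: list, desired_apps: dict) -> None:
    """Central loop: distribute the desired application set to every reachable site."""
    for site in sites:
        site.receive_desired_state(desired_apps)

# Example: one site has lost connectivity but still reconciles on its own.
sites = [EdgeSite("store-001"), EdgeSite("store-002", online=False)]
outer_control_loop(sites, {"pos-app": "v1.2"})
for site in sites:
    site.reconcile_locally()
print({s.name: s.running for s in sites})
# store-001 now runs pos-app v1.2; store-002 keeps its last known state and
# catches up the next time the outer loop can reach it.
```

The point of the split is that the inner loop keeps converging on its last known desired state even when the uplink to the central loop is down.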
Myth 2: Load balancing and auto-scaling are the primary requirements for the edge
Let us move to the next myth, around auto-scaling and load balancing on the edge sites. Auto-scaling is a primary requirement for any central/public cloud. You could even say that having seemingly unlimited resources is the whole idea of running applications from a central location: that way, the application can scale according to need. But at the edge, you’re dealing with infrastructure of the opposite nature. Resources are limited, and you might have a cluster of no more than three rugged PCs in each location. Not much more CPU to scale out to. That said, the load is in a sense already scaled out, since you are running at each edge site, responding to local clients. I’m not saying that load balancing at each separate edge is wrong, but we see few real use cases for it. Instead, the following is a list of far more edge-focused requirements:
- Self-healing sites that withstand failing network connectivity, able to migrate applications across hosts on the edge site without connectivity to the central cloud.
- Edge-local placement. For example, you have packaged your Edge AI model as a container and need a GPU to run it. The edge-local control loop should be able to place the container only on a host with a GPU (see the sketch at the end of this section).
- Edge-local networking. Fully automatic configuration of the container application networking on each site is needed. You cannot afford local IT staff or complex service meshes designed for the central use case.
- Local security. Edge hosts are not perimeter protected. The local LAN might not be secured, creating very specific requirements that need to be built into the edge-local orchestrator.
- Deep multi-tenancy. In many cases the edge infrastructure needs to be shared between tenants while keeping them strictly isolated.
All of the above are edge-native requirements we see from customers deploying to and operating at the edge. And needless to say, if you do need local load balancing, you can drop, for example, an NGINX container onto the edge site. That said, I am a believer in weighing the importance of requirements in each specific context.
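Here is the edge-local placement sketch referenced in the list above: a minimal, hypothetical example of matching a container's GPU requirement against the hosts on a single site. The host names and attributes are made up for illustration and do not reflect any real orchestrator API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Host:
    name: str
    has_gpu: bool
    healthy: bool = True

def place_container(requires_gpu: bool, hosts: list) -> Optional[Host]:
    """Pick a healthy host on this site that satisfies the container's needs.
    Returns None if nothing qualifies, in which case the workload stays pending."""
    candidates = [h for h in hosts if h.healthy and (h.has_gpu or not requires_gpu)]
    return candidates[0] if candidates else None

# A small site: two rugged PCs, one of them with a GPU.
site_hosts = [Host("rugged-pc-1", has_gpu=False), Host("rugged-pc-2", has_gpu=True)]
chosen = place_container(requires_gpu=True, hosts=site_hosts)
print(chosen.name if chosen else "pending: no suitable host on this site")
# -> rugged-pc-2
```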
Myth 3: Placement decisions between edge sites should take dynamic measurements into account
Finally, the third myth is around edge placement. When we talk to customers, they often have an application that they need to deploy to very specific sites: in selected stores, on rugged PCs performing Edge AI functions close to a camera, in trucks to collect data from the CAN bus, etc. Therefore, edge site placement should be based on site labels, like “Customer = ACME”, “Truck = Model X” and “Factory floor size = medium”. Then, within each site, the local scheduler dynamically places the application on hosts based on the availability of devices such as cameras, GPUs and other attributes.
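To illustrate what label-based site selection can look like, here is a minimal sketch; the label keys and site names (“customer”, “truck-model” and so on) are hypothetical examples rather than any particular product's syntax.

```python
def match_sites(sites: dict, selector: dict) -> list:
    """Return the names of sites whose labels contain every key/value in the selector."""
    return [
        name for name, labels in sites.items()
        if all(labels.get(key) == value for key, value in selector.items())
    ]

# Hypothetical site inventory with static labels.
sites = {
    "store-stockholm-01": {"customer": "acme", "type": "store", "floor-size": "medium"},
    "truck-4711":         {"customer": "acme", "type": "truck", "truck-model": "x"},
    "factory-berlin":     {"customer": "globex", "type": "factory"},
}

print(match_sites(sites, {"customer": "acme", "type": "store"}))
# -> ['store-stockholm-01']
```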
In reality, we see very little demand for placing an application on an edge site based on, for example, latency < X or system load < Y or other infrastructure-centric measurements. That requirement is more relevant for managing regional data center solutions and CDNs. In the edge use case, you deploy an application in a robot or in a store, and you simply want it to sit there. You don’t move your point-of-sale system or video analytics from one store or factory floor to another for no reason.
What I have tried to illustrate in this blog are the following basic principles:
- Understand the problem, and always start there
- Select a tool that solves that problem
Just because Kubernetes solves the application orchestration challenge for data centers, do not assume it solves your problem of managing a large set of resource-constrained edge sites and the applications that run there.
But hey, that’s just the humble opinion of a Product Lead within the edge space.
This is an article written by Avassa Product Lead Stefan Wallin. The opinions expressed in this article are his own.