Getting started with Edge AI: Step-by-step

I have noticed a significant increase in pressure among our users to implement AI at the edge over the past couple of months. Applied AI is becoming mainstream, and some use cases like object detection, classification, and tracking are becoming commodities at this point. With a robust set of tools and models available as open source, it is very cheap and easy to start exploring.

Introduction

We have been using a people counter demo application for a long time now as a great example of an application that needs local hardware resources (a camera) and that can survive upstream outages. So I decided to build a real solution, hardware and software, to count people in our offices.

Most of the examples and tutorials that I found along the way focus on single instances of the application, mostly on some sort of local server or Raspberry Pi that the user has shell access to, perhaps even with a screen and a keyboard attached.

In the real world, i.e., at the scalable edge, that kind of setup is not very realistic. The application needs to be packaged for deployment to hundreds or more locations, where there is no host-local access and where application lifecycle and Day 2 operations are driven from a centralized place (in my case, our Avassa Control Tower).

I learned quite a bit along the way, and here is a write-up based on my experience, which I believe is general enough to apply to a broader set of inference-based applications beyond my own.

Heads-up: I will not go through the code in detail but will focus mostly on preparing, packaging, and deploying the application in a container, even though a couple of code-level detours are unavoidable.

Picking the use case for Edge AI and mapping the components

My use case is fairly straightforward. I want to point a camera towards a doorway, and then count the number of people arriving and departing through it. Technically speaking, I want an application that consumes a video stream to detect, classify, and track “person” objects crossing a virtual line, incrementing directional counters (right-to-left, left-to-right) accordingly.
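The counting part of that description can be reduced to a few lines of logic. The sketch below is illustrative, not the actual application code: each tracked "person" gets a stable track ID from the tracker, and when its centroid crosses a virtual vertical line, the counter for the corresponding direction is incremented.

```python
class LineCounter:
    """Directional counter for tracked objects crossing a vertical line."""

    def __init__(self, line_x: float):
        self.line_x = line_x
        self.last_x: dict[int, float] = {}  # track_id -> last centroid x
        self.left_to_right = 0
        self.right_to_left = 0

    def update(self, track_id: int, centroid_x: float) -> None:
        prev = self.last_x.get(track_id)
        if prev is not None:
            # Crossing is detected by comparing the previous and current
            # positions against the virtual line.
            if prev < self.line_x <= centroid_x:
                self.left_to_right += 1
            elif prev >= self.line_x > centroid_x:
                self.right_to_left += 1
        self.last_x[track_id] = centroid_x
```

In the real application the track IDs and centroids come from the detection and tracking pipeline; here they are just plain arguments.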

By breaking down the use case into components, I found that all the constituent parts were readily available and that I was definitely not the first person to build something like this. There are several tutorials for each part, so my initial sense was that the process would be very straightforward.

For hardware, I ordered a Raspberry Pi 5 with the AI HAT+ and camera module.

I started by putting together the application and running it directly on the host OS and ended up with the following software bill of materials:

  • A kernel with drivers for both the AI HAT+ and the camera. I went along with Raspberry Pi OS after checking that the version I picked had appropriate and recent drivers.
  • A small(ish) Python application that uses the hailo_platform Python library and drivers for model loading and inference on the AI HAT+, picamera2 for video capture, headless OpenCV (cv2) for some image manipulation, numpy for array conversions, and the Flask framework for a simple web application.

With these components selected, I was ready to start putting the application together. If you want to skip ahead, I’ve shared the public repo here.

Packaging the application for edge deployment

The real challenges came when I started working through how to package my application in a container for deployment, and how to make the hardware resources available to the application from inside a running container.

The hunt for device nodes

An Edge AI application like mine needs access to two things:

  • The sensors, in my case the video camera
  • The accelerator, in my case the neural network accelerator in the AI HAT+

Applications normally interact with hardware resources through device nodes (e.g. /dev/video0 for my camera). The first task is to determine exactly which device nodes my application needs access to. I struggled a bit with this, since most tutorials and documentation assume running the application on the host OS, where there is no need to specify exactly which device nodes are available. I could not find any documentation for the camera and accelerator that explicitly lists the necessary nodes.

To make it more interesting, this trial-and-error process had to be done from inside the container deployed on the target Raspberry Pi. This means I had to build images in my local environment, deploy to the target, and inspect trace logs from the application interacting with the driver layer.
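Most of that trial and error boiled down to spotting /dev paths in the application's trace output. A small helper like the following (a hypothetical aid, not part of the application) makes that grep-and-collect step repeatable:

```python
import re

# Matches device node paths such as /dev/video0 or /dev/hailo0
DEV_PATH = re.compile(r"(/dev/[\w/.:-]+)")

def device_nodes(log_lines):
    """Collect the unique /dev paths mentioned in captured trace/log lines."""
    found = set()
    for line in log_lines:
        found.update(DEV_PATH.findall(line))
    return sorted(found)
```

Feeding it, e.g., strace output from the driver layer yields the list of nodes the container actually needs.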

The Avassa Edge Platform uses udev patterns to match the subsystems we need to make their corresponding device nodes available to the application. I ended up with two sets of patterns in the site configuration, one for the accelerator (hailo) and one for the camera (media):

device-labels:
  - label: hailo
    udev-patterns:
      - SUBSYSTEM=="hailo_chardev"
  - label: media
    udev-patterns:
      - SUBSYSTEM=="media"
      - SUBSYSTEM=="video4linux"
      - SUBSYSTEM=="dma_heap"

This feature allows the application specification to reuse the device-labels and request access to the corresponding resources with the following construct.

containers:
  - name: detector
    devices:
      device-labels:
        - media
        - hailo

I also came across a corner of udev I had never heard of before, the /run/udev directory. It contains a binary device database and temporary files that reflect the information the kernel reports about the hardware currently present. User-space programs like my camera-using application (that use libudev) rely on this directory indirectly to discover and monitor devices. To enable access to that whole directory, I had to configure a system volume and mount it from the application:

system-volumes:
  - name: runudev
    path: /run/udev

In hindsight, I could probably have accelerated this process by, e.g., reading the picamera2 source code and looking at some kernel traces for the hailo_platform. But I learned a lot!

💡 Make sure you know which hardware and kernel resources your application requires, and plan how to make them available to the container.

Containerizing an Edge AI application: Drivers and Libraries

I now had a simple application that knew how to reach the right hardware resources; the next step was to start building out the Containerfile for my real-world deployment scenario.

The main challenge for me in this step was that most components of Raspberry Pi OS are hardware-dependent (see above) and geared towards a host-local GUI, which makes them less of a natural fit for containerization. I therefore had to find a way to install the applicable parts (drivers and libraries) on a different base OS image.

I wanted a lean image, so I went with python:3.11-slim-bookworm, which at least shares the same base OS (Debian Bookworm) as the Raspberry Pi OS I had done the first steps on. I then had to find a way to install drivers and Python wrappers for the camera and the accelerator.

The camera part was fairly straightforward, with the only additional step being to add the Raspberry Pi bookworm repository to the sources list in the build environment. I could then apt-get the appropriate packages.

The accelerator part required a couple more steps. The driver and Python bindings are not available in public package repositories; they have to be obtained as an arm64 driver package and a Python wheel. I then had to copy the files into the container and install them from the local file system. It took some time to determine the right combination of driver version, Python language version, and wheel version.
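The two steps above look roughly like this in the Containerfile. Treat this as a sketch: the package and file names are assumptions based on my setup, and repository key handling is omitted for brevity.

```dockerfile
FROM python:3.11-slim-bookworm

# Camera: add the Raspberry Pi bookworm repository, then install from apt
RUN echo "deb http://archive.raspberrypi.com/debian/ bookworm main" \
      > /etc/apt/sources.list.d/raspi.list \
 && apt-get update \
 && apt-get install -y --no-install-recommends python3-picamera2 \
 && rm -rf /var/lib/apt/lists/*

# Accelerator: driver package and Python wheel downloaded separately
# from the vendor and copied into the build context
COPY hailort_*_arm64.deb hailort-*.whl /tmp/
RUN dpkg -i /tmp/hailort_*_arm64.deb \
 && pip install /tmp/hailort-*.whl \
 && rm /tmp/hailort*
```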

💡 Container images are built on base containers, and picking the right one to support your use case (including driver, language, and library versions) can be tricky since most people seem to run stuff… uncontainerized.

Choosing and packaging the right AI model for the Edge Use Case

I needed a model with support for object detection, classification, and tracking. The most obvious choice at the moment is the YOLO family of real-time object detection models, known for their speed and relative accuracy. They are also freely available and have a large community around them. The most performant at the time of my build was YOLOv11, which also happened to be readily available in HEF format, the model file format consumed by the Hailo platform library.

I wanted to keep the model file in its own container image layer, to be able to utilize layer caching by the container runtime and make the container self-contained and easier to reproduce and roll back. That way, when I update the model file later, perhaps after tuning it, the delta stays minimal while the container-level versioning is preserved.
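In Containerfile terms, that simply means copying the model file in its own COPY instruction, placed after the less frequently changing layers (file names below are illustrative):

```dockerfile
# Application code changes rarely relative to the model in this scenario
COPY app/ /opt/app/

# Model file last and in its own layer: a model update only invalidates
# this final layer, everything above stays cached
COPY models/yolov11.hef /opt/models/yolov11.hef
```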

For extra points, I tagged the image with both application and model versions (e.g. visitor-counter:1.2.0-model-YOLOv11). This makes even more sense if you plan on rapid iterations on the model and need to capture rapidly changing model versions for lifecycle purposes.

💡 Think through how to package your model files based on best container practices and what your model update cadence will look like.

Making the Edge AI application monitorable and debuggable

I added a health check endpoint /healthz to the application that probes the status of the camera driver and the virtual inference driver and returns 200 if all is well, and 500 with an error message if there is an issue. This complements the built-in startup and readiness probes that use the container runtime for status checks on individual containers.
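The logic behind /healthz can be kept framework-agnostic so it is easy to test and to wire into any route handler. This is a sketch with stand-in probe callables, not the application's actual driver checks:

```python
def healthz(probes):
    """probes: mapping of component name -> callable returning True if healthy.

    Returns an HTTP-style (status_code, body) pair: 200 when every probe
    passes, 500 with the list of failing components otherwise.
    """
    errors = [name for name, ok in probes.items() if not ok()]
    if errors:
        return 500, {"status": "error", "failed": errors}
    return 200, {"status": "ok"}
```

In the real application, the two probes check the camera driver and the inference driver; the Flask route just serializes the returned body.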

I also made sure to add liberal amounts of logging across three log levels and made the level configurable using environment variables. The Avassa Edge Platform captures the container runtime logs on a container level and locally stores a configured amount of logs for later replay and filtering. I made the application specification pick the log level from a site-local label called people-counter-log-level. This means that I can adjust the log level on a site-by-site basis when needed.

        env:
          LOG_LEVEL: ${SYS_SITE_LABELS[people-counter-log-level]}
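On the application side, consuming that variable is a one-liner with the standard logging module. A minimal sketch, defaulting to INFO when the site label is unset or unrecognized:

```python
import logging
import os

def configure_logging():
    """Set the root log level from the LOG_LEVEL environment variable."""
    level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
    # Fall back to INFO for unknown values rather than crashing at startup
    level = getattr(logging, level_name, logging.INFO)
    logging.basicConfig(level=level,
                        format="%(asctime)s %(levelname)s %(message)s")
    return level
```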

As a stretch goal, consider instrumenting your application to provide model-specific performance telemetry like inference latency, specific resource utilization (GPU or other accelerators). This will provide a sound foundation for future fine-tuning and model efficiency work.
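One lightweight way to get started on that telemetry is to wrap the inference call and keep a rolling window of per-frame latencies. A sketch (names and window size are my own choices, not from the application):

```python
import time
from collections import deque

class LatencyTracker:
    """Rolling per-call latency statistics for an inference function."""

    def __init__(self, window: int = 100):
        self.samples = deque(maxlen=window)  # seconds, newest last

    def timed(self, infer, *args, **kwargs):
        # Wrap any callable; the result is passed through unchanged
        start = time.perf_counter()
        result = infer(*args, **kwargs)
        self.samples.append(time.perf_counter() - start)
        return result

    def average_ms(self) -> float:
        if not self.samples:
            return 0.0
        return 1000.0 * sum(self.samples) / len(self.samples)
```

The averaged figure can then be exposed alongside /healthz or shipped as a metric.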

💡 Consider the multi-layered operational aspects of your application and instrument accordingly to satisfy the needs of ops teams as well as application developers and data teams.

Deploying the Edge AI application at scale and counting the people

I was now ready to put the final touches on the application specification and then set sail.

I needed two more things, specific to this type of application and not mentioned before, to make it work. First, I needed the application specification to explicitly request access to the device nodes matched by my device-labels above:

devices:
  device-labels:
    - media
    - hailo

I also ran into a known problem with Docker: when the daemon is configured to run with user namespace remapping by default, files that belong to an image end up with the wrong ownership inside the container. To get around this, I had to run the container in the host user namespace:

user-namespace:
  host: true

I ended up with the following complete application specification:

name: people-counter
version: 0.0.1
services:
  - name: inference
    mode: replicated
    replicas: 1
    volumes:
      - name: runudev
        system-volume:
          reference: runudev
    share-pid-namespace: false
    containers:
      - name: detector
        mounts:
          - volume-name: runudev
            mount-path: /run/udev
        devices:
          device-labels:
            - media
            - hailo
        container-log-size: 100 MB
        container-log-archive: false
        shutdown-timeout: 10s
        image: people-counter
        user-namespace:
          host: true
        on-mounted-file-change:
          restart: true
        env:
          LOG_LEVEL: ${SYS_SITE_LABELS[people-counter-log-level]}
        probes:
          readiness:
            http:
              scheme: http
              port: 8080
              path: /healthz
            initial-delay: 10s
            timeout: 1s
            period: 10s
            success-threshold: 1
            failure-threshold: 1
    network:
      ingress-ip-per-instance:
        protocols:
          - name: tcp
            port-ranges: "8080"
        access:
          allow-all: true
on-mutable-variable-change: restart-service-instance

I was now ready to deploy, and I did that with the following simple deployment specification:

name: people-counter-deployment
application: people-counter
application-version: 0.0.1
placement:
  match-site-labels: system/name = avassa-stockholm-office

And after a couple of seconds I had my first instance of the visitor-counter up and running!

I also wanted to make sure that the application worked even when I was not on the on-premises subnet, so I took advantage of a feature in our command line tool (supctl) that allows remote tunneling from the local host to the application network of a specific service instance. Very handy when the on-premises subnet is not reachable (NATs, firewalls, etc.) from the outside.

The following command allowed me to point my browser to 127.0.0.1:8080 and be presented with the application:

supctl do -s avassa-stockholm-office applications hailo-detection service-instances inference-1 connect tcp 8080 --bind 8080

And now I could see the counter counting real people at our offices:

Real-time Edge AI people counter interface showing object detection bounding boxes and directional movement tracking in an office environment.

Conclusions

Putting this project together took a little more time than I thought, and the majority of the extra time was directly related to preparing the application for real-world deployments. I think that is a telling reflection of where the edge AI industry is at the moment: we have the components, we are building the experience, but some of the operational parts are less understood. And while this was more of an in-the-office project in its goals and some of its component choices, I found it very illuminating, and I look forward to working with many more customers and users on projects like mine. Hopefully, this write-up can help in getting started with Edge AI deployments, not only in theory but in practice, and at scale.

Video guide: Getting started with Edge AI

In this webinar recording, I cover most of the steps outlined in this article so that you can watch it come alive, from containerizing the application to counting people passing by.