Telemetry bus at the edge – Part 2: Examples

In this article series, we take a deeper look at the bus: why we built it and what its unique features are, illustrated with usage examples. In this second part, we'll walk through a few implementation examples.

In the previous post in this blog series, we explained the architecture and characteristics of the Avassa built-in telemetry bus, as well as the motivation behind it.

Take a step back: Telemetry bus at the edge – Part 1: An overview

We will now illustrate its features using examples. All examples use the Avassa command-line tool (supctl). Most of the features are also available in the Control Tower Web UI.

We will start with topic introspection: you can ask any site which topics exist and retrieve metadata for each topic:

$ supctl list volga topics --site at-home
- system:all-scheduler-events
- system:application-metrics
- system:audit-trail-log
- system:container-logs:popcorn-controller.popcorn-controller-service-1.kettle-popper-manager
- system:host-metrics
- system:logs
- system:scheduler-events

You see the various topics managed by the Avassa platform: scheduler events, which report everything around container applications being scheduled on the site; container logs, which contain all logs from the running application containers; and host metrics, which contain telemetry on host resource usage.

You can get metadata for a topic. In the example below, we introspect a topic in the Control Tower containing telemetry for all deployments:

$ supctl show volga topics system:deployment-events
name: system:deployment-events
tenant: b2
labels: {}
format: json
number-of-chunks: 10
creation-time: 2022-11-04T11:36:14.713Z
requested-replication-factor: 1
current-replication-factor: 1
persistence: disk
assigned-hosts:
  - ip-10-20-10-1
leader-host: ip-10-20-10-1
worker-hosts: []
size: 2.25 MiB
entries: 4759
oldest-entry: 2022-11-04T11:36:47.908Z
last-entry: 2023-05-30T13:43:42.810Z
seqno: 4759
chunkno: 3
dropped-chunks: 0
producers:
  - name: system:deployment-events-control-tower-001
    site: control-tower
    host: ip-10-20-10-1
consumers:
  - name: telemetry:collector
    mode: standby
    last-delivered-seqno: 4759
    last-acked-seqno: 4759
    buffered-messages: 0
    clients:
      - site: control-tower
        host: ip-10-20-10-1
        primary: true
        more-n: 2

You can, for example, see the replication factor for the topic within the cluster (in this case only one host). You also see which producers are publishing data on the topic, as well as any current consumers.

You can limit the size of a topic or its retention length (in days). When the limit is exceeded, the oldest data is overwritten.
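Conceptually, such limits become part of the topic specification alongside the metadata fields shown earlier. As a sketch only (the field names for the limits below are illustrative assumptions, not the verified supctl schema; consult the supctl reference for the exact names), a topic with retention limits could be described like this:

```yaml
name: myown-topic
format: json
persistence: disk
# Illustrative retention settings (field names are assumptions):
max-size: 100 MiB   # overwrite the oldest data once the topic exceeds 100 MiB
max-days: 7         # overwrite data older than 7 days
```

Whichever limit is hit first triggers the overwrite, so the topic behaves as a bounded ring buffer on each site.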

Now let's show how you can consume a topic at a site from the central Control Tower:

$ supctl do --site factory-floor-1 volga topics system:scheduler-events consume --follow

This will start a live telemetry stream (the --follow parameter) from the site factory-floor-1 on the topic system:scheduler-events.

So far we have looked at fairly simple topic consumers for a single site. We will now move on to more powerful queries across sites. Topic queries are always directed to the Control Tower, which in turn manages the search across the sites.

You can modify the search by choosing:

  • which sites to run the query on
  • which topics to run the query towards
  • what to search for
  • how the output result should be presented

We will illustrate this by performing a distributed query for “ERROR” in the system:logs topic, specified with the regex search parameter filter-re-match. In this example, we leave out any site filters, which means the search runs across all sites.

$ supctl do volga query-topics --topics topic-names=system:logs filter-re-match=ERROR output-payload-only=true

This could give the following output:

at-home-001 : <ERROR>   2022-09-19 09:27:54.495092Z at-home-001: Could not schedule edge.popcorn-controller.popcorn-controller-service-3 to any host
...
at-home-001 : <ERROR>   2022-09-19 09:29:12.178106Z at-home-001: Probe "live" failed for container popcorn-controller.popcorn-controller-service-1.kettle-popper-manager: HTTP GET failed with reason: timeout
...
control-tower-001 : <ERROR>   2022-10-07 09:17:05.470799Z control-tower-001: Failed to pull registry.gitlab.com/avassa-public/movie-theaters-demo/kettle-popper-manager:latest: connect to registry failed: non-existing domain

We will now show how you can select a subset of the sites: the first example uses site labels, and the second limits the search to the sites of a specific application deployment. The latter is a fairly interesting and powerful example, illustrating the value of a telemetry bus that is tightly connected to edge application lifecycle management.

$ supctl do volga query-topics --topics topic-names=system:logs filter-re-match=ERROR output-payload-only=true --match-site-labels city
$ supctl do volga query-topics --topics topic-names=system:scheduler-events output-payload-only=true --sites-from-application-deployment popcorn-deployment

The above illustrated how to consume existing topics. Without going into detail, we will briefly illustrate how to create your own topics:

$ supctl do --site factory-floor-1 volga create-topic myown-topic string
$ supctl do --site factory-floor-1 volga topics myown-topic produce foobar
$ supctl do --site factory-floor-1 volga topics myown-topic produce foobar2
$ supctl do --site factory-floor-1 volga topics myown-topic consume
{
  "time": "2022-12-06T13:12:31.771Z",
  "seqno": 1,
  "remain": 24,
  "producer-name": "REST-api",
  "payload": "foobar",
  "mtime": 1670332351771,
  "host": "factory-floor-1-001"
}
{
  "time": "2022-12-06T13:12:46.526Z",
  "seqno": 2,
  "remain": 23,
  "producer-name": "REST-api",
  "payload": "foobar2",
  "mtime": 1670332366526,
  "host": "factory-floor-1-001"
}
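Since each consumed record arrives as a JSON object, the stream is easy to post-process with standard tools such as jq. The sketch below simulates two records shaped like the consume output above (the printf stands in for the actual supctl consume pipe) and extracts just the payload field:

```shell
# Two records shaped like the `consume` output above, fed to jq,
# which prints only the "payload" field of each record.
printf '%s\n' \
  '{"seqno": 1, "payload": "foobar"}' \
  '{"seqno": 2, "payload": "foobar2"}' \
  | jq -r '.payload'
```

In a live setup, you would replace the printf with the supctl consume command itself.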

This summarizes the second part of this article series, where we provided a few examples of what an edge-native telemetry bus implementation can look like. If you'd like to learn more, I recommend continuing with the previous or following parts of the series.

Read more:

Telemetry bus at the edge – Part 1: An overview
Telemetry bus at the edge – Part 3: Consuming and producing