The problem

The customer is a public transport operator with hundreds of vehicles, each carrying its own compute box. The boxes run a Linux environment that needs the usual things: occasional configuration updates, software rollouts, troubleshooting interventions, and ad-hoc data collection.

Before BlaLabs was involved, there was no real central solution. Ad-hoc scripts, manual touches, and bespoke runbooks accumulated as the fleet grew, but none of it scaled. Two constraints ruled out the obvious approach (a central Ansible controller pushing changes over SSH):

  1. The edges sit behind NAT. A central controller cannot reach them directly. The vehicles are on private mobile networks, and inbound SSH simply does not work.
  2. The edges are constrained. Each box has limited CPU, limited bandwidth, and intermittent connectivity. Whatever ran there had to be small and resilient.

What I did

I designed and built ansible-worker: a Python service that runs on each edge box, connects outward to an MQTT broker, and waits for tasks. The customer was already running MQTT for telemetry, so the protocol cost nothing to add, and its low-overhead profile fit the edge constraints.

The important design choice was inverting Ansible’s usual direction of travel. Ansible’s default mode is push: a control node SSHes into each host and runs work from the outside. That doesn’t work when the hosts are behind NAT. ansible-worker turns it into a pull architecture instead. Each host reaches out to the broker, claims work, runs it locally, and reports status back. The control plane never has to reach the edge directly.
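To make the pull direction concrete, here is a minimal sketch of what the worker side might subscribe to and how it might decode an incoming task. The topic layout, the group names, and the JSON fields (task_id, playbook, extra_vars) are assumptions made for this example; the write-up does not describe the actual ansible-worker message format.

```python
import json
from dataclasses import dataclass, field

# Hypothetical topic layout, assumed for illustration only.
TASK_TOPIC = "fleet/{target}/tasks"


@dataclass(frozen=True)
class Task:
    """A unit of work published by the central side (field names are assumed)."""
    task_id: str
    playbook: str
    extra_vars: dict = field(default_factory=dict)


def subscriptions(host_id: str, groups: list[str]) -> list[str]:
    """Topics a worker would subscribe to after connecting outward to the broker.

    One topic addresses the host itself, the rest address groups it belongs to.
    """
    return [TASK_TOPIC.format(target=t) for t in [host_id, *groups]]


def parse_task(payload: bytes) -> Task:
    """Decode a JSON task message and reject anything missing required fields."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("task payload must be a JSON object")
    missing = {"task_id", "playbook"} - data.keys()
    if missing:
        raise ValueError(f"task is missing fields: {sorted(missing)}")
    return Task(data["task_id"], data["playbook"], data.get("extra_vars", {}))
```

Because the worker opens the connection and the broker only delivers messages on topics the worker subscribed to, nothing in this flow needs an inbound route through NAT.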

When a task arrives, the worker runs the requested Ansible playbook and publishes status updates back over MQTT as retained messages, so a subscriber that connects late still sees the latest state. The central side just publishes tasks and listens.
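The run-and-report step could look roughly like the sketch below. It is an illustration under stated assumptions, not the production code: the status topic, the state names, and the handle_task helper are hypothetical, and publish stands in for whatever MQTT client call the worker uses (it receives a topic, a payload, and a retain flag).

```python
import json
import subprocess
from typing import Callable

# publish(topic, payload, retain): stand-in for the MQTT client's publish call.
Publish = Callable[[str, str, bool], None]
# run(argv): executes a command and returns its exit code.
Run = Callable[[list[str]], int]


def run_playbook(argv: list[str]) -> int:
    """Run the command locally and return its exit code."""
    return subprocess.run(argv, check=False).returncode


def handle_task(host_id: str, task: dict, publish: Publish,
                run: Run = run_playbook) -> str:
    """Run one playbook against the local host and publish retained status."""
    # Hypothetical status topic, one per host and task.
    topic = f"fleet/{host_id}/status/{task['task_id']}"

    def report(state: str, **extra) -> None:
        # retain=True so the broker keeps the latest state for late subscribers.
        publish(topic, json.dumps({"state": state, **extra}), True)

    report("running")
    argv = [
        "ansible-playbook",
        "--connection", "local",      # run on this box, no SSH involved
        "--inventory", "localhost,",  # inline single-host inventory
        task["playbook"],
    ]
    try:
        rc = run(argv)
    except OSError as exc:  # e.g. ansible-playbook is not installed
        report("failed", error=str(exc))
        return "failed"

    state = "succeeded" if rc == 0 else "failed"
    report(state, rc=rc)
    return state
```

Injecting publish and run keeps the sketch independent of any particular MQTT library and lets the status logic be exercised without a broker or an Ansible installation.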

A runner, though, is only as useful as the content you can run with it. So alongside the new tooling I worked with the customer’s team to migrate their existing configuration scripts into Ansible playbooks. That gave the new pipeline real work to do from day one and replaced a pile of ad-hoc shell with something reviewable, testable, and idempotent.

The outcome

Maintenance and rollouts across hundreds of vehicles became a publish-and-watch operation, with the playbook content built from work the team already knew. New vehicles just run a worker and join the group. No inbound SSH connectivity to the edges is ever needed.

Notes

The runner was only half the story. The other half was turning the team’s existing scripts into Ansible, which is what made the new pipeline immediately useful instead of being a tooling project sitting next to the work.

Tools

Ansible · Python · MQTT · Linux