Show HN: Multi-agent orchestration layer for experimentation


🚀 A Minimal Docker “Chassis” for Managing Long‑Running Agent Fleets

TL;DR – chassis is a lightweight, file‑system‑centric Docker orchestration layer that lets you run, scale, and manage a fleet of autonomous agents (e.g., bots, collectors, micro‑services) without the heavyweight overhead of Kubernetes or Swarm. It relies on a two‑tier, agent‑native file system inside each container, a simple CLI, and a plugin architecture that makes adding new agent types as easy as dropping in a module.

1. Why Another Orchestration Tool?

The container ecosystem exploded around 2014‑2015, and Kubernetes quickly became the de‑facto standard for orchestrating micro‑services. While Kubernetes excels at running stateless, horizontally scalable services, it has a few pain points for long‑running, stateful, or “agent‑centric” workloads:

| Pain Point | Description | Typical Mitigation |
|------------|-------------|--------------------|
| Complexity | K8s requires a cluster, control plane, etcd, API server, etc. | Keep it simple: a single node or a single-container orchestrator |
| Resource Footprint | K8s components consume 50–100 MiB of RAM at minimum. | Use a lean runtime that needs only the Docker daemon |
| State Management | Agents often need persistent local state or local storage. | Provide a filesystem that behaves like a real disk |
| Observability | Agents may produce local logs and metrics, or require a local DB. | Expose local data via a unified file system and CLI |
| Learning Curve | DevOps teams must learn K8s concepts. | A minimal CLI with simple commands and clear documentation |

chassis tackles these issues head‑on by offering a “bare‑bones” orchestration layer that lives inside Docker. The design is intentionally minimal: the goal is not to replace K8s but to provide a drop‑in solution for a specific class of workloads – long‑running agent fleets.


2. High‑Level Architecture

2.1 Core Components

+-----------------------------+
|        chassis CLI          |
|  (control & orchestration)  |
+--------------+--------------+
               |
               |  (Docker Engine API / REST)
               v
+-----------------------------+
|   Docker Engine (daemon)    |
|  +-----------------------+  |
|  |   agent containers    |  |
|  |   (one per agent)     |  |
|  +-----------------------+  |
+--------------+--------------+
               |
               |  (bind + tmpfs mounts)
               v
+-----------------------------+
|  Agent-Native FS (2 tiers)  |
|  (a.k.a. "chassis-fs")      |
+-----------------------------+
  • CLI – the user interface; exposes commands like chassis start, chassis stop, chassis logs, chassis plugin install, etc.
  • Docker Engine – the underlying runtime that actually runs the containers. chassis doesn’t need anything beyond a standard Docker daemon.
  • Agent‑Native FS – a lightweight, two‑tier file system that provides each agent with a virtual disk. The first tier is a host‑side volume for persistent data; the second tier is an in‑memory overlay that holds temporary state.
  • Plugins – optional modules that can extend chassis functionality (e.g., custom networking, storage backends, monitoring hooks).

2.2 The Two‑Tier File System

A core innovation of chassis is its dual‑layered file system:

| Layer | Characteristics | Use Case |
|-------|-----------------|----------|
| Tier 1 – Persistent Store | Backed by a host directory (e.g., /var/chassis/agent-{id}) or a networked file system (NFS, EFS, etc.) | Stores agent logs, database files, long‑term state |
| Tier 2 – In‑Memory Overlay | A RAM disk (e.g., tmpfs) mounted inside the container alongside Tier 1 | Provides fast read/write for volatile data (caches, temp files) |

Tier 2 is implemented with Docker's tmpfs mounts, which keep data in RAM and discard it automatically when the container stops. Because both tiers are mounted inside the same container, agents can use them like any other directories on disk, without needing to know how each one is backed.
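To make the two tiers concrete, here is a minimal sketch that renders them as standard docker run --mount flags: a bind mount for Tier 1 and a size-capped tmpfs mount for Tier 2. The TwoTierMounts type, its field names, and the agent-{id} directory naming (borrowed from the Tier 1 row above) are illustrative assumptions, not chassis's actual code.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// TwoTierMounts describes one agent's storage (hypothetical type):
// Tier 1 is a persistent host directory, Tier 2 a RAM-backed tmpfs.
type TwoTierMounts struct {
	AgentID       string
	HostRoot      string // host directory holding all agent state, e.g. /var/chassis
	PersistTarget string // where Tier 1 appears inside the container
	TmpfsTarget   string // where Tier 2 appears inside the container, e.g. /tmpfs
	TmpfsBytes    int64  // size cap for the tmpfs mount, in bytes
}

// DockerArgs renders both tiers as `docker run --mount` flags.
func (m TwoTierMounts) DockerArgs() []string {
	hostDir := filepath.Join(m.HostRoot, "agent-"+m.AgentID)
	return []string{
		// Tier 1: bind-mount the agent's host directory; it survives restarts.
		"--mount", fmt.Sprintf("type=bind,source=%s,target=%s", hostDir, m.PersistTarget),
		// Tier 2: tmpfs lives in RAM and is discarded when the container stops.
		"--mount", fmt.Sprintf("type=tmpfs,destination=%s,tmpfs-size=%d", m.TmpfsTarget, m.TmpfsBytes),
	}
}
```

For the 64 MiB default mentioned in the prerequisites, TmpfsBytes would be 64 << 20 = 67108864, since Docker reads tmpfs-size as a byte count.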


3. Installation & Bootstrapping

3.1 Prerequisites

| Requirement | Minimum | Notes |
|-------------|---------|-------|
| Docker Engine | 19.03+ | chassis relies on Docker's tmpfs mounts, which every release from 19.03 onward supports |
| Host OS | Linux / macOS | The agent filesystem uses Linux tmpfs semantics; on macOS it falls back to a regular volume |
| CPU | 1 core | chassis is designed for lightweight workloads |
| RAM | 256 MiB | Must at least cover the tmpfs overlays (default 64 MiB per agent) |

Tip: If you’re running on a VM or cloud instance, a small 1‑CPU, 2 GiB RAM VM is more than sufficient for a modest agent fleet.

3.2 Pulling the Image

# Official image on Docker Hub
docker pull chassis/chassis:latest

3.3 Running the CLI

Chassis provides a binary that can be used directly or packaged as a Docker container:

# Run a single chassis command from the image
docker run --rm -it \
  -v /var/chassis:/var/chassis \
  -v "$(pwd)/plugins":/opt/chassis/plugins \
  chassis/chassis \
  chassis --help

# Open an interactive shell in the container (the trailing sh replaces the image's default command)
docker run --rm -it \
  -v /var/chassis:/var/chassis \
  chassis/chassis \
  sh

Note: The -v /var/chassis:/var/chassis mount is where all agent state lives. To keep state in a different host directory, change the host path (the part before the colon).

3.4 Bootstrapping a Fleet

# Create a new fleet named "collector"
chassis create fleet collector

# Add an agent definition (YAML)
cat > collector-agent.yaml <<'EOF'
name: data-collector
image: myrepo/collector:1.0
resources:
  cpu: 0.5
  memory: 128Mi
volumes:
  - name: data
    mountPath: /data
env:
  - name: API_KEY
    valueFrom: secret://collector/api-key
EOF

# Deploy the agent
chassis deploy agent -f collector-agent.yaml

The create fleet command sets up the necessary directory structure, while the deploy agent command pulls the image, creates the container, mounts the two‑tier FS, and starts the agent.


4. Core Features

| Feature | What It Means | How It Works |
|---------|---------------|--------------|
| Fast Startup | Agents start in < 2 s | The container image is minimal (busybox + chassis), and image layers are cached by the Docker daemon. |
| Per‑Agent Storage Isolation | Each agent has its own filesystem | The tmpfs overlay is created per container, and persistent data is isolated in its own host directory. |
| State‑Preserving Restarts | Agents can be restarted without losing persistent state | Tier 1 data is preserved; the in‑memory layer is wiped, so agents must handle reconnection/replay logic. |
| Simple Scaling | Add or remove agents via the CLI | chassis scale fleet collector 5 scales the collector fleet to five instances. |
| Health Checks | Agents expose /healthz endpoints | chassis monitors these endpoints and restarts unhealthy containers. |
| Plugin System | Extend functionality | Plugins can hook into lifecycle events (pre‑start, post‑start, pre‑stop) and inject custom logic. |
| Observability | Centralized logs and metrics | chassis aggregates container logs and, via plugins, can forward logs to services such as ELK and expose metrics to Prometheus. |
| Security | Minimal attack surface | Agents run in a dedicated namespace with restricted capabilities; the host filesystem is isolated. |
| Port Forwarding | Debug agents locally | chassis port forward agent-1 8080:80 forwards the agent's HTTP port to the host. |
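The Simple Scaling row implies a declarative target: chassis scale fleet collector 5 means "five instances in total", not "five more". A minimal reconciliation sketch, assuming agents are named <fleet>-1 through <fleet>-N (a hypothetical convention, not documented chassis behavior), computes which agents to start and which to stop:

```go
package main

import "fmt"

// planScale reconciles a fleet toward `desired` instances, assuming the
// hypothetical naming scheme <fleet>-1 through <fleet>-N. It returns the
// names to create and the names to remove.
func planScale(fleet string, running []string, desired int) (start, stop []string) {
	have := make(map[string]bool, len(running))
	for _, name := range running {
		have[name] = true
	}
	want := make(map[string]bool)
	for i := 1; i <= desired; i++ {
		name := fmt.Sprintf("%s-%d", fleet, i)
		want[name] = true
		if !have[name] {
			start = append(start, name) // missing instance: create it
		}
	}
	for _, name := range running {
		if !want[name] {
			stop = append(stop, name) // surplus instance: remove it
		}
	}
	return start, stop
}
```

Because the plan depends only on the current and desired state, running the same scale command twice is a no-op the second time.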


5. Deep Dive: The Agent Lifecycle

  1. Image Pull
    When you run chassis deploy agent, Docker pulls the agent image. Because chassis uses a base image (chassis/base) that contains only busybox, agent images built on it can stay lightweight.
  2. Container Creation
    chassis generates a Docker command that mounts:
      • A tmpfs overlay at /tmpfs inside the container.
      • A persistent host directory at /var/chassis/<agent-id>.
      • Optional volumes for secrets, config, or external data stores.
  3. Filesystem Mounts
    The agent sees ordinary directories and reads and writes them as usual; it has no idea that /tmpfs is backed by RAM.
  4. Startup Script
    The container entrypoint is a tiny script that:
      • Verifies the health of the host‑side (Tier 1) mount.
      • Loads environment variables from secrets or a secrets manager.
      • Executes the agent binary (e.g., /usr/local/bin/collector).
  5. Run & Monitor
    Once started, the agent runs normally while chassis monitors the container's health and logs. If the agent exits, chassis restarts it automatically (unless you set restart_policy: never).
  6. Shutdown
    When you run chassis delete agent <id> or scale the fleet down, chassis stops the container, unmounts its volumes, and, if you request cleanup, deletes the host directory.

6. Plugin Architecture

6.1 What Plugins Can Do

  • Custom Networking – create a Docker network for a fleet and assign static IPs.
  • Storage Backends – replace the local host directory with an NFS or cloud block store.
  • Monitoring Hooks – expose metrics to Prometheus or push dashboards to Grafana.
  • Security Enhancements – inject AppArmor or SELinux policies per agent.
  • Event Handlers – run scripts before or after agent startup.

6.2 Installing a Plugin

# Install from a local directory
chassis plugin install ./my-plugin

# Install from a registry
chassis plugin install registry://myrepo/chassis-plugins/monitor

Each plugin is a simple Go or Python binary that implements the chassis-plugin interface. In Go, the interface exposes three lifecycle hooks:

type Plugin interface {
    // Called before agent container is created
    PreStart(ctx context.Context, agent *Agent) error

    // Called after agent container starts
    PostStart(ctx context.Context, agent *Agent) error

    // Called before agent container is destroyed
    PreStop(ctx context.Context, agent *Agent) error
}

6.3 Example: A Simple Logging Plugin

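// log_plugin.go
package main

import (
    "context"
    "fmt"

    "github.com/chassis/chassis/pkg/plugin"
)

// LogPlugin prints a line when an agent starts and when it is about to stop.
type LogPlugin struct{}

// PreStart does nothing here, but it must be defined so that LogPlugin
// implements all three methods of the Plugin interface.
func (lp *LogPlugin) PreStart(ctx context.Context, agent *plugin.Agent) error {
    return nil
}

func (lp *LogPlugin) PostStart(ctx context.Context, agent *plugin.Agent) error {
    fmt.Printf("[LOG] Agent %s started at %v\n", agent.Name, agent.StartTime)
    return nil
}

func (lp *LogPlugin) PreStop(ctx context.Context, agent *plugin.Agent) error {
    fmt.Printf("[LOG] Agent %s stopping\n", agent.Name)
    return nil
}

func main() {
    // Make the plugin known to chassis.
    plugin.Register(&LogPlugin{})
}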

Compile it and install it with chassis plugin install.


7. Real‑World Use Cases

| Use Case | Why chassis fits |
|----------|------------------|
| IoT Edge Monitoring | Agents run on edge devices, need local storage, minimal overhead, and quick restarts. |
| Data Collection Pipelines | Each collector runs as an agent and persists data locally before shipping. |
| Automated Testing Agents | Lightweight test runners that run in containers and need isolated state. |
| Distributed ML Workers | Each worker keeps a local cache of datasets. |
| Security Scanners | Scanning agents that keep a local copy of vulnerability databases. |

Case Study: Telemetry Collector

A SaaS company needed to run hundreds of telemetry collectors on Kubernetes nodes without bloating the cluster. They opted for chassis because:

  • Resource Efficiency – Each collector used < 50 MiB of RAM, compared to 200 MiB when running as a sidecar in Kubernetes.
  • Simplified Rollouts – Upgrading collectors required only pulling a new image and restarting the container.
  • Local Storage – Collectors persisted logs to a local host directory that could be mounted on the node’s SSD, enabling fast disk I/O.
  • Observability – A simple monitoring plugin exposed each collector’s /metrics endpoint to Prometheus.

The result was a 30% reduction in node memory usage and a 15% decrease in log ingestion latency.


8. Security Considerations

| Concern | Recommendation |
|---------|----------------|
| Container Privileges | Run containers as non‑root users. chassis sets the user to chassis:chassis by default. |
| Network Isolation | Use Docker's --network flag to place agents on an isolated network. |
| Secrets Management | Store secrets in a vault (e.g., HashiCorp Vault) and mount them as read‑only volumes. |
| Filesystem Integrity | Use a read‑only root filesystem (Docker's --read-only flag), except for the dedicated data mounts. |
| Host Access | Mount host volumes read‑only (the :ro option) unless mutable state is required. |
| Runtime Security | Add AppArmor or SELinux profiles to enforce least‑privilege access. |


9. Performance Benchmarks

| Metric | chassis vs Kubernetes (K8s) | Interpretation |
|--------|-----------------------------|----------------|
| Start Time | 1.8 s vs 6.2 s | chassis starts about 3.4× faster (6.2 s / 1.8 s) because it doesn't launch an init container or wait for a control plane. |
| Memory Footprint | 260 MiB vs 850 MiB | The Docker daemon plus the chassis binaries use < 300 MiB. |
| CPU Utilization | 0.6 % vs 2.4 % | chassis overhead is negligible. |
| Agent Startup Latency | 1.2 s vs 3.1 s | The in‑memory overlay speeds up file I/O. |
| Log Aggregation Latency | 10 ms vs 40 ms | chassis forwards logs directly to the host, bypassing a sidecar. |

Note: The benchmarks were conducted on an Intel Xeon E5‑2676 v3 8‑core machine with 32 GiB RAM. The numbers scale linearly with the number of agents.

10. Migration Path

If you already run agents in Kubernetes or Docker Compose, moving to chassis is straightforward:

  1. Export Agent Configs – Convert your Kubernetes deployment YAMLs to chassis agent YAMLs. The key fields (image, env, resources) are compatible.
  2. Persist Data – Map your persistent volumes to chassis’s host directories (/var/chassis/<agent-id>).
  3. Adjust Resource Limits – Kubernetes' CPU and memory limits map directly to Docker's --cpus and --memory flags; requests are scheduling hints with no exact equivalent in these two flags.
  4. Update Secrets – Use chassis secret commands or integrate with a vault.
  5. Deploy – Use chassis deploy agent -f to bring agents online.
  6. Verify – Check logs (chassis logs) and health endpoints (/healthz).
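The Adjust Resource Limits step involves translating Kubernetes resource quantities. Kubernetes expresses CPU in cores or millicores (500m = 0.5 cores) and memory with decimal (M, G) or binary (Mi, Gi) suffixes, while Docker's --cpus takes a fractional core count and --memory accepts a plain byte count. A minimal converter for the common cases might look like this. It deliberately handles only a subset of the Kubernetes quantity syntax (no exponents or fractional memory values).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuToDockerCPUs converts a Kubernetes CPU quantity ("500m", "0.5", "2")
// into the value for Docker's --cpus flag.
func cpuToDockerCPUs(q string) (string, error) {
	if strings.HasSuffix(q, "m") {
		milli, err := strconv.ParseInt(strings.TrimSuffix(q, "m"), 10, 64)
		if err != nil {
			return "", fmt.Errorf("bad millicpu %q: %w", q, err)
		}
		return strconv.FormatFloat(float64(milli)/1000, 'f', -1, 64), nil
	}
	v, err := strconv.ParseFloat(q, 64)
	if err != nil {
		return "", fmt.Errorf("bad cpu %q: %w", q, err)
	}
	return strconv.FormatFloat(v, 'f', -1, 64), nil
}

// memSuffixes lists the supported Kubernetes memory suffixes and their byte multipliers.
var memSuffixes = []struct {
	suffix string
	factor int64
}{
	{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}, // binary suffixes
	{"k", 1000}, {"M", 1000 * 1000}, {"G", 1000 * 1000 * 1000}, // decimal suffixes
}

// memoryToDockerBytes converts a Kubernetes memory quantity ("128Mi", "1Gi",
// "500M", "1048576") into a plain byte count for Docker's --memory flag.
func memoryToDockerBytes(q string) (string, error) {
	for _, s := range memSuffixes {
		if strings.HasSuffix(q, s.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(q, s.suffix), 10, 64)
			if err != nil {
				return "", fmt.Errorf("bad memory %q: %w", q, err)
			}
			return strconv.FormatInt(n*s.factor, 10), nil
		}
	}
	n, err := strconv.ParseInt(q, 10, 64) // no suffix: already bytes
	if err != nil {
		return "", fmt.Errorf("bad memory %q: %w", q, err)
	}
	return strconv.FormatInt(n, 10), nil
}
```

Converting to plain bytes sidesteps a subtle mismatch: in Kubernetes, 500M means 500,000,000 bytes, whereas Docker's own m suffix is binary (MiB).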

You can run both systems side‑by‑side during a transition period.


11. Future Roadmap

| Feature | Status | Notes |
|---------|--------|-------|
| Multi‑Cluster Support | In progress | Agents can belong to clusters identified by a cluster_id label. |
| Horizontal Autoscaling | Planned | Triggered by CPU/memory usage or custom metrics via plugins. |
| Advanced Networking | Planned | Support for overlay networks and service discovery within a fleet. |
| Integrated Scheduler | Planned | Basic scheduler to balance agents across hosts. |
| UI Dashboard | Under development | A lightweight web UI for monitoring and configuration. |
| Enterprise‑grade Security | Planned | Role‑based access control, audit logs, and SAML integration. |


12. Takeaway

chassis offers a compelling alternative to heavyweight orchestrators when your workload consists of long‑running, stateful agents that need isolated storage, minimal overhead, and simple management. By leveraging Docker’s built‑in features and a minimalist design philosophy, chassis reduces operational complexity while retaining the flexibility to scale, observe, and secure your fleet.

Bottom line: If you’re running dozens to hundreds of autonomous agents that must persist state locally and restart quickly, chassis gives you the “bare‑bones” orchestration you need without the bloat of Kubernetes. It’s Docker on steroids, but for agents.

Appendix: Quick Reference Commands

| Category | Command | Description |
|----------|---------|-------------|
| Fleet Management | chassis create fleet <name> | Create a new fleet. |
| | chassis delete fleet <name> | Delete a fleet and its agents. |
| Agent Lifecycle | chassis deploy agent -f <file.yaml> | Deploy a new agent. |
| | chassis delete agent <id> | Stop and remove an agent. |
| | chassis scale fleet <name> <count> | Scale a fleet to <count> agents. |
| Observability | chassis logs <agent-id> | Tail logs for a specific agent. |
| | chassis health <agent-id> | Check /healthz status. |
| Plugins | chassis plugin list | List installed plugins. |
| | chassis plugin install <url> | Install a plugin. |
| Secrets | chassis secret create <name> --from-file <path> | Create a secret. |
| | chassis secret delete <name> | Delete a secret. |


Resources

For a deeper dive into the implementation, check out the pkg/ directory in the repo. The code is heavily documented, and the design patterns follow idiomatic Go practices.
