Welcome Back to Heron Watch 🕵️♂️🐦#
In the last post, I introduced my slightly ridiculous but fully serious attempt to defend my garden pond from a feathered fish thief. Long story short: AI-detected gray heron + sprinkler system = nobody touches the koi.
👈 Previously in the Heron Saga…
But this time, we’re going under the hood.
This blog entry is all about the architecture—the system of microservices that work together behind the scenes to catch the heron in action. It’s over-engineered in the most delightful way and also happens to be a great demonstration of distributed system design in a real-world (well, backyard) use case.
Let’s break it down.
🧩 The Big Picture#
Here’s the full high-level architecture:
```mermaid
flowchart LR
    %% 1. Camera & frame capture
    Camera["IP camera (RTSP)"] -->|RTSP stream| FrameCapture["frame-capture-svc"]
    FrameCapture -->|publishes JPEGs| raw_frames

    %% 2. Orchestration & actuation
    raw_frames --> Orchestrator["orchestrator-svc"]
    Orchestrator -->|MQTT trigger| HA["Home Assistant"]

    %% 3. Inference
    Orchestrator -->|POST image| Inference["inference-svc"]
    Inference -->|JSON results| Orchestrator
    Orchestrator -->|publishes detections| detections
    Orchestrator -->|publishes predicted JPEGs| predicted_frames

    %% 4. Clip extraction
    ClipExtractor["clip-extractor-svc"]
    detections --> ClipExtractor

    %% 5. Clip buffer & storage
    ClipBuffer["clip-buffer-svc"]
    ClipBuffer -->|maintains ring| Buffer
    subgraph Storage["Storage"]
        direction TB
        Buffer["/buffer (ring segments)"]
        Clips["/clips (saved MP4s)"]
    end
    Buffer --> ClipExtractor
    ClipExtractor -->|writes clips| Clips
    Clips --> Dashboard

    %% 6. Dashboard & user
    subgraph DashboardGrp["Dashboard"]
        direction TB
        predicted_frames --> Dashboard
        Dashboard["dashboard-svc"]
        Dashboard -->|serves live + clips| User
        User["User / Browser"]
    end
```
This may look like a wall of arrows, but fear not. We’re going to zoom into each part and explore the services that make this backyard watchdog magic happen.
📨 Why RabbitMQ?#
Before we dive into the service specifics, let’s talk message brokers. In a microservices setup, different components need to communicate efficiently without yelling across the room. That’s where RabbitMQ hops in (pun 100% intended).
RabbitMQ is a lightweight, battle-tested message broker that allows asynchronous communication between services. One service can publish a message (e.g., a new frame), and another can pick it up and do something useful (e.g., run inference, store it, ignore it).
In this project, RabbitMQ is the glue between:
- The camera frame grabber
- The AI inference engine
- The detection pipeline
- The clip extraction logic
- The dashboard UI
- And anything else that needs event-based comms
RabbitMQ supports features like persistent queues, routing keys, and fanout exchanges, which make it ideal for this kind of loosely coupled, event-driven architecture.
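To make the pattern concrete, here is a minimal pika sketch of both sides of a queue. The `raw_frames` queue name matches the pipeline; the host and the frame payload are purely illustrative.

```python
# Minimal pika sketch: one side publishes a JPEG frame, the other consumes it.
# Queue name matches the pipeline; host and payload are illustrative.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()
channel.queue_declare(queue="raw_frames", durable=True)

# Publisher side: push a JPEG payload onto the queue.
with open("frame.jpg", "rb") as f:  # any JPEG on disk will do for a demo
    channel.basic_publish(
        exchange="",
        routing_key="raw_frames",
        body=f.read(),
        properties=pika.BasicProperties(delivery_mode=2),  # survive broker restarts
    )

# Consumer side: hand each frame to a callback and ack it.
def on_frame(ch, method, properties, body):
    print(f"got a frame ({len(body)} bytes)")
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="raw_frames", on_message_callback=on_frame)
channel.start_consuming()
```

In the real setup, publisher and consumer live in different containers; the queue is what decouples them.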
🛠️ Could I have used MQTT instead? Yes, totally. In fact, for smaller projects or pure smart home use, MQTT alone could do the job — especially with things like MQTT Image. But I saw this as a great opportunity to showcase RabbitMQ as a proper message broker with queues and fanout and not just a pub/sub pipe.
🚀 What about Kafka + MinIO? If this were a large-scale surveillance system with thousands of cameras and multi-TB video buffers, I’d reach for Apache Kafka and store the footage on MinIO or S3. But this is a backyard project with exactly one IP cam and one bird.
📸 frame-capture‑svc: The Watchful Eye#
This service does one job — and does it with laser focus (or… lens focus?).
It connects to an RTSP-enabled IP camera, pulls JPEG snapshots at a configured
FPS, and tosses them into the raw_frames
RabbitMQ queue for the rest of the
pipeline to handle.
Think of it as the watchtower of the system — always vigilant, always streaming.
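For a feel of what that loop looks like, here is a minimal sketch (not the actual service code). The OpenCV-based frame grabbing is an assumption on my part; the env vars mirror the compose file below.

```python
# Frame-capture sketch: RTSP in, JPEG frames out to the raw_frames queue.
# OpenCV is an assumed implementation detail; env vars match the compose file.
import os
import time
import cv2
import pika

rtsp_url = os.getenv("RTSP_URL", "rtsp://user:password@ip_address:554/live/ch1")
fps = float(os.getenv("FPS", "2"))

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host=os.getenv("RABBIT_HOST", "localhost")))
channel = connection.channel()
channel.queue_declare(queue="raw_frames", durable=True)

cap = cv2.VideoCapture(rtsp_url)
while True:
    ok, frame = cap.read()
    if not ok:
        time.sleep(1)                       # camera hiccup: back off and reconnect
        cap = cv2.VideoCapture(rtsp_url)
        continue
    ok, jpeg = cv2.imencode(".jpg", frame)  # encode the frame as JPEG bytes
    if ok:
        channel.basic_publish(exchange="", routing_key="raw_frames",
                              body=jpeg.tobytes())
    time.sleep(1.0 / fps)                   # throttle to the configured FPS
```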
Here’s how it’s configured in the `docker-compose.yml`:
```yaml
frame-capture-svc:
  build: ./frame-capture-svc
  environment:
    - RTSP_URL=rtsp://user:password@ip_address:554/live/ch1
    - RABBIT_HOST=rabbitmq
    - FPS=2
  volumes:
    - ./frame-capture-svc/frames:/app/frames
  depends_on:
    - rabbitmq
```
🔍 A few details#
- `RTSP_URL` points to the IP camera stream.
- `RABBIT_HOST` tells it where RabbitMQ lives.
- `FPS=2` means we’re capturing two frames per second — plenty for our birdy burglar.
- The volume mount is just for debugging — you can see what frames it actually captured.
🧠 inference‑svc: The Brain#
Once a frame is captured, the `orchestrator-svc` (we’ll get to that) sends it over to `inference-svc`, which wraps a YOLOv8 ONNX model. This is where object detection happens.
The service returns clean JSON with the detected objects, classes, bounding boxes, and confidence scores. It’s built using FastAPI + Uvicorn + ONNX for snappy performance.
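Here’s a rough sketch of what such an endpoint can look like. The 640x640 input size, YOLOv8’s flat output layout, the multipart upload, and the missing NMS step are all simplifications of mine; the real service is more thorough.

```python
# Inference-service sketch: load the ONNX model once, score each posted frame,
# return JSON. YOLOv8 decoding is simplified (no NMS, assumed output layout).
import os
import cv2
import numpy as np
import onnxruntime as ort
from fastapi import FastAPI, File, UploadFile

app = FastAPI()
session = ort.InferenceSession(os.getenv("MODEL_PATH", "/app/model/best.onnx"))
CONF_THRESHOLD = float(os.getenv("CONF_THRESHOLD", "0.6"))
CLASSES = ["heron"]  # illustrative label map for the custom model

def preprocess(data: bytes) -> np.ndarray:
    """JPEG bytes -> 1x3x640x640 float32 tensor scaled to [0, 1]."""
    img = cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)
    img = cv2.resize(img, (640, 640)).astype(np.float32) / 255.0
    return img.transpose(2, 0, 1)[None, ...]

def postprocess(output: np.ndarray, conf: float) -> list:
    """Keep anchors whose best class score clears the threshold (NMS omitted)."""
    preds = output[0].T  # assumed (anchors, 4 + num_classes) after transpose
    results = []
    for row in preds:
        box, scores = row[:4], row[4:]
        cls_id = int(scores.argmax())
        if scores[cls_id] >= conf:
            results.append({"class": CLASSES[cls_id],
                            "confidence": float(scores[cls_id]),
                            "box": [float(v) for v in box]})  # cx, cy, w, h
    return results

@app.post("/detect")
async def detect(file: UploadFile = File(...)):
    tensor = preprocess(await file.read())
    outputs = session.run(None, {session.get_inputs()[0].name: tensor})
    return {"detections": postprocess(outputs[0], CONF_THRESHOLD)}
```

The compose entry for it: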
```yaml
inference-svc:
  build: ./inference-svc
  volumes:
    - ./inference-svc/model:/app/model
  environment:
    - RABBIT_HOST=rabbitmq
    - MODEL_PATH=/app/model/best.onnx
    - CONF_THRESHOLD=0.6
  depends_on:
    - rabbitmq
```
🔍 A few details#
- `MODEL_PATH` points to the ONNX model file.
- `CONF_THRESHOLD` is the confidence threshold for detections (0.6 means 60% confidence).
- `RABBIT_HOST`: of course we need that RabbitMQ host again.
- The volume mount is where the model file lives.
- The model is loaded once and reused for each request, making it efficient.
Why a separate service? Because:
- We want to scale inference independently (add more instances if needed)
- It keeps the model logic isolated and testable
- It enables us to swap models easily (e.g., YOLOv4, YOLOv7, YOLOv9, etc.)
🕹 orchestrator‑svc: The Puppet Master#
The heart of the system.
It orchestrates everything by:
- Consuming frames from the `raw_frames` queue
- Sending the frames to the `inference-svc` for detection
- Publishing the predicted frames to the `predicted_frames` queue
- Checking the detection results and, if a class of interest is detected:
  - Publishing the detection results to the `detections` queue
  - Sending an MQTT trigger to Home Assistant (or any other system)
Think of it as the conductor in our bird-detecting orchestra.
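Boiled down to a sketch, the main loop looks something like this. Error handling, reconnect logic, and the predicted-frame publishing are left out, and the MQTT topic name is purely illustrative.

```python
# Orchestrator sketch: consume a frame, ask the inference service about it,
# publish hits to the detections queue, and nudge Home Assistant over MQTT.
import json
import os
import pika
import requests
import paho.mqtt.publish as mqtt_publish

INFERENCE_URL = os.getenv("INFERENCE_URL", "http://inference-svc:8000/detect")
CLASSES_OF_INTEREST = set(os.getenv("DETECTION_CLASSES", "heron").split(","))

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host=os.getenv("RABBIT_HOST", "rabbitmq")))
channel = connection.channel()
channel.queue_declare(queue="raw_frames", durable=True)
channel.queue_declare(queue="detections", durable=True)

def on_frame(ch, method, properties, body):
    resp = requests.post(INFERENCE_URL,
                         files={"file": ("frame.jpg", body, "image/jpeg")})
    detections = resp.json().get("detections", [])
    hits = [d for d in detections if d["class"] in CLASSES_OF_INTEREST]
    if hits:
        ch.basic_publish(exchange="", routing_key="detections",
                         body=json.dumps({"detections": hits}))
        mqtt_publish.single("garden/heron_alert",  # hypothetical topic name
                            payload="ON",
                            hostname=os.getenv("MQTT_HOST", "localhost"),
                            port=int(os.getenv("MQTT_PORT", "1883")),
                            auth={"username": os.getenv("MQTT_USERNAME", ""),
                                  "password": os.getenv("MQTT_PASSWORD", "")})
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="raw_frames", on_message_callback=on_frame)
channel.start_consuming()
```

Its compose entry: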
```yaml
orchestrator-svc:
  build: ./orchestrator-svc
  environment:
    - RABBIT_HOST=rabbitmq
    - INFERENCE_URL=http://inference-svc:8000/detect
    - DETECTION_CLASSES=heron # or list: heron,person,dog
    - MQTT_HOST=homeassistant.fritz.box
    - MQTT_PORT=1883
    - MQTT_USERNAME=admin
    - MQTT_PASSWORD=redacted
  depends_on:
    - rabbitmq
```
🔍 A few details#
- `RABBIT_HOST` is the RabbitMQ broker.
- `INFERENCE_URL` is the URL of the inference service.
- `DETECTION_CLASSES` is a comma-separated list of classes to detect (e.g., `heron,person,dog`).
- `MQTT_HOST` is the MQTT broker (Home Assistant in my case).
- `MQTT_PORT`, `MQTT_USERNAME`, and `MQTT_PASSWORD` are for connecting to the MQTT broker.
- The `depends_on` ensures RabbitMQ is up before starting the orchestrator.
- The orchestrator is the only service that needs to know about the MQTT broker, so it’s the only one that has those credentials.
🌀 clip-buffer‑svc: The Ringmaster#
This service is continuously recording small segments (like 5-second HLS chunks) and keeping them in a ring buffer stored in `/buffer`.
The idea is: even before a heron is detected, the footage is already waiting in a rolling loop. This way, we don’t miss the “approach” part of the action.
Ring buffer + fast access = juicy pre-roll clips.
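A stripped-down version of the idea, assuming copy-codec MPEG-TS segments and a fixed segment count, could look like this: FFmpeg does the heavy lifting, Python just starts it and prunes old segments.

```python
# Ring-buffer sketch: FFmpeg writes timestamped 5-second segments,
# Python deletes everything older than the buffer window.
import os
import subprocess
import time
from pathlib import Path

BUFFER_DIR = Path("/buffer")
SEGMENT_SECONDS = 5
MAX_SEGMENTS = 120  # roughly ten minutes of rolling footage

ffmpeg = subprocess.Popen([
    "ffmpeg", "-rtsp_transport", "tcp",
    "-i", os.getenv("RTSP_URL", "rtsp://user:password@ip_address:554/live/ch1"),
    "-c", "copy",                         # no re-encode (assuming the camera sends H.264)
    "-f", "segment",
    "-segment_time", str(SEGMENT_SECONDS),
    "-reset_timestamps", "1",
    "-strftime", "1",                     # name each segment after its start time
    str(BUFFER_DIR / "%Y%m%d-%H%M%S.ts"),
])

while ffmpeg.poll() is None:
    segments = sorted(BUFFER_DIR.glob("*.ts"))
    for old in segments[:-MAX_SEGMENTS]:  # keep only the newest MAX_SEGMENTS files
        old.unlink(missing_ok=True)
    time.sleep(SEGMENT_SECONDS)
```

The compose entry is short: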
```yaml
clip-buffer-svc:
  build: ./clip-buffer-svc
  environment:
    - RTSP_URL=rtsp://user:password@ip_address:554/live/ch1
  volumes:
    - buffer:/buffer
  depends_on:
    - rabbitmq
```
🔍 A few details#
Not much to say here:
- `RTSP_URL` is the same as before.
- `/buffer` is a Docker volume that stores the ring segments.
- It’s built using FFmpeg to handle the video stream.
- Each segment is named after the timestamp it was recorded, so we can easily find the right one later.
✂️ clip-extractor‑svc: The Editor#
When the orchestrator
detects a heron, it publishes a message to the detections
queue.
The `clip-extractor-svc` consumes messages from this queue and fetches the relevant segments from the ring buffer.
It adds a 10-second pre-roll and a 10-second post-roll, so it captures the whole action.
It waits after the detection to ensure the heron is gone before stopping the recording.
It stitches them together into a single MP4 file using FFmpeg and saves it in `/clips`.
After the clip is saved, it generates a thumbnail for the clip for the dashboard.
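Condensed into a sketch, assuming the timestamped `.ts` segments from the buffer and FFmpeg’s concat demuxer, the extraction step looks roughly like this; the real service also waits out the post-roll before cutting.

```python
# Clip-extraction sketch: pick buffer segments inside the pre/post-roll window,
# stitch them with FFmpeg's concat demuxer, grab a thumbnail frame.
import subprocess
from datetime import datetime, timedelta
from pathlib import Path

BUFFER_DIR, OUTPUT_DIR = Path("/buffer"), Path("/clips")
PRE_ROLL = POST_ROLL = timedelta(seconds=10)

def extract_clip(detected_at: datetime) -> Path:
    start, end = detected_at - PRE_ROLL, detected_at + POST_ROLL
    # Every buffer segment whose start time falls inside the window.
    wanted = [p for p in sorted(BUFFER_DIR.glob("*.ts"))
              if start <= datetime.strptime(p.stem, "%Y%m%d-%H%M%S") <= end]

    # The concat demuxer wants a small text file listing its inputs.
    list_file = OUTPUT_DIR / "segments.txt"
    list_file.write_text("".join(f"file '{p}'\n" for p in wanted))

    clip = OUTPUT_DIR / f"heron_{detected_at:%Y%m%d-%H%M%S}.mp4"
    subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                    "-i", str(list_file), "-c", "copy", str(clip)], check=True)
    # One frame from the middle of the clip becomes the dashboard thumbnail.
    subprocess.run(["ffmpeg", "-y", "-ss", "10", "-i", str(clip),
                    "-frames:v", "1", str(clip.with_suffix(".jpg"))], check=True)
    return clip
```

And its compose entry: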
```yaml
clip-extractor-svc:
  build: ./clip-extractor-svc
  environment:
    - RABBIT_HOST=rabbitmq
    - BUFFER_DIR=/buffer
    - OUTPUT_DIR=/clips
  volumes:
    - buffer:/buffer
    - clips:/clips
  depends_on:
    - rabbitmq
    - clip-buffer-svc
```
🔍 A few details#
- `RABBIT_HOST` is the RabbitMQ broker.
- `BUFFER_DIR` is the path to the ring buffer inside the container.
- `OUTPUT_DIR` is the path to save the clips inside the container.
- The `volumes` mount the buffer and clips directories.
- The `depends_on` ensures the clip buffer is up before starting the clip extractor.
Yes, the heron now stars in its own surveillance reels.
📊 dashboard‑svc: The Control Center#
A Flask + Socket.IO dashboard that:
- Streams live frames
- Shows the latest detections
- Provides access to saved clips
- Displays logs and status of services
Perfect for watching the heron get mildly inconvenienced in real-time.
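A bare-bones sketch of that combination, assuming the predicted frames arrive as JPEG bytes on the `predicted_frames` queue and get pushed to the browser as base64 over Socket.IO:

```python
# Dashboard sketch: Flask serves clips, Socket.IO pushes live frames,
# a background thread relays frames from RabbitMQ to the browser.
import base64
import os
import threading
import pika
from flask import Flask, jsonify, send_from_directory
from flask_socketio import SocketIO

CLIPS_DIR = os.getenv("CLIPS_DIR", "/clips")
app = Flask(__name__)
socketio = SocketIO(app, cors_allowed_origins="*")

@app.route("/clips")
def list_clips():
    return jsonify(sorted(f for f in os.listdir(CLIPS_DIR) if f.endswith(".mp4")))

@app.route("/clips/<path:name>")
def get_clip(name):
    return send_from_directory(CLIPS_DIR, name)

def relay_frames():
    """Consume predicted frames from RabbitMQ and push them to the browser."""
    conn = pika.BlockingConnection(
        pika.ConnectionParameters(host=os.getenv("RABBIT_HOST", "rabbitmq")))
    channel = conn.channel()
    channel.queue_declare(queue="predicted_frames", durable=True)

    def on_frame(ch, method, properties, body):
        socketio.emit("frame", {"jpeg": base64.b64encode(body).decode()})
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="predicted_frames", on_message_callback=on_frame)
    channel.start_consuming()

threading.Thread(target=relay_frames, daemon=True).start()
socketio.run(app, host="0.0.0.0", port=5000)
```

The compose entry: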
```yaml
dashboard-svc:
  build: ./dashboard-svc
  ports:
    - "5000:5000"
  environment:
    - RABBIT_HOST=rabbitmq
    - CLIPS_DIR=/clips
  volumes:
    - clips:/clips
  depends_on:
    - rabbitmq
```
🔍 A few details#
- `RABBIT_HOST` is, of course, the RabbitMQ broker again.
- `CLIPS_DIR` is the path where the saved clips live inside the container.
- The `volumes` mount the clips directory.
- The `ports` expose the dashboard on port 5000.
- The `depends_on` ensures RabbitMQ is up before starting the dashboard.
🧩 System Design Insights#
Wondering about the setup? Each service runs in its own Docker container, defined in a `docker-compose.yml`. They talk to each other via RabbitMQ and are designed to be modular—swap in a new model, add more cameras, or update a service without touching the rest.
Environment variables keep everything configurable without changing code—via a `.env` file or directly in `docker-compose`.
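As a tiny illustration of that pattern, every service reads its settings the same way: from the environment, with a local-friendly default (the variable names below are the ones from the compose files; the defaults are illustrative).

```python
# Configuration pattern shared by the services: environment first, default second.
import os

RABBIT_HOST = os.getenv("RABBIT_HOST", "localhost")        # compose sets "rabbitmq"
FPS = float(os.getenv("FPS", "2"))
CONF_THRESHOLD = float(os.getenv("CONF_THRESHOLD", "0.6"))
```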
During development, you don’t need containers at all. Just run services locally with defaults. Only the RTSP stream needs a real or virtual camera, but I’m working on a mocked RTSP setup using a sample video and two lightweight containers.
🧑💻 User Experience#
All this finally reaches the user—me—via a web dashboard. From here, I can review detections, debug the system, or just enjoy the drama of wildlife surveillance with a fresh coffee.

🧠 Why Microservices?#
You might ask, “Why not just write a monolith?”
Great question. Here’s why I went full microservice mode:
- Modularity: Each service is simple and focused
- Scalability: Run inference-svc on a beefier machine if needed
- Fault Isolation: One crash doesn’t take the whole system down
- Asynchronicity: RabbitMQ helps decouple everything cleanly
Also… because it’s fun.
Stay tuned for future posts in this series:
- Annotating with style (and speed)
- Live deployment tips
- Dashboard UI & integration tricks
Until then: detect early, spray responsibly.
Fun fact: Herons hate sudden loud noises. Turns out they don’t YOLO—they flinch.