Naemon

From Nasqueron Agora

Naemon has been identified as a simple and maintained solution for a Nagios-compatible monitoring system.

Shinken and Sensu have been dismissed as open core solutions.

Naemon deployment and FreeBSD porting plan

Overview

To improve Nasqueron infrastructure monitoring, we propose a three-step approach using Naemon, a Nagios-compatible monitoring system.

Our 2024 test showed that FreeBSD compatibility is a reasonable medium-term goal, but that it needs a significant amount of work, especially for Thruk.

As we would like to seriously introduce monitoring in Fall 2025, we suggest starting with a Linux solution, while carefully checking and documenting what works and what does not work on FreeBSD.

Hence this plan: ensure immediate monitoring coverage while preparing a long-term FreeBSD-compatible solution and potential upstream contributions.

Step 1: immediate monitoring

  • Goal: Deploy a working monitoring system immediately.
  • Actions:
    • Spawn a Debian or Rocky VM dedicated to monitoring on hyper-001.
    • Install Naemon on the VM.
    • Deploy critical check scripts that follow the Nagios plugin exit code convention: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    • Configure notifications to IRC, e-mail, etc.
  • Outcome: Nasqueron services are monitored immediately with script-based alerts, without waiting for full FreeBSD compatibility.
  • Who to involve? Dorian is currently busy with DNS and ACME certificates: that is a good opportunity to write and submit checks for those core critical systems. Philectro is becoming interested in automation, so it is probably a good idea to pair Philectro and Dereckson for the deployment.
  • Note: We know that maintaining a monitoring solution is not a matter of "Magic wand! Everything is monitored", but takes time. By "immediately", we state our intent to focus on creating a solid monitoring role on rOPS to deploy Naemon, Livestatus and Thruk, to have documentation and a plan for adding checks, and to be able to run the checks we already have in the repository.
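
As an illustration of what such a check script could look like, here is a minimal sketch of a certificate expiry check in Python. It is not taken from the repository: the function names, the 21-day and 7-day thresholds, and the script name in the usage message are assumptions chosen for the example. It only relies on the plugin convention that the exit code carries the state and the first line of output carries the human-readable message.

```python
"""Sketch of a Nagios-style certificate expiry check (hypothetical thresholds)."""
import socket
import ssl
import sys
from datetime import datetime, timezone

# Plugin exit codes understood by Naemon and other Nagios-compatible cores.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(days_left, warn_days=21, crit_days=7):
    """Map the remaining validity in days to a plugin exit code."""
    if days_left <= crit_days:
        return CRITICAL
    if days_left <= warn_days:
        return WARNING
    return OK


def days_until_expiry(host, port=443, timeout=10.0):
    """Fetch the server certificate and return the days left before it expires."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(cert["notAfter"]), tz=timezone.utc
    )
    return (expires - datetime.now(timezone.utc)).total_seconds() / 86400


def main(argv):
    """Run the check for argv[1], print one status line, return the exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_cert_expiry.py HOST")
        return UNKNOWN
    host = argv[1]
    try:
        days = days_until_expiry(host)
    except ssl.SSLCertVerificationError as exc:
        # An already expired or otherwise invalid certificate fails the handshake.
        print(f"CRITICAL - certificate for {host} failed verification: {exc}")
        return CRITICAL
    except (OSError, ValueError) as exc:
        # Network or parsing problem: the state of the certificate is not known.
        print(f"UNKNOWN - could not read certificate for {host}: {exc}")
        return UNKNOWN
    status = classify(days)
    print(f"{LABELS[status]} - certificate for {host} expires in {days:.0f} days")
    return status
```

To use it as a plugin, the file would end with `sys.exit(main(sys.argv))` under the usual `if __name__ == "__main__":` guard, so that Naemon reads the state from the process exit code. Keeping `classify` separate from the network call makes the threshold logic easy to test without a live server.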

Step 2: FreeBSD Patch / Proof-of-Concept

  • Goal: Make the Naemon suite fully functional on FreeBSD and create a working proof-of-concept.
  • Actions:
    • Apply minimal portability patches locally (e.g., using `#!/usr/bin/env bash`, adjusting paths, POSIX compliance).
    • Test the patches in a FreeBSD VM or jail environment.
    • Organize a short hackathon (e.g., one weekend) on the Naemon and FreeBSD IRC channels to:
      • Test patches on different FreeBSD versions.
      • Improve portability and compatibility.
      • Encourage community participation in maintenance.
  • Outcome: A working FreeBSD-compatible Naemon suite, with volunteer feedback and testing, forming the basis for potential upstream contribution.
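
To keep the shebang patches minimal, it can help to first list which scripts actually hard-code an interpreter path that differs on FreeBSD, where bash, perl and python installed from ports or packages live under /usr/local/bin rather than /bin or /usr/bin. The following Python sketch does that. The helper names `portable_shebang` and `scan` and the list of interpreters are assumptions made for this example, not part of the Naemon or Thruk tooling. It only reports candidates and rewrites nothing.

```python
"""Sketch: find scripts whose shebang hard-codes a Linux interpreter path."""
import re
from pathlib import Path

# Matches e.g. "#!/bin/bash" or "#!/usr/bin/perl" with no interpreter arguments.
# Shebangs carrying arguments (such as "#!/bin/bash -e") are deliberately not
# matched, because env-based shebangs do not portably accept extra arguments;
# those lines are left for manual review. /bin/sh is left alone as well.
SHEBANG = re.compile(r"^#!\s*(?:/usr)?/bin/(bash|perl|python3?)\s*$")


def portable_shebang(first_line):
    """Return an env-based shebang for first_line, or None if nothing to change."""
    match = SHEBANG.match(first_line.rstrip("\n"))
    if match is None:
        return None
    return f"#!/usr/bin/env {match.group(1)}"


def scan(root):
    """Yield (path, suggested_shebang) for files under root that need a patch."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        with path.open("r", errors="replace") as handle:
            suggestion = portable_shebang(handle.readline())
        if suggestion:
            yield path, suggestion
```

Running `scan` over a source checkout gives a short list to review by hand, which fits the goal of keeping local patches small and easy to discuss with upstream.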

Step 3: Upstream Engagement

  • Goal: Collaborate with Naemon maintainers to upstream FreeBSD compatibility improvements.
  • Actions:
    • Present the proof-of-concept FreeBSD patches to maintainers.
    • Highlight testing results, community interest, and potential benefits of FreeBSD support.
    • Discuss minimal, low-risk upstream changes (e.g., POSIX-compliant scripts, shebang adjustments).
  • Outcome: Upstream may accept patches or provide guidance, while Nasqueron keeps a safe internal FreeBSD-compatible fork if necessary.
  • Note: During the 2024 test, Dereckson contributed three fixes upstream to ensure Naemon and Livestatus can be built with LLVM/Clang and against FreeBSD headers. The maintainer was entirely fine with the C correctness fixes, which were merged, but less keen on shebang adjustments everywhere. The maintainer was also interested in (1) having an overview of Thruk and (2) working on FreeBSD compatibility for SNClient, the NRPE replacement. The second point is probably the more interesting one: a web UI could run on a dedicated VM, and so on Linux, whereas the NRPE agent must run on every machine, including those running other operating systems.

Advantages

  • Immediate monitoring coverage for critical services.
  • Limited maintenance burden with local FreeBSD patching and volunteer testing.
  • Potential for upstream contribution, increasing project portability and sustainability.
  • Aligns with previous Nasqueron strategies of isolating legacy dependencies while modernizing infrastructure.

Notes

This plan emphasizes community engagement and practical progress. Steps 2 and 3 ensure that FreeBSD compatibility is addressed in a collaborative, low-risk manner without delaying production monitoring.

We consider a solution like Naemon a real need for alerts: monitoring is not only about time series. For us, Prometheus time series and Nagios-like alerts are not alternatives; they complement each other. CPU usage is easy to track as a time series, but a certificate expiration or a text warning is perhaps less so.