Naemon
Revision as of 23:35, 25 September 2025
Naemon has been identified as a simple and maintained solution for a Nagios-compatible monitoring system.
Shinken and Sensu have been dismissed as open core solutions.
Naemon deployment and FreeBSD porting plan
Overview
To improve Nasqueron infrastructure monitoring, we propose a three-step approach using Naemon, a Nagios-compatible monitoring system.
Our 2024 test showed that FreeBSD compatibility is a reasonable medium-term goal, but that it needs a significant amount of work, especially for Thruk.
As we would like to introduce monitoring in earnest in Fall 2025, we suggest starting with a Linux solution, while carefully checking and documenting what works and what does not work on FreeBSD.
Hence this plan, which ensures immediate monitoring coverage while preparing a long-term FreeBSD-compatible solution and potential upstream contributions.
Step 1: immediate monitoring
- Goal: Deploy a working monitoring system immediately.
- Actions:
  - Spawn a Debian or Rocky VM dedicated to monitoring on hyper-001
  - Install Naemon on the VM
  - Deploy critical check scripts returning exit codes 0 = OK, 1 = WARNING, 2 = CRITICAL
  - Configure notifications to IRC, e-mail, etc.
- Outcome: Nasqueron services are monitored immediately with script-based alerts, without waiting for full FreeBSD compatibility.
- Who to involve? Dorian is currently busy with DNS and ACME certificates, which is a good opportunity to write and submit checks for those critical core systems. Philectro is becoming interested in automation, so it is probably a good idea to pair Philectro and Dereckson for the deployment.
- Note: We know that maintaining a monitoring solution is not "Magic wand! Everything is monitored" but takes time. By "immediately" we state our intent to focus on creating a solid monitoring role on rOPS to deploy Naemon, livestatus and Thruk, to have documentation and a plan for adding checks, and to be able to run the checks we already have in the repository.
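To make the exit-code convention from the actions above concrete, here is a minimal sketch of a threshold check in Python. It is not one of the checks already in the repository: the check name `DISK`, the function names and the thresholds are hypothetical. Exit code 3 (UNKNOWN) is not listed above but belongs to the same Nagios plugin convention.

```python
"""Minimal sketch of a Nagios-style check: compare a value to thresholds."""

# Exit codes of the Nagios plugin convention. UNKNOWN (3) is the fourth
# state, returned when the check itself cannot run.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Return the status code for a metric where higher is worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_output(name, status, value, warn, crit):
    """Build one output line: readable text, then perfdata after the pipe."""
    return f"{name} {LABELS[status]} - {value}% used | used={value}%;{warn};{crit}"


def main(argv):
    """Parse VALUE WARN CRIT from argv, print one line, return the exit code."""
    try:
        value, warn, crit = (float(arg) for arg in argv[1:4])
    except ValueError:
        # Raised for non-numeric arguments and for missing arguments.
        print("DISK UNKNOWN - usage: check_sketch.py VALUE WARN CRIT")
        return UNKNOWN
    status = evaluate(value, warn, crit)
    print(format_output("DISK", status, value, warn, crit))
    return status

# A real plugin file would end with: sys.exit(main(sys.argv))
```

Naemon only looks at the exit code and the first line of output, so the same skeleton should work for any check that reduces to "measure, compare, report".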
Step 2: FreeBSD Patch / Proof-of-Concept
- Goal: Make the Naemon suite fully functional on FreeBSD and create a working proof-of-concept.
- Actions:
  - Apply minimal portability patches locally (e.g., using `#!/usr/bin/env bash`, adjusting paths, POSIX compliance).
  - Test the patches in a FreeBSD VM or jail environment.
  - Organize a short hackathon (e.g., one weekend) on the Naemon and FreeBSD IRC channels to:
    - Test patches on different FreeBSD versions.
    - Improve portability and compatibility.
    - Encourage community participation in maintenance.
- Outcome: A working FreeBSD-compatible Naemon suite, with volunteer feedback and testing, forming the basis for potential upstream contribution.
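The shebang adjustment mentioned in the actions above can be sketched as a small Python helper for preparing local patches. This is an illustration, not the tooling used in the 2024 test: the function names and the interpreter list are assumptions. It relies on the fact that FreeBSD installs bash from ports under /usr/local/bin, so a hard-coded `#!/bin/bash` does not resolve there, while `/bin/sh` exists and is left alone.

```python
import re

# Hypothetical list of interpreters that FreeBSD installs under /usr/local
# rather than /bin or /usr/bin; adjust it to what the scripts actually use.
HARDCODED = re.compile(r"^#!\s*(?:/usr)?/bin/(bash|perl|python3)\b(.*)$")


def portable_shebang(line):
    """Rewrite a hard-coded interpreter path to the `env` form.

    Lines that do not match, or that pass arguments to the interpreter,
    are returned unchanged: `env` handles extra arguments differently
    across systems, so those cases are left for manual review.
    """
    stripped = line.rstrip("\r\n")
    ending = line[len(stripped):]
    match = HARDCODED.match(stripped)
    if match is None or match.group(2).strip():
        return line
    return f"#!/usr/bin/env {match.group(1)}{ending}"


def rewrite_script(text):
    """Apply portable_shebang to the first line of a script's content."""
    first, separator, rest = text.partition("\n")
    return portable_shebang(first) + separator + rest
```

For example, `rewrite_script("#!/bin/bash\necho hi\n")` returns the same script with `#!/usr/bin/env bash` as its first line. Keeping the rewrite in a script makes the local patch set easy to regenerate after each upstream release.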
Step 3: Upstream Engagement
- Goal: Collaborate with Naemon maintainers to upstream FreeBSD compatibility improvements.
- Actions:
  - Present the proof-of-concept FreeBSD patches to maintainers.
  - Highlight testing results, community interest, and potential benefits of FreeBSD support.
  - Discuss minimal, low-risk upstream changes (e.g., POSIX-compliant scripts, shebang adjustments).
- Outcome: Upstream may accept patches or provide guidance, while Nasqueron keeps a safe internal FreeBSD-compatible fork if necessary.
- Note: During the 2024 test, Dereckson contributed three fixes upstream to ensure Naemon and livestatus can be built with LLVM/Clang and against FreeBSD headers. The maintainer was entirely fine with the C correctness fixes, which were merged, less so with shebang adjustments everywhere, and interested in (1) having an overview of Thruk and (2) working on FreeBSD compatibility for the NRPE replacement, SNClient. The second point is probably the more interesting one: a web UI can run on a dedicated VM, and so on Linux, whereas the NRPE replacement must run on every machine, including other OSes.
Advantages
- Immediate monitoring coverage for critical services.
- Limited maintenance burden with local FreeBSD patching and volunteer testing.
- Potential for upstream contribution, increasing project portability and sustainability.
- Aligns with previous Nasqueron strategies of isolating legacy dependencies while modernizing infrastructure.
Notes
This plan emphasizes community engagement and practical progress. Steps 2 and 3 ensure that FreeBSD compatibility is addressed in a collaborative, low-risk manner without delaying production monitoring.
We consider a solution like Naemon a real need for alerts: monitoring is not only time series. For us, Prometheus time series and Nagios-like alerts are not alternatives but complement each other. CPU usage is easy to track as a time series; a certificate expiration or a textual warning, perhaps less so.
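The certificate case can be sketched in Python to show why it fits a state-based alert better than a time series: the result is a state plus one sentence, not a curve. This is a sketch under stated assumptions: the function names and the 21-day and 7-day thresholds are hypothetical, and only standard-library calls are used.

```python
import socket
import ssl

OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def fetch_not_after(host, port=443, timeout=10.0):
    """Return the expiry date of the served certificate, in epoch seconds."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return ssl.cert_time_to_seconds(cert["notAfter"])


def days_left(not_after, now):
    """Whole days between now and expiry (negative once expired)."""
    return int((not_after - now) // 86400)


def certificate_status(days, warn_days=21, crit_days=7):
    """Map remaining days to a state; the thresholds are assumptions."""
    if days < crit_days:
        return CRITICAL
    if days < warn_days:
        return WARNING
    return OK


def check_certificate(host, not_after, now, warn_days=21, crit_days=7):
    """Return (exit code, output line) for a certificate expiry date."""
    days = days_left(not_after, now)
    status = certificate_status(days, warn_days, crit_days)
    if days < 0:
        text = f"certificate for {host} has expired"
    else:
        text = f"certificate for {host} expires in {days} day(s)"
    return status, f"CERT {LABELS[status]} - {text}"
```

A caller would pass `fetch_not_after(host)` and `time.time()` to `check_certificate`, print the line and exit with the code. Keeping the network call separate from the decision logic lets the thresholds be tested without a live server.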