ServPulse/Note of Intent
Foreword
When a service is down or degraded, it's important to communicate efficiently with all the involved parties (Nasqueron members, visitors using our sites, the Nasqueron operations SIG) about the status, what we know and what we are doing.
Cachet served us well as a clear, easy-to-use status page until 2018, when development stalled. James Brooks, Cachet's author, sold Cachet to a company that had plans for it, but as far as we could see from repository activity, those plans were never carried out[1].
Meanwhile, Atlassian bought the Statuspage SaaS and suddenly, it seems, everyone from video game publishers to Wikimedia transitioned to it.
That's where ServPulse comes in: we don't want an essential communication feature to be monopolized by a private company. Essential tools should be open source, in line with our values.
ServPulse's mission will be to provide a reliable, self-hostable, open-source platform for publishing the status of services, components and infrastructure, enabling transparency, trust and streamlined incident communication for organisations of all sizes.
For those who don’t want to manage that piece of infrastructure, or who prefer to keep it external to their own systems, once ServPulse is released, open-source and free-culture projects will be welcome to use the ServPulse instance we manage. Other providers, meanwhile, will be free to build their own SaaS offerings on top of it.
Scope of the first phase
As a first phase for ServPulse, the main scope is to offer a status-page application.
This status-page application should support:
- A list of components (servers, services, clusters, or whatever makes sense as a unit for a given infrastructure)
- Status for each component (e.g. operational, degraded, outage, under maintenance)
- Information about current incidents and their resolution
- Information about scheduled maintenance
- Subscriber notifications, by e-mail, webhook or SMS
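The component and status model listed above could be represented along these lines. This is only a minimal sketch: the `Status` and `Component` names, the severity ordering, and the "worst status wins" roll-up rule are assumptions made for illustration, not decisions taken in this note.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    """Statuses named in this note; the severity order is an assumption."""
    OPERATIONAL = 0
    UNDER_MAINTENANCE = 1
    DEGRADED = 2
    OUTAGE = 3


@dataclass
class Component:
    """A unit of infrastructure shown on the status page (hypothetical model)."""
    name: str
    status: Status = Status.OPERATIONAL


def overall_status(components: list[Component]) -> Status:
    """Return the most severe status across all components.

    An empty list is treated as operational (assumed convention).
    """
    return max((c.status for c in components), default=Status.OPERATIONAL)


components = [
    Component("API"),
    Component("Database", Status.DEGRADED),
    Component("Mobile app", Status.UNDER_MAINTENANCE),
]
print(overall_status(components).name)  # prints "DEGRADED"
```

Ordering statuses by severity keeps the page-level summary a one-line computation, at the cost of fixing a single ranking that some deployments might want to configure.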
That means an administration panel or configuration files need to make it possible to:
- Define service components (e.g., API, database, mobile app)
- Set the status for a specific component
- Create incidents
- Move incidents through a lifecycle/workflow: investigating → identified → monitoring → resolved
- Schedule maintenance windows
- Customize the page for branding: color theme, custom logo, title, etc.
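The incident lifecycle mentioned in the list above (investigating → identified → monitoring → resolved) can be sketched as a small state machine. The `Incident` class, its `advance` method, and the rule that stages can be skipped forward but never moved backward are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Lifecycle stages in the order given in this note.
STAGES = ["investigating", "identified", "monitoring", "resolved"]


@dataclass
class Incident:
    """An incident with a current stage and a log of updates (hypothetical model)."""
    title: str
    stage: str = "investigating"
    updates: list = field(default_factory=list)

    def advance(self, new_stage: str, message: str) -> None:
        """Move to a later stage and record a timestamped update.

        Assumption: a stage may be skipped (e.g. straight to "resolved"),
        but an incident never moves back to an earlier stage.
        """
        if new_stage not in STAGES:
            raise ValueError(f"unknown stage: {new_stage}")
        if STAGES.index(new_stage) <= STAGES.index(self.stage):
            raise ValueError(f"cannot move from {self.stage} to {new_stage}")
        self.stage = new_stage
        self.updates.append((datetime.now(timezone.utc), new_stage, message))


incident = Incident("API latency")
incident.advance("identified", "Slow database queries found.")
incident.advance("resolved", "Index added, latency back to normal.")
print(incident.stage)  # prints "resolved"
```

Keeping each update alongside its stage gives the public page a ready-made incident timeline.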
The application should also provide:
- Published metrics directly usable by operations: uptime, response time, error rate
- Integration with monitoring/alerting tools, for automated updates via API
- Easy deployment for self-hosting, at least as a Docker container
- A managed version for our own needs, and the needs of open-source/free culture projects
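As an illustration of the uptime and error-rate metrics listed above, here is one way they might be computed. This is a sketch under assumptions: the function names are hypothetical, outages are given as non-overlapping `(start, end)` pairs, and uptime is defined as the share of the reporting window not covered by an outage.

```python
from datetime import datetime, timedelta


def uptime_percent(window_start, window_end, outages):
    """Percentage of the window not covered by outages.

    `outages` is a list of (start, end) datetime pairs, assumed not to
    overlap each other. Outages are clipped to the window boundaries.
    """
    total = (window_end - window_start).total_seconds()
    if total <= 0:
        raise ValueError("window must have a positive duration")
    down = 0.0
    for start, end in outages:
        clipped_start = max(start, window_start)
        clipped_end = min(end, window_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (total - down) / total


def error_rate(failed_requests, total_requests):
    """Fraction of failed requests; 0.0 when there was no traffic (assumed)."""
    return failed_requests / total_requests if total_requests else 0.0


start = datetime(2025, 1, 1)
end = start + timedelta(days=30)
outage = (datetime(2025, 1, 10, 12, 0), datetime(2025, 1, 10, 12, 43, 12))
print(round(uptime_percent(start, end, [outage]), 3))
print(error_rate(5, 1000))
```

In the example, a single outage of 43 minutes and 12 seconds over a 30-day window corresponds to roughly 99.9 % uptime.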
Many other ideas for monitoring, automation and integration with other tools have been discussed, but first we want a functional status page before exploring other areas.
Guiding principles and values
Transparency: Make it easy to publish accurate status information to all concerned.
Openness: Fully open-source, community-driven, extensible.
Simplicity: Setup and use should be easy — low barrier to entry.
Reliability: It should be robust and designed for production usage; status pages are especially critical during failures.
Flexibility: Allow customisation (branding, domain, UI), self-hosting or cloud-hosting, and integrations.
Cost-efficiency: Allow organisations of any size, including small teams/communities/hackerspaces, to adopt without heavy cost.
Community-centric: Encourage contributions, integrations, and localisations.
Target audience
ServPulse targets:
- DevOps and SRE teams who want to publish service status transparently
- open-source and free culture projects (such as ours at Nasqueron) that want to self-host a status page
- organisations that want self-hosted, customised status pages with full control
What could differentiate ServPulse
- Fully open source
- Designed to be modern and maintainable
- Focus on ease of deployment (Docker/K8s, minimal configuration)
- Modern UI/UX
- Built-in integrations with key systems (alerting, monitoring, IRC/Slack/Teams/webhooks)
- Built for community, with extensibility (plugins, themes) and open governance
Risks and challenges
- Ensuring the project remains maintained and doesn’t suffer from stagnation
- Managing operational burden for users who self-host (documentation, updates, security)
- Creating sufficient differentiation from existing tools to attract users/contributors
- Balancing ease-of-use (for smaller teams) with advanced features (for larger infrastructure)
- Ensuring reliability and performance under real outage/incident conditions (ironically, the tool must be resilient when services are down)
Success metrics
- Number of organisations or projects using ServPulse (self-hosted or managed)
- Number of contributors
- Growth of the integration, plugin and theme ecosystem
- Feedback from users about ease of deployment and incident communication improvements
- Reduction in support requests during incidents
- Quality of community contributions, repository activity, feature requests, forks
Notes
- [1] See also https://github.com/orgs/cachethq/discussions/4342 for James Brooks's explanation and plans for Cachet 3