I don't track their performance; I just track whether they're up or down.
I run Uptime Kuma on the fly.io free tier so I can tell if my cluster has had a catastrophic failure. There's no point in running the alerting system on the same system it monitors.
Zabbix for hardware and certificate monitoring
Prometheus for service monitoring (e.g. how many people are actually using my Jellyfin server, so I know if I need to scale, etc.)
Observium.
If it's just one server, Netdata is a better option.
First for PRTG.
Nagios for service/QoS, plus Grafana dashboards for a few more specific items. I'm planning to eventually switch to Zabbix, but Nagios is so simple that I have a hard time justifying moving over 400 monitored services to it.
If it's down, I assume performance is bad.
Quick checks: Proxmox dashboard, htop or glances, Portainer
Extensive monitoring: Prometheus (node-exporter), Rsyslog server, Loki, Grafana, Uptime Kuma, Alertmanager (via Gotify)
I literally tried them all. Nagios is the best one.
Uptime Kuma for my services; Netdata + Prometheus + Grafana for server health (alerts and visualization).
Prometheus and Grafana
I use Netdata for both dashboards and alerts. Works great and is easy to set up.
It's not well liked, but I use Nagios Core for alerts and jump to Grafana, which has Prometheus, InfluxDB, and MySQL backends, for trends like CPU usage, hard drive temps, etc.
Oh lord, I have so much info to give! For the setup, it's running on Kubernetes 1.28.2, so YMMV. My monitoring stack is:
The rest is pretty much the same: if the service exports Prometheus metrics by default, I use that and write a ServiceMonitor and a Service manifest for it. It usually looks like this:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: traefik
  labels:
    app.kubernetes.io/component: traefik
    app.kubernetes.io/instance: traefik
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/name: traefik
    app.kubernetes.io/part-of: traefik
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: traefik-metrics
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
      scheme: http
      tlsConfig:
        insecureSkipVerify: true
  namespaceSelector:
    matchNames:
      - traefik
---
apiVersion: v1
kind: Service
metadata:
  name: traefik-metrics
  namespace: traefik
  labels:
    app.kubernetes.io/name: traefik-metrics
spec:
  type: ClusterIP
  ports:
    - protocol: TCP
      name: metrics
      port: 8082
  selector:
    app.kubernetes.io/name: traefik
If the app doesn't include a Prometheus endpoint, I just find an existing exporter for it; most popular apps have one, along with ready-made Grafana dashboards.
For alerting, I create a PrometheusRule object with the Prometheus query and the message to alert me (depending on the severity, it's either an email for medium-to-low severity incidents or a phone notification for high severity). I try to keep emails and notifications to a minimum: only load, CPU, RAM, and potential SMART errors trigger alerts.
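For anyone who hasn't written one, here is a rough sketch of what such a PrometheusRule might look like. This is not the poster's actual manifest: the object name, namespace, thresholds, and `for` durations are all assumptions, and the queries assume node-exporter metrics are being scraped. The `severity` label is what Alertmanager routing would typically match on to pick email versus phone notification.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-alerts          # hypothetical name
  namespace: monitoring      # assumed namespace; use wherever your Prometheus looks for rules
spec:
  groups:
    - name: node.rules
      rules:
        # Fires when available memory stays below 10% for 10 minutes.
        - alert: HighMemoryUsage
          expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
          for: 10m
          labels:
            severity: warning    # assumed to route to email
          annotations:
            summary: "Memory usage above 90% on {{ $labels.instance }}"
        # Fires when average CPU usage across all cores stays above 90% for 10 minutes.
        - alert: HighCpuUsage
          expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
          for: 10m
          labels:
            severity: critical   # assumed to route to a phone notification
          annotations:
            summary: "CPU usage above 90% on {{ $labels.instance }}"
```

Depending on how the Prometheus Operator was installed, the rule may also need labels matching the Prometheus `ruleSelector` before it gets picked up.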
Use PRTG; it's free for up to 100 sensors.
Best Monitoring tool ever ☝🏻🙂
None. There is no need for a performance monitor for my home lab. I just have an alert if one of my three main services is down. That is all I need.
Glances, Uptime Kuma, and a back-end script that reboots a service if it's down. If that doesn't work, I get a notification via Gotify. Simple and sweet.
Netdata; I monitor a few thousand (virtual) servers that way.
Girlfriend first, Alertmanager second. Girlfriend is usually faster.
I just check the Proxmox dashboard every now and then. Honestly, if everything is working, I'm not too worried about exact RAM levels at any given moment.
Uptime Kuma and Grafana. Uptime Kuma to monitor whether a service is up and running, and Grafana to monitor host metrics like CPU, RAM, SSD usage, etc.
Thank you for this. I appreciate the support.
Same here. I also have some autoscaling mechanisms set up in Docker Swarm to scale certain services when the load is high.
Just to make sure: You are aware that a search option here exists, yes? And you keep refusing to use it for whatever reason?
I use Checkmk with notifications to a Telegram bot.