Netdata, I've meant to look into Grafana but it always seemed way too overcomplicated and heavy for my purposes. Maybe one day, though...
The fastest way? Probably netdata
This. If you have more servers, you can also connect them all to a single UI where you can see all the info at once, using Netdata Cloud.
Just set this up yesterday. I used a parent node and then have all my vms point to that. Took like an hour to figure it out
Hey, did you use the cloud functionality or not? I'm tryna go all local with parent-child kind of capability but so far unable to.
The parent still is visible to the cloud portal. My understanding is the data all resides local, but when you login to their cloud portal, it connects to the parent to display the information. I’m still playing with it to confirm. My parent node shows all the child nodes on the local interface but the cloud still shows them all.
I’ll look into this too. Thank you.
I currently use the classic "Huh, seems slow" approach: check basic things like disk usage and per-process CPU/RAM usage, then "I'll do a reboot to fix it for now".
This is me. Can't hurt to just do a reboot
Windows Server? :)
Checkmk (Raw - free version.) Some setup aspects are a bit annoying (wants to monitor every last ZFS dataset and takes too long to 'ignore' them one by one.) It does alert me to things that could cause issues, like the boot partition almost full. I run it in a Docker container on my (primarily) file server.
I use this as well! Works well and has built in intelligence for thresholds.
I know it needs a fix when my dad complains that he can't watch TV and the rolling door doesn't open in the morning.
I personally use InfluxDB, Telegraf and Grafana.
Alerts are much more important than fancy dashboards. You won't be staring at your dashboard 24/7 and you probably won't be staring at it when bad things happen.
Creating your alert set is not easy. Ideally, every problem you encounter should be preceded by a corresponding alert, and no alert should be a false positive (one that requires no action). So if you either hit a problem without being alerted by your monitoring, or get an alert that requires no action, you should sit down and think carefully about what should change in your alerts.
As for tools, I recommend Prometheus + Grafana. There is no need for the separate AlertManager that many guides recommend; recent versions of Grafana have excellent built-in alerting. Don't use the ready-made dashboards. Start from scratch, because you need to understand PromQL to set everything up efficiently. Start with a simple dashboard (and alerts!) just for generic server health (node exporter), then add exporters for your specific services, network devices (snmp), remote hosts (blackbox), SSL certs, etc. Then write your own exporters for what you haven't found :)
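For anyone curious what "write your own exporter" can look like, here is a minimal sketch using only the Python standard library. It exposes disk usage in the Prometheus text exposition format. The metric names (`homelab_disk_*`), the `serve` helper and port 9200 are made up for illustration, and a real exporter would more likely use the official `prometheus_client` package.

```python
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(path="/"):
    """Return disk usage for `path` in Prometheus text exposition format."""
    usage = shutil.disk_usage(path)
    lines = [
        "# HELP homelab_disk_total_bytes Total size of the filesystem.",
        "# TYPE homelab_disk_total_bytes gauge",
        f'homelab_disk_total_bytes{{path="{path}"}} {usage.total}',
        "# HELP homelab_disk_free_bytes Free space on the filesystem.",
        "# TYPE homelab_disk_free_bytes gauge",
        f'homelab_disk_free_bytes{{path="{path}"}} {usage.free}',
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the metrics on /metrics and return 404 for everything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9200):
    """Block forever, serving metrics on the given (arbitrary) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and adding the host and port as a scrape target in Prometheus should be enough to see both gauges appear. Label values with backslashes or quotes (Windows paths, for instance) would need escaping, which this sketch skips.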
One thing about using Prometheus alerting is that it’s one less link in the chain that can break, and you can also keep your alerting configs in source control. So it’s a little less “click-ops,” but easier to reproduce if you need to rebuild it at a later date.
When you have several Prometheus instances (HA or in different datacenters), setting up separate AlertManagers for each of them is a good idea. But as OP is only beginning his journey to monitoring, I guess he will be setting up a single server with both Prometheus and Grafana on it. In this scenario a separate AlertManager doesn't add reliability, but adds complexity.
As for source control, you can write a simple script using Grafana API to export alert rules (and dashboards as well) and push them to git. Not ideal, sure, but it will work.
Anyway, it's never too late to go further and add AlertManager, Loki, Mimir and whatever else. But to flatten the learning curve I'd recommend starting with Grafana alerts that are much more user-friendly.
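The export script mentioned above could look roughly like this. It is a sketch, assuming a Grafana version that exposes the alerting provisioning endpoint (`/api/v1/provisioning/alert-rules`) and a service account token with read access. The function names and file layout are my own choices, not anything Grafana prescribes.

```python
import json
import urllib.request
from pathlib import Path


def grafana_get(base_url, token, path):
    """GET a Grafana API path and return the decoded JSON."""
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def export_grafana(fetch, out_dir):
    """Write alert rules and dashboards as JSON files under out_dir.

    `fetch` is any callable that takes an API path and returns parsed JSON,
    which keeps the export logic independent of how Grafana is reached.
    """
    out = Path(out_dir)
    (out / "dashboards").mkdir(parents=True, exist_ok=True)
    written = []

    # All alert rules go into one file; sorted keys keep git diffs small.
    rules = fetch("/api/v1/provisioning/alert-rules")
    rules_file = out / "alert-rules.json"
    rules_file.write_text(json.dumps(rules, indent=2, sort_keys=True) + "\n")
    written.append(rules_file)

    # One file per dashboard, named after its uid.
    for item in fetch("/api/search?type=dash-db"):
        dash = fetch(f"/api/dashboards/uid/{item['uid']}")["dashboard"]
        dash_file = out / "dashboards" / f"{item['uid']}.json"
        dash_file.write_text(json.dumps(dash, indent=2, sort_keys=True) + "\n")
        written.append(dash_file)
    return written
```

To use it, wrap `grafana_get` with `functools.partial(grafana_get, "http://localhost:3000", token)` and pass that as `fetch`, then run `git add` and `git commit` on the output directory from cron. The URL here is just the default local Grafana address.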
Thank you for this. I think I need a deeper understanding of Prometheus. I’ll look into it. You are awesome
Good luck, if you get into it, you'll be unable to stop. Perfecting your monitoring system is a kind of mania :)
One more piece of advice, for another kind of monitoring. When you are installing or configuring something on your server, it's handy to be able to monitor its resource usage in real time. That's why I use MobaXterm as my terminal program. It has many drawbacks, and competitors such as XShell, RoyalTS or Tabby look better in many ways... but it has one killer feature. It shows a status bar with the current server load (CPU, RAM, disk usage, traffic) right below your SSH session, so you don't have to switch to another window to see the effect of your actions. Saved me a lot of potential headaches.
Alerts are much more important than fancy dashboards.
It depends. If you have to install a lot of stuff or manage a lot of things, a dashboard is a good idea, but if you mainly do maintenance and want something reliable, then yes, you should have alerts. For example, I don't have a lot installed and don't really care about reliability, so I do everything in the terminal. I use Arch btw.
Prometheus + Grafana, the same I use at my job.
Rainmeter if it's directly on their desktop/background.
Grafana. Have alerts set up and get data with node exporter and cadvisor with some other containers giving some metrics.
I have alerts set up and they just ping me on a Discord server I set up: high CPU and temps, low disk space, memory, things like that. I mostly get high CPU or temp alerts, and that's usually when Plex does its automated things at 4am.
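If you don't run Grafana, a small cron script can do the same kind of Discord ping. This is a sketch only: the 90% threshold, the function names and the `User-Agent` string are my own assumptions, and the only Discord-specific part is that a webhook accepts a JSON body with a `content` field.

```python
import json
import shutil
import urllib.request


def disk_alerts(paths, max_used_percent=90.0):
    """Return one message per path whose filesystem is fuller than the limit."""
    messages = []
    for path in paths:
        usage = shutil.disk_usage(path)
        used_percent = 100.0 * usage.used / usage.total
        if used_percent > max_used_percent:
            messages.append(f"Disk {path} is {used_percent:.1f}% full")
    return messages


def discord_request(webhook_url, message):
    """Build the POST request that delivers `message` to a Discord webhook."""
    payload = json.dumps({"content": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            # A custom agent string; the default urllib one may be rejected.
            "User-Agent": "homelab-alerts",
        },
        method="POST",
    )


def send_alerts(webhook_url, messages):
    """Send each message as its own webhook call; raises on HTTP errors."""
    for message in messages:
        with urllib.request.urlopen(discord_request(webhook_url, message), timeout=10):
            pass
```

Something like `send_alerts(url, disk_alerts(["/", "/mnt/data"]))` run every few minutes from cron would cover the low disk space case; `/mnt/data` is a placeholder path. CPU and temperature checks would follow the same pattern with a different data source.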
I use Uptime Kuma to monitor particular services and NetData for server performance. I then pipe the alerts through to Pushover
Honestly, my load is so light I don't bother monitoring performance. Uptime Kuma for uptime. I used to use PRTG and UptimeRobot when I ran a heavier stack, before I switched to an all-Docker workload.
libreNMS is the tool I use, and it connects to systems primarily via SNMP (use v3, do not use v1 or v2c).
I don't check it all the time like a maniac but I have a glances docker running on my main server.
Glances is really nice. I've been using btop more recently though.
Influx/telegraf/grafana stack. I have all 3 on one server and then I put just telegraf on the others to send data into influx. Works great for monitoring things like usage. You can also bring in sysstat.
I have some custom apps as well where each time they run I record the execution time and peak memory in a database. This lets me go back over time and see where something improved or got worse. I can get a time stamp and go look at gitea commits to see what I was messing with.
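A rough idea of what that per-run recording can look like, as a sketch rather than the poster's actual setup. It assumes SQLite as the database and uses `tracemalloc`, which only sees memory allocated by Python itself; the table layout and the `record_run` name are made up for the example.

```python
import sqlite3
import time
import tracemalloc


def record_run(conn, app_name, func, *args, **kwargs):
    """Run func, then store its wall time and peak Python memory in SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "app TEXT, started_at REAL, seconds REAL, peak_bytes INTEGER)"
    )
    started_at = time.time()  # wall-clock timestamp, for matching against commits
    tracemalloc.start()
    t0 = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        # Record the run even if func raised, so failures show up in history too.
        seconds = time.perf_counter() - t0
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        conn.execute(
            "INSERT INTO runs VALUES (?, ?, ?, ?)",
            (app_name, started_at, seconds, peak_bytes),
        )
        conn.commit()
    return result
```

With `conn = sqlite3.connect("runs.db")`, each call to `record_run(conn, "nightly-import", main)` adds one row, and the `started_at` column gives the timestamp to compare against commit history. The file and app names are placeholders.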
I use Zabbix. Runs fine in a relatively small VM. Easy to write plugins.
If one of my users ever complained about anything I would possibly look into it, otherwise it all works so I don't waste life energy on that.
It is a bit difficult at the start, but in the end you really can monitor and get notifications on anything that's happening on your system.
TICK stack is the only answer
When the fan gets loud enough to hear, I'll check it :P
I recommend Checkmk. https://checkmk.com/
I second CMK.
A TICK stack is unwieldy, Grafana takes a lot of setup, and all of this assumes you both know what to monitor and get stats on it.
CMK by contrast is plug and play. Install the server on a VM or host, install the agent on your other systems, and you're good to go.
InfluxDB metrics server and Telegraf agent to collect metrics
I use sar for historical data, my own scripts running under cron on the hosts for specific things I'm interested in keeping an eye on, and my own scripts under cron on my monitoring machines for alerting me when something's wrong. I don't use a dashboard.
Zabbix. Also, for Windows, it could be Rainmeter https://www.rainmeter.net/ or HWiNFO https://www.hwinfo.com/. For Linux, Conky.
I use Telegraf + InfluxDB + Grafana for monitoring my home network and systems. Grafana has a learning curve for building panels and dashboards, but is incredibly flexible. I use it for more than server performance. I have a dual-monitor "kiosk" (old Mac mini) in my office displaying two Grafana dashboards. These are:
Network/Power/Storage showing:
- firewall block events & sources for last 12 hrs (from pfSense via Elasticsearch),
- current UPS statuses and power usage for last 12 hrs (Telegraf apcupsd plugin -> InfluxDB),
- WAN traffic for last 12 hrs (from pfSense via Telegraf -> InfluxDB),
- current DHCP clients (custom Python script -> MySQL), and
- current drive and RAID pool health (custom Python scripts -> MySQL)
Server sensors and performance showing:
- current status of important cron jobs (using Healthchecks -> Prometheus),
- current server CPU usage and temps, and memory usage (Telegraf -> InfluxDB)
- server host CPU usage and temps, and memory usage for last 3 hrs (Telegraf -> InfluxDB)
- Proxmox VM CPU and memory usage for last 3 hrs (Proxmox -> InfluxDB)
- Docker container CPU and memory usage for last 3 hrs (Telegraf Docker plugin -> InfluxDB)
Netdata works really well for system performance for Linux and can be installed from the default repositories of major distributions.
We use Zabbix here. Zabbix is amazing, and we put it in all of our templates so any new servers and hosts pop up on the Zabbix dashboard preconfigured, just like that. For logs and security we use an Elastic "ELK stack", which gives us a heads-up if anything is wrong in the logs, and Zabbix gives us a heads-up on overall system health. Our health monitor panel combines the two windows, so we can see full server health and any problems right there as a to-do list for the IT team.
I use Home Assistant already. They have a plugin for Glances. I guess all I'm interested in is CPU temp and load. Any change = something's up.
I get ahead of it by getting extra.
Need 16 GB of RAM and 8 cores? Well, let me add 64 GB and a 12-core CPU to my cart.
Hasn’t failed me
CheckMK for general monitoring, Grafana/Prometheus for Proxmox-cluster, Wazuh for IDS-purposes and UptimeKuma for general uptime on services. It's not like it's necessary, but it's nice to tinker in my homelab before implementing the same services on a "professional level" at work.
My HomeAssistant is stable, so wifey is not being used as a monitor ;-)
I came across monit recently, seems nice
I use btop, I use arch btw
If nobody complains everything is fine.
I mostly run music bots and game servers, so even if something fails it's nothing really that critical.
When I'm at home I usually ssh into my main host machine and have btop running on my second monitor. It shows me the processes, RAM, CPU, network and disk space. Oh yeah, and load averages. It also looks super pretty and supports skins :)
I don't find it valuable so I don't. (Maybe run top as needed.)