11 Commits

Author SHA1 Message Date
Gregor Michels eadcf6f296 monitoring: extend ifInErrors alert to non-snmp devices
also automatically clear the alarm after 2 hours, because Linux devices
have no way to clear the NIC error counters
2023-04-18 21:00:04 +02:00
Gregor Michels 2299e3aff1 monitoring: make summary and description for snmp alarms more verbose 2023-03-23 00:07:23 +01:00
Gregor Michels d1c1f34bf8 monitoring: alert on snmp if{In,Out}Errors 2023-03-22 23:53:39 +01:00
Gregor Michels 0475923590 alerting: only alarm on devices that are unreachable for at least 1m 2022-12-22 16:37:15 +01:00
Gregor Michels 69834a8d2b alerting: also alert on reboots of snmp devices 2022-12-22 16:37:15 +01:00
Gregor Michels e3b111f2c7 monitoring: monitor switches in the ANS via snmp 2022-11-21 02:58:13 +01:00
Gregor Michels 9cfee1f384 monitoring: add alerting rules for disks running out of space 2022-11-19 01:58:14 +01:00
Gregor Michels 8389a18488 monitoring: move prometheus stack onto eae-adp-jump01
to be able to also monitor the new site.

the custom Grafana dashboard broke while transferring the stack.
will fix next
2022-11-17 00:35:57 +01:00
Gregor Michels ec917a24c6 monitoring: add alarm "PublicWifiUpstreamLost" 2022-10-19 02:05:32 +02:00
Gregor Michels 6623cc0e09 monitoring: alert on node reboots 2022-09-14 02:16:15 +02:00
Gregor Michels 5a21b2cd88 monitoring: prometheus: add simple alerting rule 2022-07-13 01:27:07 +02:00