Gregor Michels
|
eadcf6f296
|
monitoring: extend ifInErrors alert to non-snmp devices
also automatically clear the alarm after 2 hours, because linux devices have
no way to clear their nic error counters
|
2023-04-18 21:00:04 +02:00 |
Gregor Michels
|
2299e3aff1
|
monitoring: make summary and description for snmp alarms more verbose
|
2023-03-23 00:07:23 +01:00 |
Gregor Michels
|
d1c1f34bf8
|
monitoring: alert on snmp if{In,Out}Errors
|
2023-03-22 23:53:39 +01:00 |
Gregor Michels
|
0475923590
|
alerting: only alarm on devices that have been unreachable for at least 1m
|
2022-12-22 16:37:15 +01:00 |
Gregor Michels
|
69834a8d2b
|
alerting: also alert on reboots of snmp devices
|
2022-12-22 16:37:15 +01:00 |
Gregor Michels
|
e3b111f2c7
|
monitoring: monitor switches in the ANS via snmp
|
2022-11-21 02:58:13 +01:00 |
Gregor Michels
|
9cfee1f384
|
monitoring: add alerting rules for disks running out of space
|
2022-11-19 01:58:14 +01:00 |
Gregor Michels
|
8389a18488
|
monitoring: move prometheus stack onto eae-adp-jump01
to also be able to monitor the new site.
the custom grafana dashboard broke while transferring the stack;
will fix next.
|
2022-11-17 00:35:57 +01:00 |
Gregor Michels
|
ec917a24c6
|
monitoring: add alarm "PublicWifiUpstreamLost"
|
2022-10-19 02:05:32 +02:00 |
Gregor Michels
|
6623cc0e09
|
monitoring: alert on node reboots
|
2022-09-14 02:16:15 +02:00 |
Gregor Michels
|
5a21b2cd88
|
monitoring: prometheus: add simple alerting rule
|
2022-07-13 01:27:07 +02:00 |