add incident 028: periodically restart prometheus on eae-adp-jump01

incident 027: remembered that I also sysupgraded eae-adp-jump01
Fixes: 34e4fbf000
2022-12-23 01:28:30 +01:00 · 2022-12-23 01:27:00 +01:00
1 changed file with 19 additions and 1 deletion
--- a/documentation/INCIDENTS.md
+++ b/documentation/INCIDENTS.md
@@ -1184,7 +1184,8 @@ After installing a prometheus stack onto `eae-adp-jump01` (`8389a18`) the `/var/
 Limiting the size of the TSDB did not resolve this issue (maybe I've misconfigured the limit).

 **solution**:
-attach 20GB block device onto vm and mount it as `/var/prometheus`:
+* `sysupgrade` to `OpenBSD 7.2`
+* attach 20GB block device onto vm and mount it as `/var/prometheus`:
 ```
 eae-adp-jump01# rcctl stop prometheus
 eae-adp-jump01# rm -r /var/prometheus/*
@@ -1210,3 +1211,20 @@ eae-adp-jump01# rcctl start prometheus
 eae-adp-jump01# syspatch
 eae-adp-jump01# reboot
 ```
+
+
+028 2022.11.29 02:00 | periodically restart prometheus
+------------------------------------------------------
+
+**problem**:
+`prometheus` crashed regularly on `eae-adp-jump01`.
+It seems that `OpenBSD` is missing some file-handle functionality, which causes `prometheus` to crash.
+Here is a [GitHub issue](https://github.com/prometheus/prometheus/issues/8799) (for an older `OpenBSD` release) that describes the same problem.
+
+**solution**:
+Until I have time to set up a new Linux machine somewhere to take over the monitoring, restart `prometheus` every two hours via `cron`:
+```
+eae-adp-jump01# crontab -e
+[...]
+0	*/2	*	*	*	rcctl restart prometheus
+```
Author	SHA1	Message	Date
Gregor Michels	9506e94dad	add incident 028: periodically restart prometheus on eae-adp-jump01	2022-12-23 01:28:30 +01:00
Gregor Michels	3e2fc42c19	incident 027: remembered that I also sysupgraded eae-adp-jump01 Fixes: `34e4fbf000`	2022-12-23 01:27:00 +01:00
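
A possible refinement of the cron entry from incident 028, offered only as a sketch: instead of restarting unconditionally, restart only when `rcctl check` reports that the daemon is not running. The function name `restart_if_down` and the overridable `RCCTL` variable are assumptions made for this example and are not part of the commit. Whether this catches the failures seen on `eae-adp-jump01` depends on `prometheus` actually exiting rather than hanging.

```sh
#!/bin/sh
# Sketch: restart a service only when its rc.d check fails.
# RCCTL can be overridden so the logic can be tried without OpenBSD;
# both RCCTL and restart_if_down are hypothetical names for this example.
RCCTL="${RCCTL:-rcctl}"

restart_if_down() {
    svc="$1"
    # `rcctl check` exits 0 when the daemon is running, non-zero otherwise.
    if "$RCCTL" check "$svc" >/dev/null 2>&1; then
        echo "$svc is running"
    else
        echo "$svc is down, restarting"
        "$RCCTL" restart "$svc"
    fi
}
```

Saved as a script that ends with `restart_if_down prometheus`, this could be called from root's crontab at a shorter interval than the two-hour restart, so a crash is noticed sooner and a healthy instance is left alone.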