incidents: add 012 about the ongoing random reboots of gw-core01
parent b5698a6c90
commit b57200bd6c
@ -401,3 +401,23 @@ I added a `persistent_keepalive` to the tunnel to stop this from happening again
* [ ] monitor connectivity for the public wifi (`blackbox exporter` in `client` network) and create alerting rules
* [ ] prometheus instance on `eap-adp-jump01` to get alerts if upstream is down in facility
* [ ] monitor wireguard state (probably needs a custom lua exporter)


012: 2022.09.01 17:24, 18:10 | ongoing reboots of gw-core01
-------------------------------------------------------------

Unfortunately, zip-tying back the protective cap of the power strip did not stop the random reboots of `gw-core01`.
See incidents `001` and `010` for details.

Either the power supply or the device itself is broken.

**solution**:
* [ ] replace power supply
* [ ] replace device itself (if replacing the power supply does not work)

I tried to replace the power supply today (2022.09.01 ~20:00), but nobody could let me into the facilities.
Going to try again tomorrow.

**impact**:
* 2022.09.01 17:24, 17:47
* 2022.09.02 14:31, 18:10