incidents: add maintenance incident 021 - replace gw-core01

Gregor Michels 2022-10-10 23:17:05 +02:00
parent f83792749e
commit 0e838e7dc1
1 changed file with 25 additions and 0 deletions

@@ -995,3 +995,28 @@ This means that there was complete wifi downtime for about 10 minutes.
**update**:
After the power outage (see `incident 018` for details) `ap-0b99` was reachable again.
Therefore I've upgraded the ap on 2022.09.28 from 00:21 till 00:27.
021 2022.09.29 10:30 - 11:30 | (maintenance) replace gw-core01, reorg cabling
-----------------------------------------------------------------------------
To finally combat the random reboots of `gw-core01` (see incidents `010` and `012` for details), I've replaced the device again.
The last time I tried to replace `gw-core01`, the replacement device stopped working after 4 hours (see incident `015` for details).
Because the original and the replacement device use the same SoC (`Mediatek MT7621`), this points to an OS issue rather than a single defective unit.
There are a few OpenWrt forum threads from people having issues with the `MT7621` on kernel `5.10`.
Therefore, this replacement was done using an `x64`-based platform (`Sophos SG-125r2`).
After replacing the device, I also replugged some switch ports on `sw-access01` to tidy up the network cabling.
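
The reasoning behind the platform change (both failed devices share the `MT7621` SoC, and forum reports point at kernel `5.10`) can be turned into a quick check for other devices. This is a minimal sketch, not part of the existing tooling: it assumes Python is available wherever it runs (stock OpenWrt does not ship it, so the `/proc/cpuinfo` text may have to be copied off the device first) and that the SoC appears in the `system type` line of `/proc/cpuinfo`, as it does on MIPS targets. The names `kernel_series`, `soc_name` and `is_suspect` are hypothetical.

```python
import platform
import re
from typing import Optional, Tuple

# Hypothetical check for the combination reported in the OpenWrt forum threads.
SUSPECT_SOC = "MT7621"
SUSPECT_KERNEL = (5, 10)


def kernel_series(release: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from a release string such as '5.10.146', or None."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def soc_name(cpuinfo: str) -> str:
    """Return the 'system type' value from /proc/cpuinfo text, or '' if absent."""
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "system type":
            return value.strip()
    return ""


def is_suspect(cpuinfo: str, release: str) -> bool:
    """True if the SoC is an MT7621 and the kernel belongs to the 5.10 series."""
    return SUSPECT_SOC in soc_name(cpuinfo) and kernel_series(release) == SUSPECT_KERNEL


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as f:
            info = f.read()
    except OSError:
        info = ""  # no /proc/cpuinfo (e.g. not running on Linux)
    verdict = "suspect" if is_suspect(info, platform.release()) else "not suspect"
    print(f"{verdict}: soc={soc_name(info)!r} kernel={platform.release()!r}")
```

The check deliberately matches only the major/minor kernel series, since the forum reports refer to `5.10` in general rather than to a specific patch release.
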
**firmware and configuration of gw-core01**:
* firmware: garet commit `ce38181`, garet profile `sophos-sg-125r2_22.03.0`
* `playbook_provision_gateway.yml`: commit `e7054c1` ported the configuration of `gw-core01` to the new platform.
* synced the DHCP leases between the old and the new device via `scp`
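
The leases were simply copied with `scp`. If the new gateway had already handed out leases of its own before the copy, overwriting its lease file would drop them; the following hypothetical helper sketches how the two files could be merged instead. It assumes the dnsmasq lease file format (`<expiry> <mac> <ip> <hostname> <client-id>`, one IPv4 lease per line), that dnsmasq writes `0` as the expiry of infinite leases, and OpenWrt's default lease file path `/tmp/dhcp.leases`. None of this is part of the existing playbook.

```python
from typing import Dict, List

# Assumed dnsmasq lease line format (one lease per line, IPv4):
#   <expiry epoch> <mac> <ip> <hostname or *> <client-id or *>


def _expiry_rank(expiry: str) -> float:
    """Rank an expiry field; dnsmasq writes 0 for infinite leases (assumption)."""
    value = int(expiry)
    return float("inf") if value == 0 else float(value)


def merge_leases(*lease_files: str) -> str:
    """Merge lease files, keeping the lease with the latest expiry per MAC address."""
    best: Dict[str, List[str]] = {}
    for text in lease_files:
        for line in text.splitlines():
            fields = line.split()
            if len(fields) < 3 or not fields[0].isdigit():
                continue  # skip blank, malformed or non-lease lines (e.g. 'duid ...')
            mac = fields[1].lower()
            current = best.get(mac)
            if current is None or _expiry_rank(fields[0]) > _expiry_rank(current[0]):
                best[mac] = fields
    # Output keeps the order in which each MAC address was first seen.
    return "".join(" ".join(fields) + "\n" for fields in best.values())


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python merge_leases.py old.leases new.leases > dhcp.leases
    contents = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            contents.append(f.read())
    sys.stdout.write(merge_leases(*contents))
```

The merged file would then be copied back to `/tmp/dhcp.leases` on the new gateway, followed by a restart of dnsmasq (`/etc/init.d/dnsmasq restart`), since dnsmasq reads the lease file when it starts.
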
**timetable**:
* 10:53: transferred all ports to the new `gw-core01`
* 11:00: replugged `ap-8f42` (`tent 1`) to reorganize its network cable (=> ap reboot)
* 11:02: replugged `sw-access02` (`tent 5`) to reorganize its network cable (=> only a short link interruption)
* 11:14: shut down `hyper01` to reorganize its power cord (=> all VMs were shut down)