Compare commits

2 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Gregor Michels | 0e838e7dc1 | incidents: add maintenance incident 021 - replace gw-core01 | 2022-10-10 23:17:05 +02:00 |
| Gregor Michels | f83792749e | update todos | 2022-10-08 16:42:02 +02:00 |

2 changed files with 43 additions and 5 deletions

@@ -995,3 +995,28 @@ This means that there was complete wifi downtime for about 10 minutes.
**update**:
After the power outage (see `incident 018` for details) `ap-0b99` was reachable again.
Therefore I've upgraded the ap on 2022.09.28 from 00:21 till 00:27.
021 2022.09.29 10:30 - 11:30 | (maintenance) replace gw-core01, reorg cabling
-----------------------------------------------------------------------------
To finally combat the random reboots of `gw-core01` (see incidents `010` and `012` for details), I've replaced the device again.
The last time I tried to replace `gw-core01` the replacement device stopped working after 4 hours (see incident `015` for details).
Because the original and the replacement device have the same SoC (`Mediatek MT7621`), this smells like an OS issue.
There are a few OpenWrt forum entries about people having issues with the `MT7621` on kernel `5.10`.
Therefore this replacement was done using an `x64`-based platform (`Sophos SG-125r2`).
After replacing the device I also replugged some switch ports on `sw-access01` to bring some order into the network cables.
**firmware and configuration of gw-core01**:
* firmware: garet commit `ce38181`, garet profile `sophos-sg-125r2_22.03.0`
* `playbook_provision_gateway.yml`: `e7054c1` ported the config of `gw-core01` onto the new platform.
* synced dhcp leases via `scp` between devices
**timetable**:
* 10:53: transferred all ports to the new `gw-core01`
* 11:00: replugged `ap-8f42` (`tent 1`) to reorg network cable (=> ap reboot)
* 11:02: replugged `sw-access02` (`tent 5`) to reorg network cable (=> only short link interruption)
* 11:14: shut down `hyper01` to reorg power cord (=> shutdown of all vms)
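
The dhcp lease sync mentioned above can be double-checked after the `scp` by comparing the lease files of the old and the new device. A minimal sketch, assuming both devices use the dnsmasq lease format that is the OpenWrt default (`expiry mac ip hostname client-id`, one lease per line); the function names and the sample data are hypothetical and not part of the playbook:

```python
def parse_leases(text):
    """Parse dnsmasq-style lease lines into {mac: (ip, hostname, expiry)}."""
    leases = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip empty or truncated lines
        expiry, mac, ip, hostname = fields[:4]
        leases[mac.lower()] = (ip, hostname, int(expiry))
    return leases


def diff_leases(old_text, new_text):
    """Return (missing, changed): macs absent on the new device, macs whose ip differs."""
    old = parse_leases(old_text)
    new = parse_leases(new_text)
    missing = sorted(mac for mac in old if mac not in new)
    changed = sorted(mac for mac in old if mac in new and old[mac][0] != new[mac][0])
    return missing, changed


if __name__ == "__main__":
    # made-up sample data, assumption: format as described above
    old = (
        "1664445600 aa:bb:cc:dd:ee:01 10.0.0.10 laptop 01:aa:bb:cc:dd:ee:01\n"
        "1664445700 aa:bb:cc:dd:ee:02 10.0.0.11 * *\n"
    )
    new = "1664445600 AA:BB:CC:DD:EE:01 10.0.0.10 laptop *\n"
    print(diff_leases(old, new))
```

Both lists being empty means every client known to the old device kept its address on the new one.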

@@ -2,11 +2,13 @@
## Software
* [ ] add monitoring vm
* [x] add monitoring vm
* replace `prometheus-node-exporter-lua-hostapd_stations` with an exporter that does not collect mac addresses!
* [ ] put aps on non overlapping wifi channels
* [ ] document configuration of `gw-core01`
* [ ] provision config of `gw-core01` via ansible (network, firewall, ...)
* [x] put aps on non overlapping wifi channels
* [x] document configuration of `gw-core01`
* [x] provision config of `gw-core01` via ansible (network, firewall, ...)
* [ ] bootstrap an additional prometheus instance on `eae-adp-jump01` that alarms on a missing connection to `gw-core01`
* [ ] move openwrt device to 22.03 - track fw version in ansible?
* [ ] add wireguard profiles for admins on `eae-adp-jump01`
## Hardware
@@ -15,5 +17,16 @@
## Documentation
* [ ] publish `incident 21 - replace gw-core01, reorg cabling`
* [ ] publish `incident 22 - installation of directional LTE antenna`
* [ ] document backbone between `gw-core01` and `eae-adp-jump01`
* [ ] move config/installation stuff into other file (keep OS versions in `README.MD`)
* [x] move config/installation stuff into other file (keep OS versions in `README.MD`)
## Wifi Experience
* [ ] increase airtime by only broadcasting `GU Deutscher Platz Backoffice` in the office containers
* [ ] improve wifi experience for residents
  - put at least two aps into every tent
  - put the aps in more central locations inside the tents
  - measure and decrease tx signal power of aps
  - maybe replace aps with something more modern (> 2012, > 802.11a/n)
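
With at least two aps per tent, the channel plan has to keep aps in the same tent on different channels. A minimal sketch, assuming 2.4 GHz operation with the three non-overlapping 20 MHz channels 1, 6 and 11; `plan_channels` and all ap names except `ap-8f42` are hypothetical placeholders:

```python
from itertools import cycle

# assumption: 2.4 GHz band, 20 MHz channel width
CHANNELS = (1, 6, 11)


def plan_channels(aps_by_tent):
    """Assign channels round-robin, walking the tents in sorted order.

    Consecutive aps get different channels, so aps in the same tent
    never share a channel as long as a tent has at most three aps.
    """
    plan = {}
    rotation = cycle(CHANNELS)
    for tent in sorted(aps_by_tent):
        for ap in aps_by_tent[tent]:
            plan[ap] = next(rotation)
    return plan


if __name__ == "__main__":
    # hypothetical inventory, only ap-8f42 appears in the incident log
    inventory = {
        "tent 1": ["ap-8f42", "ap-example-a"],
        "tent 5": ["ap-example-b", "ap-example-c"],
    }
    for ap, channel in plan_channels(inventory).items():
        print(ap, channel)
```

This ignores which tents are physically next to each other; a real plan should be checked against a site survey.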