@@ -1235,3 +1235,67 @@ eae-adp-jump01# crontab -e
[...]
0 */2 * * * rcctl restart prometheus
```


029 2022.11.29 03:00 (ANS) | (maintenance) automagically start offloader
------------------------------------------------------------------------

---

_this log entry was added way after doing the actual work.
Please read it with a grain of salt_

---

**problem**:
ANS washes the traffic via a FFLPZ/FFDD offloader vm.
There was only a script for starting the offloader vm manually.
On reboots the offloader vm would not automagically start.

**solution**:
implement a service that starts the vm on boot
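
The deployed script is not reproduced in this log, so the following is only a minimal sketch of what such a service could look like on OpenWrt, assuming a procd-style init script. The file name `/etc/init.d/offloader`, the path in `OFFLOADER_SCRIPT`, and the assumption that the existing start script keeps running in the foreground are all hypothetical.

```sh
#!/bin/sh /etc/rc.common
# Hypothetical /etc/init.d/offloader -- a sketch, not the deployed script.

START=99      # start late in the boot sequence
STOP=10       # stop early on shutdown
USE_PROCD=1   # let procd supervise the process

# Assumption: location of the existing manual start script.
OFFLOADER_SCRIPT="/root/start-offloader.sh"

start_service() {
    procd_open_instance
    procd_set_param command "$OFFLOADER_SCRIPT"
    # Assumption: the script stays in the foreground, so procd can restart it.
    procd_set_param respawn
    procd_set_param stdout 1   # forward stdout to the system log
    procd_set_param stderr 1   # forward stderr to the system log
    procd_close_instance
}
```

With such a file in place, `/etc/init.d/offloader enable` would register it for boot and `/etc/init.d/offloader start` would start it right away.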

**impact**:
After validating the script on another OpenWrt machine, I tested it in production.
This caused the following downtimes:
* `offloader` down from 02:50 to 03:05 -- service interruption for the public wifi
* `ffl-ans-gw-core01` down from 02:53 to 02:55 -- service interruption for everybody

**disclaimer**:
The script is manually deployed on `ffl-ans-gw-core01` and therefore not part of this repo at the moment


030 2022.11.30 15:30 (ANS) | (maintenance) replace switches
-----------------------------------------------------------

---

_this log entry was added way after doing the actual work.
Please read it with a grain of salt_

---

**intro**:
The switches installed at ANS were defective.
Not every boot had working PoE.
This means that a power outage could leave the APs without power.
Fortunately `Zyxel` replaced the devices.

**replacement log**:
* 16:34:30 - 16:34:50: `ffl-ans-sw-distribution01`
  * quickly replaced device and connections
  * => l2 interruption for `ffl-ans-sw-access01` and `ffl-ans-sw-access02`
  * => power cycle of APs in social, security and facility container
* 16:49: `ffl-ans-sw-access01`
  * power up new device alongside
  * bridge old and new device with short patch cable
  * move sfp uplink to new device
  * move first ap to new switch
  * wait till ap was back up and serving clients
  * move second ap
  * teardown old device
  * => minimal l2 downtime
  * => rolling AP downtimes
* 17:09:30 - 17:10:15: `ffl-ans-sw-access02`
  * quickly replaced device and connections
  * => power cycle of all APs in `tent 2&3`
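
The "wait till ap was back up" step for `ffl-ans-sw-access01` was done by hand. A small loop like the one below could serve as a rough reachability check between moving APs. This is a sketch under assumptions: the function name `wait_until_up` and the address `192.0.2.10` are placeholders, `ping -W` is assumed to take seconds (as on Linux and BusyBox), and a ping reply only shows that the AP is reachable, not that it is serving clients.

```sh
#!/bin/sh
# wait_until_up HOST [RETRIES]
# Ping HOST until it answers, giving up after RETRIES failed retries
# (default 300, roughly one attempt every 1-2 seconds).
# Returns 0 when HOST answered, 1 when the retries are used up.
wait_until_up() {
    host="$1"
    retries="${2:-300}"
    waited=0
    until ping -c 1 -W 1 "$host" >/dev/null 2>&1; do
        if [ "$waited" -ge "$retries" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example call with a placeholder address (192.0.2.10 is a documentation IP):
# wait_until_up 192.0.2.10 120 && echo "AP reachable, move the next one"
```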