From ec0cfc908ab88fa790d42cd993453f6c76bc8461 Mon Sep 17 00:00:00 2001
From: Gregor Michels
Date: Fri, 23 Dec 2022 01:39:26 +0100
Subject: [PATCH] add incident 029: ans create a service for the offloader vm

---
 documentation/INCIDENTS.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/documentation/INCIDENTS.md b/documentation/INCIDENTS.md
index d23f07c..c8f5e28 100644
--- a/documentation/INCIDENTS.md
+++ b/documentation/INCIDENTS.md
@@ -1235,3 +1235,31 @@ eae-adp-jump01# crontab -e
 [...]
 0 */2 * * * rcctl restart prometheus
 ```
+
+
+029 2022.11.29 03:00 (ANS) | (maintenance) automagically start offloader
+------------------------------------------------------------------------
+
+---
+
+_This log entry was added long after the actual work was done.
+Please read it with a grain of salt._
+
+---
+
+**problem**:
+ANS washes the traffic via a FFLPZ/FFDD offloader vm.
+There was only a script that had to be run manually to start the offloader vm.
+After a reboot the offloader vm would not start automagically.
+
+**solution**:
+Implement a service that starts the vm.
+
+**impact**:
+After validating the script on another openwrt machine, I tested it in production.
+This caused the following downtimes:
+* `offloader` down from 02:50 to 03:05 -- service interruption for the public wifi
+* `ffl-ans-gw-core01` down from 02:53 to 02:55 -- service interruption for everybody
+
+**disclaimer**:
+The script is deployed manually on `ffl-ans-gw-core01` and is therefore not part of this repo at the moment.