incidents: add incident 023 about broken public wifi

Gregor Michels 2022-10-19 02:13:35 +02:00
parent ec917a24c6
commit c51e5e438a
1 changed files with 32 additions and 0 deletions

@@ -1052,3 +1052,35 @@ The `SMA to TS-9` adapters are too long for the jacks on the gigacube and theref
They could either damage the gigacube, be yanked out or both.
---
023 2022.10.16 ~18:00 - 2022.10.18 ~13:00 | public wifi lost upstream connectivity
----------------------------------------------------------------------------------
**issue**:
The public wifi stopped routing traffic to the internet.
**cause**:
The wireguard tunnel towards mullvad stopped handshaking.
It turns out that we forgot to recharge the prepaid account.
**hotfix**:
disabled traffic laundering (`6297531`)
**solution**:
recharged mullvad account
**timetable**:
* 2022.10.16 17:50: mullvad vpn stopped handshaking; public wifi traffic was blackholed from then on
* 2022.10.18 12:20: notification from facility management that public wifi stopped working
* 2022.10.18 12:50: disabled traffic laundering for public wifi as a hotfix (`6297531`)
* 2022.10.18 21:15: recharged prepaid mullvad account
* 2022.10.19 01:40: re-enabled traffic laundering (`466fefe`)
* 2022.10.19 02:10: added an alerting rule for this exact case (`ec917a2`)
**impact**:
* 2022.10.16 ~18:00 - 2022.10.18 ~13:00: public wifi not working
**extended monitoring**:
This is not the first time the public wifi selectively stopped working because something was wrong with the vpn.
To be proactively notified when this happens again, I've created an alarm that should trigger when every end-to-end test from the public wifi/client network fails (`ec917a2`).
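
The alarm condition could be sketched roughly like this. It is a minimal Python illustration of the "every end-to-end test fails" logic, not the actual rule from `ec917a2`; the `ProbeResult` type, the `upstream_lost` function and the probe targets are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """Outcome of one end-to-end test run from the public wifi/client network (hypothetical type)."""
    target: str
    success: bool


def upstream_lost(results: list[ProbeResult]) -> bool:
    """Return True only when at least one probe ran and every probe failed.

    A single failing target more likely points at the remote side,
    so it does not trigger the alarm on its own.
    """
    return bool(results) and not any(r.success for r in results)


# Hypothetical probe targets, for illustration only.
all_down = [ProbeResult("example.org", False), ProbeResult("example.net", False)]
partly_up = [ProbeResult("example.org", False), ProbeResult("example.net", True)]

print(upstream_lost(all_down))   # True
print(upstream_lost(partly_up))  # False
print(upstream_lost([]))         # False
```

Requiring all probes to fail keeps the alarm quiet when only one remote target is down, while still catching the case from this incident where nothing behind the vpn was reachable.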