Compare commits

4 Commits

Author SHA1 Message Date
Gregor Michels c51e5e438a incidents: add incident 023 about broken public wifi 2022-10-19 02:13:35 +02:00
Gregor Michels ec917a24c6 monitoring: add alarm "PublicWifiUpstreamLost" 2022-10-19 02:05:32 +02:00
Gregor Michels df6a5a93ef monitoring: remove non-dns e2e test 2022-10-19 02:01:08 +02:00
Gregor Michels 466fefeb8d gw-core01: reenable traffic laundering for the public wifi (Fixes: 6297531dfd) 2022-10-19 01:47:30 +02:00
4 changed files with 44 additions and 9 deletions

@@ -1052,3 +1052,35 @@ The `SMA to TS-9` adapters are too long for the jacks on the gigacube and theref
They could either damage the gigacube, be yanked out or both.
---
023 2022.10.16 ~18:00 - 2022.10.18 ~13:00 | public wifi lost upstream connectivity
----------------------------------------------------------------------------------
**issue**:
The public wifi stopped routing into the internet
**cause**:
The wireguard tunnel towards mullvad stopped handshaking.
It turns out that we forgot to recharge the prepaid account.
**hotfix**:
disable traffic laundering (`6297531`)
**solution**:
recharged mullvad account
**timetable**:
* 2022.10.16 17:50: mullvad vpn stopped handshaking; blackholed public wifi traffic
* 2022.10.18 12:20: notification from facility management that public wifi stopped working
* 2022.10.18 12:50: disabled traffic laundering for public wifi as a hotfix (`6297531`)
* 2022.10.18 21:15: recharged prepaid mullvad account
* 2022.10.19 01:40: reenabled traffic laundering (`466fefe`)
* 2022.10.19 02:10: added alerting rule for this exact case (`ec917a2`)
**impact**:
* 2022.10.16 ~18:00 - 2022.10.18 ~13:00: public wifi not working
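
As a rough cross-check of the timetable, the following Python sketch computes how long the outage went unreported and how long it lasted until the hotfix. The timestamps are copied from the timetable above; the helper and variable names are illustrative only.

```python
from datetime import datetime

FMT = "%Y.%m.%d %H:%M"  # timestamp format used in the timetable


def parse(ts):
    """Parse a timetable timestamp such as '2022.10.16 17:50'."""
    return datetime.strptime(ts, FMT)


tunnel_down = parse("2022.10.16 17:50")  # mullvad vpn stopped handshaking
reported = parse("2022.10.18 12:20")     # notification from facility management
hotfix = parse("2022.10.18 12:50")       # traffic laundering disabled

# Durations in hours.
hours_until_reported = (reported - tunnel_down).total_seconds() / 3600
hours_until_hotfix = (hotfix - tunnel_down).total_seconds() / 3600

print(f"unnoticed for {hours_until_reported} h, outage lasted {hours_until_hotfix} h")
```

This gives roughly 42.5 hours without any notification and about 43 hours of outage, which matches the rounded impact window.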
**extended monitoring**:
This is not the first time the public wifi selectively stopped working because something was wrong with the vpn.
To be proactively notified when this happens again, I've created an alarm that should trigger when every end-to-end test from the public wifi/client network fails (`ec917a2`).
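
The intended alarm condition can be sketched in Python as follows. This is only a model of the expression `sum(probe_success{job="e2e_clients_v4"}) == 0`, not how Prometheus evaluates it internally; the function name is made up, and the target names are borrowed from the scrape config purely for illustration (which targets actually belong to this job is an assumption).

```python
def public_wifi_upstream_lost(probe_success):
    """Model of `sum(probe_success{job="e2e_clients_v4"}) == 0`.

    probe_success maps a probe target to 1 (probe succeeded) or 0 (probe failed).
    """
    if not probe_success:
        # In PromQL, sum() over an empty vector returns no result,
        # so the comparison yields nothing and the alert does not fire.
        return False
    # The alert fires only when every single probe failed.
    return sum(probe_success.values()) == 0


# Hypothetical probe results for illustration.
all_down = {"freifunk-leipzig.de": 0, "harald.brainpeach.de": 0}
one_up = {"freifunk-leipzig.de": 0, "harald.brainpeach.de": 1}

print(public_wifi_upstream_lost(all_down))  # every probe failed
print(public_wifi_upstream_lost(one_up))    # one probe still succeeds
print(public_wifi_upstream_lost({}))        # no samples at all
```

One consequence worth keeping in mind: if the probes themselves stop reporting (no `probe_success` series for the job), this expression stays silent, so a separate check for missing data may be useful.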

@@ -19,3 +19,12 @@ groups:
         annotations:
           summary: A node rebooted in the last 2 hours (instance {{ $labels.instance }})
           description: "The uptime of a node changed in the last two hours. VALUE = {{ $value }}\n LABELS = {{ $labels }}"
+
+      - alert: PublicWifiUpstreamLost
+        expr: sum(probe_success{job="e2e_clients_v4"}) == 0
+        for: 0m
+        labels:
+          severity: critical
+        annotations:
+          summary: The public wifi lost its ability to route into the internet
+          description: "check the vpn connection"

@@ -121,16 +121,16 @@ config rule
 	option dest '10.84.1.0/24'
 	option lookup 'main'
 	option priority 49
-	option disabled '1'
+	option disabled '0'
 
 config rule
 	option in 'clients'
 	option lookup 'launder'
 	option priority 50
-	option disabled '1'
+	option disabled '0'
 
 config rule
 	option in 'clients'
 	option action prohibit
 	option priority 51
-	option disabled '1'
+	option disabled '0'
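
To illustrate how these three rules interact, here is a minimal Python sketch of Linux-style policy routing: rules are tried in ascending priority, a `lookup` rule that finds no route in its table falls through to the next rule, and `prohibit` rejects the packet. Everything not visible in the hunk is an assumption made for the example: the interface names `lan`, `wan` and `wg-mullvad`, the contents of the `main` and `launder` tables, and the absence of an `in` option on the priority 49 rule. The final rule at priority 32766 mirrors the default Linux lookup of the main table.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int
    action: str                      # "lookup" or "prohibit"
    table: Optional[str] = None      # routing table used by "lookup" rules
    in_iface: Optional[str] = None   # None matches any ingress interface
    dest: Optional[str] = None       # None matches any destination
    disabled: bool = False


def decide(rules, tables, in_iface, dest):
    """Return the egress interface for a packet, "prohibit", or "unreachable"."""
    addr = ipaddress.ip_address(dest)
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.disabled:
            continue
        if rule.in_iface is not None and rule.in_iface != in_iface:
            continue
        if rule.dest is not None and addr not in ipaddress.ip_network(rule.dest):
            continue
        if rule.action == "prohibit":
            return "prohibit"
        # "lookup": longest-prefix match inside the referenced table.
        matches = [
            (ipaddress.ip_network(prefix), dev)
            for prefix, dev in tables.get(rule.table, [])
            if addr in ipaddress.ip_network(prefix)
        ]
        if matches:
            return max(matches, key=lambda m: m[0].prefixlen)[1]
        # No matching route in this table: continue with the next rule.
    return "unreachable"


RULES = [
    Rule(49, "lookup", table="main", dest="10.84.1.0/24"),
    Rule(50, "lookup", table="launder", in_iface="clients"),
    Rule(51, "prohibit", in_iface="clients"),
    Rule(32766, "lookup", table="main"),  # Linux default rule for the main table
]

# Hypothetical routing tables (assumed, not taken from the repository).
TABLES = {
    "main": [("10.84.1.0/24", "lan"), ("0.0.0.0/0", "wan")],
    "launder": [("0.0.0.0/0", "wg-mullvad")],
}

# State during the hotfix: rules 49-51 disabled.
HOTFIX_RULES = [replace(r, disabled=True) if r.priority in (49, 50, 51) else r for r in RULES]

print(decide(RULES, TABLES, "clients", "1.1.1.1"))         # laundered via the tunnel
print(decide(RULES, TABLES, "clients", "10.84.1.5"))       # internal destination
print(decide(HOTFIX_RULES, TABLES, "clients", "1.1.1.1"))  # hotfix bypasses the tunnel
```

Under these assumptions the `prohibit` rule only takes effect when the `launder` table has no usable route. A tunnel that is still configured but no longer handshaking keeps its route, so client traffic continues to be sent into it and is dropped, which would be consistent with the blackholing described in the incident.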

@@ -44,8 +44,6 @@ scrape_configs:
       - targets:
           - freifunk-leipzig.de
           - harald.brainpeach.de
-          - 195.201.165.118 # freifunk-leipzig.de without dns query
-          - 88.198.195.242 # harald.brainpeach.de without dns query
     relabel_configs:
       - source_labels: [__address__]
         target_label: __param_target
@@ -63,8 +61,6 @@ scrape_configs:
           - 192.168.0.1 # gigacube
           - freifunk-leipzig.de
           - harald.brainpeach.de
-          - 195.201.165.118 # freifunk-leipzig.de without dns query
-          - 88.198.195.242 # harald.brainpeach.de without dns query
     relabel_configs:
       - source_labels: [__address__]
         target_label: __param_target
@@ -81,8 +77,6 @@ scrape_configs:
       - targets:
           - freifunk-leipzig.de
           - harald.brainpeach.de
-          - 195.201.165.118 # freifunk-leipzig.de without dns query
-          - 88.198.195.242 # harald.brainpeach.de without dns query
     relabel_configs:
       - source_labels: [__address__]
         target_label: __param_target