Compare commits
3 Commits
f6ba9f5aa6
...
03e2543f95
Author | SHA1 | Date |
---|---|---|
Gregor Michels | 03e2543f95 | |
Gregor Michels | 0475923590 | |
Gregor Michels | 69834a8d2b | |
@ -1118,3 +1118,99 @@ all updates where doing using the new "idempotent" `playbook_sysupgrade` (since
|
||||||
* 2022.10.24 01:44 - 01:46: `gw-core01`
|
* 2022.10.24 01:44 - 01:46: `gw-core01`
|
||||||
=> downtime of the accesspoints in the specified timeframe
|
=> downtime of the accesspoints in the specified timeframe
|
||||||
=> downtime of `gw-core01` in the specified timeframe
|
=> downtime of `gw-core01` in the specified timeframe
|
||||||
|
|
||||||
|
|
||||||
|
025 2022.11.19 04:00 (ANS) | (maintenance) (try to) steer clients into 5 GHz band
|
||||||
|
---------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
_this log entry was added long after doing the actual work.
|
||||||
|
Please read it with a grain of salt_
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**problem**:
|
||||||
|
* (if I remember correctly) way more clients in the 2.4 GHz band than in the 5 GHz band (3/4 to 1/4)
|
||||||
|
|
||||||
|
**solution**:
|
||||||
|
* halved the transmit power in the 2.4 GHz band
|
||||||
|
* increased transmit power in the 5 GHz band by 1 dBm
|
||||||
|
* implemented by `5017cb5`
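For orientation, the two power changes can be expressed in both dBm and milliwatts. This is a minimal sketch of the arithmetic only: the starting value of 20 dBm is a hypothetical placeholder (the log does not state the original settings), and the helper names are made up for illustration.

```python
import math


def dbm_to_mw(dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (dbm / 10)


def mw_to_dbm(mw: float) -> float:
    """Convert a power level in milliwatts to dBm."""
    return 10 * math.log10(mw)


def halve_power(dbm: float) -> float:
    """Return the dBm value that corresponds to half the linear power."""
    return mw_to_dbm(dbm_to_mw(dbm) / 2)


# Hypothetical starting points; the real values are not in the log.
old_24ghz_dbm = 20.0
old_5ghz_dbm = 20.0

new_24ghz_dbm = halve_power(old_24ghz_dbm)  # halving is roughly -3 dB
new_5ghz_dbm = old_5ghz_dbm + 1.0           # +1 dBm as described above

print(f"2.4 GHz: {old_24ghz_dbm:.1f} dBm -> {new_24ghz_dbm:.2f} dBm")
print(
    "5 GHz linear power factor: "
    f"{dbm_to_mw(new_5ghz_dbm) / dbm_to_mw(old_5ghz_dbm):.2f}x"
)
```

Halving the linear power lowers the level by about 3 dB, while a 1 dBm increase raises the linear power by roughly a quarter, so the 2.4 GHz change is the far larger of the two.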
|
||||||
|
|
||||||
|
**impact**:
|
||||||
|
This restarted wifi on all APs at the same time.
|
||||||
|
Downtime for all clients for a few seconds at 04:00 in the morning.
|
||||||
|
|
||||||
|
**validation**:
|
||||||
|
One day afterwards it seemed like there were more clients in the 5 GHz band (50/50), but the data rates dropped for most of them.
|
||||||
|
|
||||||
|
**criticism**:
|
||||||
|
* placement, transmit power and supported bands of the clients impact 5 GHz utilization
|
||||||
|
* unsure what the actual problem is
|
||||||
|
* also did not correctly validate for a few days
|
||||||
|
|
||||||
|
|
||||||
|
026 2022.11.20 15:30 (ANS) | (maintenance) replace SFP modules
|
||||||
|
--------------------------------------------------------------
|
||||||
|
|
||||||
|
...
|
||||||
|
|
||||||
|
|
||||||
|
027 2022.11.21 02:00 | (maintenance) attach volume to `eae-adp-jump01` for prometheus
|
||||||
|
-------------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
20 GB volume attached to the VM
|
||||||
|
|
||||||
|
reboot: shortly before 02:00
|
||||||
|
ran syspatch beforehand
|
||||||
|
|
||||||
|
```
|
||||||
|
eae-adp-jump01# rcctl stop prometheus
|
||||||
|
eae-adp-jump01# rm -r /var/prometheus/*
|
||||||
|
eae-adp-jump01# sysctl hw.disknames
|
||||||
|
eae-adp-jump01# fdisk -iy sd1
|
||||||
|
eae-adp-jump01# disklabel -E sd1
|
||||||
|
> a a
|
||||||
|
>
|
||||||
|
> *
|
||||||
|
> q
|
||||||
|
eae-adp-jump01# newfs sd1a
|
||||||
|
eae-adp-jump01# diff -Naur /etc/fstab.20221121 /etc/fstab
|
||||||
|
--- /etc/fstab.20221121 Sun Jun 26 23:00:39 2022
|
||||||
|
+++ /etc/fstab Mon Nov 21 02:01:03 2022
|
||||||
|
@@ -8,3 +8,4 @@
|
||||||
|
e1c3571d54635852.j /usr/obj ffs rw,nodev,nosuid 1 2
|
||||||
|
e1c3571d54635852.i /usr/src ffs rw,nodev,nosuid 1 2
|
||||||
|
e1c3571d54635852.e /var ffs rw,nodev,nosuid 1 2
|
||||||
|
+a0469c9f38992e1d.a /var/prometheus ffs rw,nodev,nosuid 1 2
|
||||||
|
eae-adp-jump01# mount /var/prometheus
|
||||||
|
eae-adp-jump01# chown _prometheus:_prometheus /var/prometheus
|
||||||
|
eae-adp-jump01# rcctl start prometheus
|
||||||
|
```
|
||||||
|
|
||||||
|
028 2022.11.29 02:00 | periodically restart prometheus
|
||||||
|
-------------
|
||||||
|
|
||||||
|
028 2022.11.29 03:00 | (maintenance) activate auto start for offloader
|
||||||
|
-------------
|
||||||
|
|
||||||
|
offloader down from 02:50 to 03:05
|
||||||
|
gw-core down from 02:53 to 02:55
|
||||||
|
|
||||||
|
|
||||||
|
029 2022.11.30 15:30 | (maintenance) replace switches
|
||||||
|
----
|
||||||
|
|
||||||
|
* 16:34:30 - 16:34:50: `ffl-ans-sw-distribution01`
|
||||||
|
* quickly replaced device and connections
|
||||||
|
* 16:49: `ffl-ans-sw-access01`: minimal L2 downtime, accesspoint needed a reboot
|
||||||
|
* power up new device alongside
|
||||||
|
* bridge old and new device with short patch cable
|
||||||
|
* move sfp uplink to new device
|
||||||
|
* move first ap to new switch
|
||||||
|
* wait till ap was back up and serving clients
|
||||||
|
* move second ap
|
||||||
|
* teardown old device
|
||||||
|
* 17:09:30 - 17:10:15: `ffl-ans-sw-access02`
|
||||||
|
* quickly replaced device and connections
|
||||||
|
|
|
@ -4,7 +4,7 @@ groups:
|
||||||
# from https://awesome-prometheus-alerts.grep.to/rules.html#rule-prometheus-self-monitoring-1-2
|
# from https://awesome-prometheus-alerts.grep.to/rules.html#rule-prometheus-self-monitoring-1-2
|
||||||
- alert: PrometheusTargetMissing
|
- alert: PrometheusTargetMissing
|
||||||
expr: up == 0
|
expr: up == 0
|
||||||
for: 0m
|
for: 1m
|
||||||
labels:
|
labels:
|
||||||
severity: critical
|
severity: critical
|
||||||
annotations:
|
annotations:
|
||||||
|
@ -64,3 +64,12 @@ groups:
|
||||||
annotations:
|
annotations:
|
||||||
summary: A switch port changed it's state {{ $value }}x time
|
summary: A switch port changed it's state {{ $value }}x time
|
||||||
description: "For some reason a switch port changed it's state\n LABELS = {{ $labels }}"
|
description: "For some reason a switch port changed it's state\n LABELS = {{ $labels }}"
|
||||||
|
|
||||||
|
- alert: SNMPNodeRebooted
|
||||||
|
expr: (sysUpTime / 100) <= (60 * 60 * 2)
|
||||||
|
for: 0m
|
||||||
|
labels:
|
||||||
|
severity: critical
|
||||||
|
annotations:
|
||||||
|
summary: A snmp node rebooted in the last 2 hours (instance {{ $labels.instance }})
|
||||||
|
description: "The uptime of a snmp node changed in the last two hours. VALUE = {{ $value }}\n LABELS = {{ $labels }}"
|
||||||
|
|
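The `SNMPNodeRebooted` expression relies on SNMP `sysUpTime` being reported in hundredths of a second, so dividing by 100 yields seconds, which are then compared against a two-hour window. The following sketch mirrors that arithmetic outside of Prometheus; the function and constant names are hypothetical and only illustrate the threshold logic.

```python
# sysUpTime is an SNMP TimeTicks value: hundredths of a second since the
# network management part of the device was last re-initialised.
TICKS_PER_SECOND = 100
REBOOT_WINDOW_SECONDS = 60 * 60 * 2  # two hours, as in the alert expression


def uptime_seconds(sys_uptime_ticks: int) -> float:
    """Convert a raw sysUpTime reading (TimeTicks) to seconds."""
    return sys_uptime_ticks / TICKS_PER_SECOND


def rebooted_recently(sys_uptime_ticks: int) -> bool:
    """Mirror of `(sysUpTime / 100) <= (60 * 60 * 2)`."""
    return uptime_seconds(sys_uptime_ticks) <= REBOOT_WINDOW_SECONDS


# Illustrative readings, not taken from real devices.
for ticks in (30_000, 720_000, 720_001, 8_640_000):
    print(f"{ticks:>9} ticks -> {uptime_seconds(ticks):>8.2f} s, "
          f"alert condition: {rebooted_recently(ticks)}")
```

Because the condition is a threshold on uptime rather than a change detector, the alert would presumably keep firing for the full two hours after a reboot and then resolve on its own.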