Compare commits

4 Commits

Author SHA1 Message Date
Gregor Michels 6bcefd4955 incident 017: add another icmp probe `mon-e2e-wan01` 2022-09-15 02:02:22 +02:00
Gregor Michels e1152c28a0 incidents: add 017 about instability of the network 2022-09-15 01:13:27 +02:00
Gregor Michels 9c5675cbf7 incidents: add 016 power outages on site 2022-09-15 00:48:26 +02:00
Gregor Michels a3a9fdaa74 incident: create incident 015 (switched gw-core01 back to the original hardware) 2022-09-15 00:18:18 +02:00
5 changed files with 362 additions and 3 deletions

@ -29,6 +29,7 @@ eae-adp-jump01 ip=162.55.53.85 monitoring_ip=10.84.254.0 ansible_python_interpre
[container]
monitoring01 ip=10.84.1.51 cpus=2 disk=50 memory=1024 net='{"net0":"name=eth0,ip=10.84.1.51/24,gw=10.84.1.1,bridge=vmbr0"}'
mon-e2e-clients01 ip=10.84.7.30 cpus=1 disk=10 memory=256 net='{"net0":"name=eth0,ip=dhcp,bridge=vmbr1"}'
mon-e2e-wan01 ip=192.168.0.3 cpus=1 disk=10 memory=256 net='{"net0":"name=eth0,ip=dhcp,bridge=vmbr3"}'
[container:vars]
ostemplate=local:vztmpl/debian-11-standard_11.3-1_amd64.tar.zst

@ -505,8 +505,8 @@ The network cable for `ap-8f39` could be terminated right inside tent 5 because
Because we did not want to crawl behind the separated rooms inside the tent, we decided to route the cable for `ap-8f39` via the outside.
015 2022.09.08 18:45 - ??:?? | gw-core01 unreachable
----------------------------------------------------
015 2022.09.08 18:45 - 2022.09.09 10:15 | gw-core01 unreachable
---------------------------------------------------------------
Since 18:45 `gw-core01` lost its wireguard connection to `eae-adp-jump01`.
Either Vodafone is down or the new router died on us.
@ -533,5 +533,343 @@ wg0: flags=80c3<UP,BROADCAST,RUNNING,NOARP,MULTICAST> mtu 1350
eae-adp-jump01#
```
**diagnosis**:
The next morning I immediately went to the facility to diagnose and fix the problem.
Connecting directly into the management port of `gw-core01` did not result in an ssh connection.
The only way I could establish an ssh connection into `gw-core01` was by plugging myself into `sw-access01`.
The device itself responded and seemed to still try to serve dhcp responses. But any kind of routing failed.
`/etc/init.d/network restart` and `ip a` ran into internal timeouts:
```
root@gw-core01:~# date
Fri Sep 9 08:07:33 UTC 2022
root@gw-core01:~# /etc/init.d/network restart
Command failed: Request timed out
Command failed: Request timed out
Command failed: Request timed out
Command failed: Request timed out
^C^CCommand failed: Request timed out
root@gw-core01:~# ip a
^C^C
root@gw-core01:~#
```
There were no kernel messages indicating failures:
```
root@gw-core01:~# dmesg
[ 0.000000] Linux version 5.4.188 (builder@buildhost) (gcc version 8.4.0 (OpenWrt GCC 8.4.0 r16554-1d4dea6d4f)) #0 SMP Sat Apr 16 12:59:34 2022
[ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[ 0.000000] printk: bootconsole [early0] enabled
[ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
[ 0.000000] MIPS: machine is Ubiquiti EdgeRouter X SFP
[ 0.000000] Initrd not found or empty - disabling initrd
[ 0.000000] VPE topology {2,2} total 4
[ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[ 0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[ 0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff]
[ 0.000000] HighMem empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
[ 0.000000] On node 0 totalpages: 65536
[ 0.000000] Normal zone: 576 pages used for memmap
[ 0.000000] Normal zone: 0 pages reserved
[ 0.000000] Normal zone: 65536 pages, LIFO batch:15
[ 0.000000] percpu: Embedded 14 pages/cpu s26768 r8192 d22384 u57344
[ 0.000000] pcpu-alloc: s26768 r8192 d22384 u57344 alloc=14*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 64960
[ 0.000000] Kernel command line: console=ttyS0,57600 rootfstype=squashfs,jffs2
[ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[ 0.000000] Writing ErrCtl register=0006c560
[ 0.000000] Readback ErrCtl register=0006c560
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] Memory: 250792K/262144K available (6097K kernel code, 210K rwdata, 748K rodata, 1252K init, 238K bss, 11352K reserved, 0K cma-reserved, 0K highmem)
[ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] rcu: Hierarchical RCU implementation.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[ 0.000000] NR_IRQS: 256
[ 0.000000] random: get_random_bytes called from 0x806e7a3c with crng_init=0
[ 0.000000] CPU Clock: 880MHz
[ 0.000000] clocksource: GIC: mask: 0xffffffffffffffff max_cycles: 0xcaf478abb4, max_idle_ns: 440795247997 ns
[ 0.000000] clocksource: MIPS: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 4343773742 ns
[ 0.000008] sched_clock: 32 bits at 440MHz, resolution 2ns, wraps every 4880645118ns
[ 0.015500] Calibrating delay loop... 583.68 BogoMIPS (lpj=1167360)
[ 0.055845] pid_max: default: 32768 minimum: 301
[ 0.065197] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.079604] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.097727] rcu: Hierarchical SRCU implementation.
[ 0.107838] smp: Bringing up secondary CPUs ...
[ 6.638149] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[ 6.638160] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[ 6.638172] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[ 6.638275] CPU1 revision is: 0001992f (MIPS 1004Kc)
[ 0.145018] Synchronize counters for CPU 1: done.
[ 6.729209] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[ 6.729217] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[ 6.729225] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[ 6.729283] CPU2 revision is: 0001992f (MIPS 1004Kc)
[ 0.239464] Synchronize counters for CPU 2: done.
[ 6.820315] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[ 6.820323] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[ 6.820331] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[ 6.820392] CPU3 revision is: 0001992f (MIPS 1004Kc)
[ 0.327062] Synchronize counters for CPU 3: done.
[ 0.386672] smp: Brought up 1 node, 4 CPUs
[ 0.399152] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.418438] futex hash table entries: 1024 (order: 3, 32768 bytes, linear)
[ 0.432279] pinctrl core: initialized pinctrl subsystem
[ 0.444212] NET: Registered protocol family 16
[ 0.458990] FPU Affinity set after 4688 emulations
[ 0.477155] clocksource: Switched to clocksource GIC
[ 0.488388] thermal_sys: Registered thermal governor 'step_wise'
[ 0.488922] NET: Registered protocol family 2
[ 0.509581] IP idents hash table entries: 4096 (order: 3, 32768 bytes, linear)
[ 0.525403] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 6144 bytes, linear)
[ 0.542061] TCP established hash table entries: 2048 (order: 1, 8192 bytes, linear)
[ 0.557199] TCP bind hash table entries: 2048 (order: 2, 16384 bytes, linear)
[ 0.571359] TCP: Hash tables configured (established 2048 bind 2048)
[ 0.584096] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[ 0.596987] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[ 0.611063] NET: Registered protocol family 1
[ 0.619627] PCI: CLS 0 bytes, default 32
[ 0.717095] 4 CPUs re-calibrate udelay(lpj = 1167360)
[ 0.728609] workingset: timestamp_bits=14 max_order=16 bucket_order=2
[ 0.742148] random: fast init done
[ 0.754345] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[ 0.765830] jffs2: version 2.2 (NAND) (SUMMARY) (LZMA) (RTIME) (CMODE_PRIORITY) (c) 2001-2006 Red Hat, Inc.
[ 0.786977] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
[ 0.803185] GPIO line 487 (sfp_i2c_clk_gate) hogged as output/high
[ 0.815644] mt7621_gpio 1e000600.gpio: registering 32 gpios
[ 0.826917] mt7621_gpio 1e000600.gpio: registering 32 gpios
[ 0.838184] mt7621_gpio 1e000600.gpio: registering 32 gpios
[ 0.850028] Serial: 8250/16550 driver, 16 ports, IRQ sharing enabled
[ 0.866274] printk: console [ttyS0] disabled
[ 0.874717] 1e000c00.uartlite: ttyS0 at MMIO 0x1e000c00 (irq = 19, base_baud = 3125000) is a 16550A
[ 0.892649] printk: console [ttyS0] enabled
[ 0.909210] printk: bootconsole [early0] disabled
[ 0.930438] mt7621-nand 1e003000.nand: Using programmed access timing: 31c07388
[ 0.945315] nand: device found, Manufacturer ID: 0x01, Chip ID: 0xda
[ 0.957964] nand: AMD/Spansion S34ML02G2
[ 0.965772] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128
[ 0.981019] mt7621-nand 1e003000.nand: ECC strength adjusted to 12 bits
[ 0.994229] mt7621-nand 1e003000.nand: Using programmed access timing: 21005134
[ 1.008785] mt7621-nand 1e003000.nand: Using programmed access timing: 21005134
[ 1.023342] Scanning device for bad blocks
[ 4.991028] 6 fixed-partitions partitions found on MTD device mt7621-nand
[ 5.004548] Creating 6 MTD partitions on "mt7621-nand":
[ 5.014961] 0x000000000000-0x000000080000 : "u-boot"
[ 5.026320] 0x000000080000-0x0000000e0000 : "u-boot-env"
[ 5.038154] 0x0000000e0000-0x000000140000 : "factory"
[ 5.049647] 0x000000140000-0x000000440000 : "kernel1"
[ 5.060986] 0x000000440000-0x000000740000 : "kernel2"
[ 5.072507] 0x000000740000-0x00000ff00000 : "ubi"
[ 5.111797] mt7530 mdio-bus:1f: MT7530 adapts as multi-chip module
[ 5.128943] mtk_soc_eth 1e100000.ethernet dsa: mediatek frame engine at 0xbe100000, irq 20
[ 5.146701] i2c-mt7621 1e000900.i2c: clock 100 kHz
[ 5.160084] NET: Registered protocol family 10
[ 5.170415] Segment Routing with IPv6
[ 5.177872] NET: Registered protocol family 17
[ 5.186801] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 5.212902] 8021q: 802.1Q VLAN Support v1.8
[ 5.223128] mt7530 mdio-bus:1f: MT7530 adapts as multi-chip module
[ 5.246580] mt7530 mdio-bus:1f eth0 (uninitialized): PHY [dsa-0.0:00] driver [Generic PHY]
[ 5.264582] mt7530 mdio-bus:1f eth1 (uninitialized): PHY [dsa-0.0:01] driver [Generic PHY]
[ 5.282533] mt7530 mdio-bus:1f eth2 (uninitialized): PHY [dsa-0.0:02] driver [Generic PHY]
[ 5.300513] mt7530 mdio-bus:1f eth3 (uninitialized): PHY [dsa-0.0:03] driver [Generic PHY]
[ 5.318518] mt7530 mdio-bus:1f eth4 (uninitialized): PHY [dsa-0.0:04] driver [Generic PHY]
[ 5.336462] mt7530 mdio-bus:1f eth5 (uninitialized): PHY [mdio-bus:07] driver [Atheros 8031 ethernet]
[ 5.356180] mt7530 mdio-bus:1f: configuring for fixed/rgmii link mode
[ 5.373826] DSA: tree 0 setup
[ 5.380199] mt7530 mdio-bus:1f: Link is Up - 1Gbps/Full - flow control off
[ 5.381345] UBI: auto-attach mtd5
[ 5.400526] ubi0: attaching mtd5
[ 7.967018] ubi0: scanning is finished
[ 7.993716] ubi0: attached mtd5 (name "ubi", size 247 MiB)
[ 8.004666] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[ 8.018357] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
[ 8.031871] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
[ 8.045746] ubi0: good PEBs: 1982, bad PEBs: 0, corrupted PEBs: 0
[ 8.057881] ubi0: user volume: 2, internal volumes: 1, max. volumes count: 128
[ 8.072269] ubi0: max/mean erase counter: 2/0, WL threshold: 4096, image sequence number: 231599107
[ 8.090289] ubi0: available PEBs: 0, total reserved PEBs: 1982, PEBs reserved for bad PEB handling: 40
[ 8.108849] ubi0: background thread "ubi_bgt0d" started, PID 480
[ 8.111258] block ubiblock0_0: created from ubi0:0(rootfs)
[ 8.131782] ubiblock: device ubiblock0_0 (rootfs) set to be root filesystem
[ 8.145658] hctosys: unable to open rtc device (rtc0)
[ 8.163435] VFS: Mounted root (squashfs filesystem) readonly on device 254:0.
[ 8.181987] Freeing unused kernel memory: 1252K
[ 8.191032] This architecture does not have kernel memory protection.
[ 8.203856] Run /sbin/init as init process
[ 8.682869] init: Console is alive
[ 8.689902] init: - watchdog -
[ 8.904196] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[ 9.001394] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
[ 9.021446] init: - preinit -
[ 9.693086] mtk_soc_eth 1e100000.ethernet dsa: configuring for fixed/rgmii link mode
[ 9.709069] mtk_soc_eth 1e100000.ethernet dsa: Link is Up - 1Gbps/Full - flow control rx/tx
[ 9.725759] IPv6: ADDRCONF(NETDEV_CHANGE): dsa: link becomes ready
[ 9.883838] random: jshn: uninitialized urandom read (4 bytes read)
[ 9.954270] random: jshn: uninitialized urandom read (4 bytes read)
[ 9.999580] random: jshn: uninitialized urandom read (4 bytes read)
[ 10.308223] device dsa entered promiscuous mode
[ 10.317845] mt7530 mdio-bus:1f eth1: configuring for phy/gmii link mode
[ 10.331434] 8021q: adding VLAN 0 to HW filter on device eth1
[ 14.562377] UBIFS (ubi0:1): Mounting in unauthenticated mode
[ 14.574081] UBIFS (ubi0:1): background thread "ubifs_bgt0_1" started, PID 587
[ 14.616572] urandom_read: 6 callbacks suppressed
[ 14.616584] random: procd: uninitialized urandom read (4 bytes read)
[ 14.658895] UBIFS (ubi0:1): recovery needed
[ 14.833294] UBIFS (ubi0:1): recovery completed
[ 14.842330] UBIFS (ubi0:1): UBIFS: mounted UBI device 0, volume 1, name "rootfs_data"
[ 14.857938] UBIFS (ubi0:1): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
[ 14.877699] UBIFS (ubi0:1): FS size: 236429312 bytes (225 MiB, 1862 LEBs), journal size 11808768 bytes (11 MiB, 93 LEBs)
[ 14.899365] UBIFS (ubi0:1): reserved for root: 4952683 bytes (4836 KiB)
[ 14.912550] UBIFS (ubi0:1): media format: w5/r0 (latest is w5/r0), UUID 787684E8-A245-4FE7-9437-3D9F0B3BD798, small LPT model
[ 14.941199] mount_root: switching to ubifs overlay
[ 14.969476] urandom-seed: Seeding with /etc/urandom.seed
[ 15.073478] device dsa left promiscuous mode
[ 15.091161] procd: - early -
[ 15.097018] procd: - watchdog -
[ 15.649297] procd: - watchdog -
[ 15.659598] procd: - ubus -
[ 15.720914] procd: - init -
[ 16.294778] kmodloader: loading kernel modules from /etc/modules.d/*
[ 16.320888] i2c /dev entries driver
[ 16.330637] pca953x 0-0025: 0-0025 supply vcc not found, using dummy regulator
[ 16.345230] pca953x 0-0025: using no AI
[ 16.421511] pca953x 0-0025: interrupt support not compiled in
[ 16.465255] sfp sfp_eth5: Host maximum power 1.0W
[ 16.480258] urngd: v1.0.2 started.
[ 16.501156] sfp sfp_eth5: No tx_disable pin: SFP modules will always be emitting.
[ 16.524577] xt_time: kernel timezone is -0000
[ 16.549198] PPP generic driver version 2.4.2
[ 16.559155] NET: Registered protocol family 24
[ 16.571409] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
[ 16.587065] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
[ 16.616800] kmodloader: done loading kernel modules from /etc/modules.d/*
[ 16.627960] crng init done
[ 21.588816] mtk_soc_eth 1e100000.ethernet dsa: Link is Down
[ 21.608948] mtk_soc_eth 1e100000.ethernet dsa: configuring for fixed/rgmii link mode
[ 21.624843] mtk_soc_eth 1e100000.ethernet dsa: Link is Up - 1Gbps/Full - flow control rx/tx
[ 21.630145] mt7530 mdio-bus:1f eth0: configuring for phy/gmii link mode
[ 21.655323] 8021q: adding VLAN 0 to HW filter on device eth0
[ 21.669925] IPv6: ADDRCONF(NETDEV_CHANGE): dsa: link becomes ready
[ 21.683342] switch: port 1(eth0) entered blocking state
[ 21.693825] switch: port 1(eth0) entered disabled state
[ 21.705836] device eth0 entered promiscuous mode
[ 21.715085] device dsa entered promiscuous mode
[ 21.798255] mt7530 mdio-bus:1f eth1: configuring for phy/gmii link mode
[ 21.812164] 8021q: adding VLAN 0 to HW filter on device eth1
[ 21.827686] switch: port 2(eth1) entered blocking state
[ 21.838224] switch: port 2(eth1) entered disabled state
[ 21.850382] device eth1 entered promiscuous mode
[ 21.876743] mt7530 mdio-bus:1f eth2: configuring for phy/gmii link mode
[ 21.890510] 8021q: adding VLAN 0 to HW filter on device eth2
[ 21.905871] switch: port 3(eth2) entered blocking state
[ 21.916366] switch: port 3(eth2) entered disabled state
[ 21.928552] device eth2 entered promiscuous mode
[ 21.959473] mt7530 mdio-bus:1f eth3: configuring for phy/gmii link mode
[ 21.973384] 8021q: adding VLAN 0 to HW filter on device eth3
[ 21.988082] switch: port 4(eth3) entered blocking state
[ 21.998586] switch: port 4(eth3) entered disabled state
[ 22.010746] device eth3 entered promiscuous mode
[ 22.041595] mt7530 mdio-bus:1f eth4: configuring for phy/gmii link mode
[ 22.055465] 8021q: adding VLAN 0 to HW filter on device eth4
[ 22.070479] switch: port 5(eth4) entered blocking state
[ 22.080938] switch: port 5(eth4) entered disabled state
[ 22.093369] device eth4 entered promiscuous mode
[ 44.201668] mt7530 mdio-bus:1f eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 44.216647] switch: port 1(eth0) entered blocking state
[ 44.227108] switch: port 1(eth0) entered forwarding state
[ 44.238996] IPv6: ADDRCONF(NETDEV_CHANGE): switch: link becomes ready
[ 44.252780] IPv6: ADDRCONF(NETDEV_CHANGE): switch.1: link becomes ready
[ 44.266682] IPv6: ADDRCONF(NETDEV_CHANGE): switch.2: link becomes ready
[ 44.280599] IPv6: ADDRCONF(NETDEV_CHANGE): switch.3: link becomes ready
[ 44.294436] IPv6: ADDRCONF(NETDEV_CHANGE): switch.8: link becomes ready
[ 53.674078] mt7530 mdio-bus:1f eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[ 53.689068] switch: port 3(eth2) entered blocking state
[ 53.699526] switch: port 3(eth2) entered forwarding state
[ 55.786325] mt7530 mdio-bus:1f eth3: Link is Up - 1Gbps/Full - flow control rx/tx
[ 55.801371] switch: port 4(eth3) entered blocking state
[ 55.811823] switch: port 4(eth3) entered forwarding state
[ 59.946845] mt7530 mdio-bus:1f eth4: Link is Up - 1Gbps/Full - flow control rx/tx
[ 59.961897] switch: port 5(eth4) entered blocking state
[ 59.972332] switch: port 5(eth4) entered forwarding state
```
`logread` was full of messages from `dnsmasq-dhcp` which I am not going to share publicly.
**issue**:
Unknown. Probably faulty hardware or a bug in OpenWrt.
**solution**:
Because the "original" `gw-core01` (see incident `012` for details) was far more stable, I replaced `gw-core01` with the old node again.
**impact**:
no routing into the internet
no routing, dhcp and dns in the specified timeframe
016 2022.09.11 21:39, etc. | power outages on site
--------------------------------------------------
There were power outages in the facility.
**outages**:
* 09.11 21:39: `office` and `tent 5`
* 09.12 17:47: `office`
* 09.13 01:47: `office`
* `tent 5`: `sw-access02`, `ap-2bbf`, `ap-1a38`, `ap-8f39`
* `office`: all equipment except `tent 5`
**impact**:
* service interruption in the mentioned timeframes until power was restored and the equipment was back online
017 2022.09.13 | wifi instabilities reported by the facility management
-----------------------------------------------------------------------
The facility management reached out and reported that
* they are having issues with the wifi on their new notebooks
* sometimes the wifi is marked as having `no internet` by windows
* they need to switch to the public wifi instead of using the backoffice wifi
* according to security, the wifi is unusable at around 12:00
**thoughts**:
To help diagnose the problem I expanded the monitoring with end to end monitors.
They ping two sites on the internet.
One is using the normal wan (like the backoffice) and one is using the vpn (like the public wifi).
See commit `f011562` for details.
I am noticing that sometimes the icmp probes fail, which is odd.
But at the same time the node exporter on `eae-adp-jump01` is reachable.
I implemented another icmp probe that directly uses the `gigacube` without traversing `gw-core01` first.
While checking the gigacube for the size of the dhcp pool I noticed that `Signal Strength` is only `-91 dBm`.
That seems way too little. Maybe this is the cause of the wan instability?
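As a rough sketch of what such an icmp probe looks like from the outside: Prometheus asks the blackbox exporter on the probe container for `/probe?target=...&module=...` and reads the `probe_success` sample from the answer. The helper names `probe_url` and `probe_succeeded` and the sample output below are made up for illustration; only the exporter address (`192.168.0.3:9115`), the module name `icmp_v4` and the target are taken from this change.

```python
from urllib.parse import urlencode


def probe_url(exporter: str, target: str, module: str = "icmp_v4") -> str:
    """Build the /probe URL that would be requested from a blackbox exporter."""
    query = urlencode({"target": target, "module": module})
    return f"http://{exporter}/probe?{query}"


def probe_succeeded(metrics_text: str) -> bool:
    """Return True if the exporter output reports `probe_success 1`."""
    for line in metrics_text.splitlines():
        if line.startswith("probe_success "):
            return float(line.split()[1]) == 1.0
    raise ValueError("no probe_success sample in exporter output")


# Hypothetical excerpt of exporter output for a failed probe (illustration only).
sample = (
    "# TYPE probe_success gauge\n"
    "probe_success 0\n"
)

print(probe_url("192.168.0.3:9115", "freifunk-leipzig.de"))
print(probe_succeeded(sample))
```

Requesting the printed URL by hand from a machine that can reach `mon-e2e-wan01` would be one way to check a single probe without waiting for the next scrape.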
**changes in response**:
* set password for `gigacube-2001` (see `pass`)
  * the gui forced me to set a password
  * reused the password from the old gigacube that was set by the facility management
  * will write the changed pw onto the current gigacube the next time I visit
  * probably going to create a new secure password beforehand :)
* configured two static ip bindings on the gigacube
  * for `gw-core01`: `192.168.0.2`
  * for `mon-e2e-wan01`: `192.168.0.3`
* added `WAN/gigacube` vlan onto proxmox port via `gw-core01`
* added `vmbr3` on proxmox which is used for the `WAN/gigacube` vlan
* created `mon-e2e-wan01` that lives inside the gigacube network and probes the same stuff as `mon-e2e-clients01`

Binary file not shown.

@ -22,6 +22,7 @@
- name: provision blackbox exporters
  hosts:
    - mon-e2e-clients01
    - mon-e2e-wan01
    - monitoring01
  tasks:
    - name: install blackbox exporter

@ -33,6 +33,7 @@ scrape_configs:
    static_configs:
      - targets:
          - {{ hostvars['mon-e2e-clients01']['ip'] }}:9115
          - {{ hostvars['mon-e2e-wan01']['ip'] }}:9115
          - {{ hostvars['monitoring01']['ip'] }}:9115
  - job_name: 'e2e_clients_v4'
@ -71,3 +72,21 @@ scrape_configs:
        target_label: instance
      - target_label: __address__
        replacement: {{ hostvars['monitoring01']['ip'] }}:9115
  - job_name: 'e2e_wan_v4'
    metrics_path: /probe
    params:
      module: [icmp_v4]
    static_configs:
      - targets:
          - freifunk-leipzig.de
          - harald.brainpeach.de
          - 195.201.165.118 # freifunk-leipzig.de without dns query
          - 88.198.195.242 # harald.brainpeach.de without dns query
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: {{ hostvars['mon-e2e-wan01']['ip'] }}:9115
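The three relabel rules of the `e2e_wan_v4` job can be traced step by step. The following is only a simplified model of their effect on a single target, not Prometheus' actual relabeling code; the function name `relabel` is hypothetical, and the exporter address is the `mon-e2e-wan01` ip from the inventory plus port `9115`.

```python
def relabel(target: str, exporter: str) -> dict:
    """Model the effect of the three relabel rules on one configured target."""
    labels = {"__address__": target}
    # Rule 1: pass the configured target to the exporter as the `target` URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: keep the probed target visible as the `instance` label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: scrape the blackbox exporter instead of the target itself.
    labels["__address__"] = exporter
    return labels


result = relabel("freifunk-leipzig.de", "192.168.0.3:9115")
print(result)
```

The scrape therefore goes to the exporter on `mon-e2e-wan01`, while the resulting time series stays labeled with the site that was actually pinged.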