incident: create incident 015

switched gw-core01 back to the original hardware
Gregor Michels 2022-09-15 00:18:18 +02:00
parent 01c3d3f300
commit a3a9fdaa74
1 changed files with 288 additions and 3 deletions


Because we did not want to crawl behind the separated rooms inside the tent, we decided to route the cable for `ap-8f39` along the outside.
015 2022.09.08 18:45 - 2022.09.09 10:15 | gw-core01 unreachable
---------------------------------------------------------------
At 18:45, `gw-core01` lost its WireGuard connection to `eae-adp-jump01`.
Either Vodafone was down or the new router had died on us.
eae-adp-jump01#
```
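From `eae-adp-jump01`, an outage like this shows up as a peer that stops completing WireGuard handshakes. A minimal sketch of a check that could run on the jump host, assuming the `wg` tool is installed there and the tunnel interface is called `wg0`; the function name `stale_peers` and the 180-second threshold are illustrative choices, not part of the existing setup. Handshakes are only renewed while packets flow, so this also assumes `gw-core01` has a persistent keepalive configured for this peer.

```shell
# stale_peers MAX_AGE NOW
# Reads "wg show <iface> latest-handshakes" output on stdin (one
# "<public key><TAB><unix time>" line per peer) and prints the public key of
# every peer whose last handshake is older than MAX_AGE seconds relative to
# NOW. A timestamp of 0 means the peer never completed a handshake.
stale_peers() {
  awk -v max_age="$1" -v now="$2" '$2 == 0 || now - $2 > max_age { print $1 }'
}

# Example (hypothetical cron job on eae-adp-jump01, needs root):
#   wg show wg0 latest-handshakes | stale_peers 180 "$(date +%s)"
```

Any key this prints could then be fed into whatever alerting the jump host already has.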
**diagnosis**:
The next morning I immediately went to the facility to diagnose and fix the problem.
Connecting directly to the management port of `gw-core01` did not give me an SSH session.
The only way I could establish an SSH connection to `gw-core01` was by plugging into `sw-access01`.
The device itself was responsive and still seemed to be answering DHCP requests, but all routing failed.
`/etc/init.d/network restart` failed with repeated `Command failed: Request timed out` errors, and `ip a` hung until I interrupted it:
```
root@gw-core01:~# date
Fri Sep 9 08:07:33 UTC 2022
root@gw-core01:~# /etc/init.d/network restart
Command failed: Request timed out
Command failed: Request timed out
Command failed: Request timed out
Command failed: Request timed out
^C^CCommand failed: Request timed out
root@gw-core01:~# ip a
^C^C
root@gw-core01:~#
```
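When commands hang like this, collecting diagnostics means a lot of `^C`. A small POSIX `sh` helper that gives each command a deadline could make such a session scriptable. This is a sketch under our own assumptions: the name `with_deadline` is hypothetical, BusyBox may already provide a `timeout` applet depending on how it was built, and a process stuck in uninterruptible kernel sleep will not act on the signal, so the wrapper would still block in that case.

```shell
# with_deadline SECONDS COMMAND [ARGS...]
# Runs COMMAND in the background and sends it SIGTERM if it is still running
# after SECONDS. Returns COMMAND's exit status (usually 143 = 128 + SIGTERM
# if the deadline was hit). Plain POSIX sh, so no "local" variables.
with_deadline() {
  deadline=$1
  shift
  "$@" &
  cmd_pid=$!
  # Watchdog subshell: sleep until the deadline, then terminate the command.
  # Its output goes to /dev/null so a leftover sleep holds no pipes open.
  ( sleep "$deadline" && kill "$cmd_pid" ) >/dev/null 2>&1 &
  watchdog_pid=$!
  wait "$cmd_pid"
  status=$?
  # The command has ended (normally or killed): stop the watchdog subshell.
  # Its sleep may linger until the deadline, which is harmless.
  kill "$watchdog_pid" 2>/dev/null || true
  return "$status"
}

# Example on gw-core01 (hypothetical usage):
#   with_deadline 10 ip a
#   with_deadline 30 /etc/init.d/network restart
```

Note that killing the init script only stops the client side; whatever it already requested from the network daemon may keep running.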
There were no kernel messages indicating failures:
```
root@gw-core01:~# dmesg
[ 0.000000] Linux version 5.4.188 (builder@buildhost) (gcc version 8.4.0 (OpenWrt GCC 8.4.0 r16554-1d4dea6d4f)) #0 SMP Sat Apr 16 12:59:34 2022
[ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[ 0.000000] printk: bootconsole [early0] enabled
[ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
[ 0.000000] MIPS: machine is Ubiquiti EdgeRouter X SFP
[ 0.000000] Initrd not found or empty - disabling initrd
[ 0.000000] VPE topology {2,2} total 4
[ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[ 0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[ 0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff]
[ 0.000000] HighMem empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
[ 0.000000] On node 0 totalpages: 65536
[ 0.000000] Normal zone: 576 pages used for memmap
[ 0.000000] Normal zone: 0 pages reserved
[ 0.000000] Normal zone: 65536 pages, LIFO batch:15
[ 0.000000] percpu: Embedded 14 pages/cpu s26768 r8192 d22384 u57344
[ 0.000000] pcpu-alloc: s26768 r8192 d22384 u57344 alloc=14*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 64960
[ 0.000000] Kernel command line: console=ttyS0,57600 rootfstype=squashfs,jffs2
[ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[ 0.000000] Writing ErrCtl register=0006c560
[ 0.000000] Readback ErrCtl register=0006c560
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] Memory: 250792K/262144K available (6097K kernel code, 210K rwdata, 748K rodata, 1252K init, 238K bss, 11352K reserved, 0K cma-reserved, 0K highmem)
[ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] rcu: Hierarchical RCU implementation.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[ 0.000000] NR_IRQS: 256
[ 0.000000] random: get_random_bytes called from 0x806e7a3c with crng_init=0
[ 0.000000] CPU Clock: 880MHz
[ 0.000000] clocksource: GIC: mask: 0xffffffffffffffff max_cycles: 0xcaf478abb4, max_idle_ns: 440795247997 ns
[ 0.000000] clocksource: MIPS: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 4343773742 ns
[ 0.000008] sched_clock: 32 bits at 440MHz, resolution 2ns, wraps every 4880645118ns
[ 0.015500] Calibrating delay loop... 583.68 BogoMIPS (lpj=1167360)
[ 0.055845] pid_max: default: 32768 minimum: 301
[ 0.065197] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.079604] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.097727] rcu: Hierarchical SRCU implementation.
[ 0.107838] smp: Bringing up secondary CPUs ...
[ 6.638149] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[ 6.638160] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[ 6.638172] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[ 6.638275] CPU1 revision is: 0001992f (MIPS 1004Kc)
[ 0.145018] Synchronize counters for CPU 1: done.
[ 6.729209] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[ 6.729217] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[ 6.729225] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[ 6.729283] CPU2 revision is: 0001992f (MIPS 1004Kc)
[ 0.239464] Synchronize counters for CPU 2: done.
[ 6.820315] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[ 6.820323] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[ 6.820331] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[ 6.820392] CPU3 revision is: 0001992f (MIPS 1004Kc)
[ 0.327062] Synchronize counters for CPU 3: done.
[ 0.386672] smp: Brought up 1 node, 4 CPUs
[ 0.399152] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.418438] futex hash table entries: 1024 (order: 3, 32768 bytes, linear)
[ 0.432279] pinctrl core: initialized pinctrl subsystem
[ 0.444212] NET: Registered protocol family 16
[ 0.458990] FPU Affinity set after 4688 emulations
[ 0.477155] clocksource: Switched to clocksource GIC
[ 0.488388] thermal_sys: Registered thermal governor 'step_wise'
[ 0.488922] NET: Registered protocol family 2
[ 0.509581] IP idents hash table entries: 4096 (order: 3, 32768 bytes, linear)
[ 0.525403] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 6144 bytes, linear)
[ 0.542061] TCP established hash table entries: 2048 (order: 1, 8192 bytes, linear)
[ 0.557199] TCP bind hash table entries: 2048 (order: 2, 16384 bytes, linear)
[ 0.571359] TCP: Hash tables configured (established 2048 bind 2048)
[ 0.584096] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[ 0.596987] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[ 0.611063] NET: Registered protocol family 1
[ 0.619627] PCI: CLS 0 bytes, default 32
[ 0.717095] 4 CPUs re-calibrate udelay(lpj = 1167360)
[ 0.728609] workingset: timestamp_bits=14 max_order=16 bucket_order=2
[ 0.742148] random: fast init done
[ 0.754345] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[ 0.765830] jffs2: version 2.2 (NAND) (SUMMARY) (LZMA) (RTIME) (CMODE_PRIORITY) (c) 2001-2006 Red Hat, Inc.
[ 0.786977] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
[ 0.803185] GPIO line 487 (sfp_i2c_clk_gate) hogged as output/high
[ 0.815644] mt7621_gpio 1e000600.gpio: registering 32 gpios
[ 0.826917] mt7621_gpio 1e000600.gpio: registering 32 gpios
[ 0.838184] mt7621_gpio 1e000600.gpio: registering 32 gpios
[ 0.850028] Serial: 8250/16550 driver, 16 ports, IRQ sharing enabled
[ 0.866274] printk: console [ttyS0] disabled
[ 0.874717] 1e000c00.uartlite: ttyS0 at MMIO 0x1e000c00 (irq = 19, base_baud = 3125000) is a 16550A
[ 0.892649] printk: console [ttyS0] enabled
[ 0.909210] printk: bootconsole [early0] disabled
[ 0.930438] mt7621-nand 1e003000.nand: Using programmed access timing: 31c07388
[ 0.945315] nand: device found, Manufacturer ID: 0x01, Chip ID: 0xda
[ 0.957964] nand: AMD/Spansion S34ML02G2
[ 0.965772] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128
[ 0.981019] mt7621-nand 1e003000.nand: ECC strength adjusted to 12 bits
[ 0.994229] mt7621-nand 1e003000.nand: Using programmed access timing: 21005134
[ 1.008785] mt7621-nand 1e003000.nand: Using programmed access timing: 21005134
[ 1.023342] Scanning device for bad blocks
[ 4.991028] 6 fixed-partitions partitions found on MTD device mt7621-nand
[ 5.004548] Creating 6 MTD partitions on "mt7621-nand":
[ 5.014961] 0x000000000000-0x000000080000 : "u-boot"
[ 5.026320] 0x000000080000-0x0000000e0000 : "u-boot-env"
[ 5.038154] 0x0000000e0000-0x000000140000 : "factory"
[ 5.049647] 0x000000140000-0x000000440000 : "kernel1"
[ 5.060986] 0x000000440000-0x000000740000 : "kernel2"
[ 5.072507] 0x000000740000-0x00000ff00000 : "ubi"
[ 5.111797] mt7530 mdio-bus:1f: MT7530 adapts as multi-chip module
[ 5.128943] mtk_soc_eth 1e100000.ethernet dsa: mediatek frame engine at 0xbe100000, irq 20
[ 5.146701] i2c-mt7621 1e000900.i2c: clock 100 kHz
[ 5.160084] NET: Registered protocol family 10
[ 5.170415] Segment Routing with IPv6
[ 5.177872] NET: Registered protocol family 17
[ 5.186801] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 5.212902] 8021q: 802.1Q VLAN Support v1.8
[ 5.223128] mt7530 mdio-bus:1f: MT7530 adapts as multi-chip module
[ 5.246580] mt7530 mdio-bus:1f eth0 (uninitialized): PHY [dsa-0.0:00] driver [Generic PHY]
[ 5.264582] mt7530 mdio-bus:1f eth1 (uninitialized): PHY [dsa-0.0:01] driver [Generic PHY]
[ 5.282533] mt7530 mdio-bus:1f eth2 (uninitialized): PHY [dsa-0.0:02] driver [Generic PHY]
[ 5.300513] mt7530 mdio-bus:1f eth3 (uninitialized): PHY [dsa-0.0:03] driver [Generic PHY]
[ 5.318518] mt7530 mdio-bus:1f eth4 (uninitialized): PHY [dsa-0.0:04] driver [Generic PHY]
[ 5.336462] mt7530 mdio-bus:1f eth5 (uninitialized): PHY [mdio-bus:07] driver [Atheros 8031 ethernet]
[ 5.356180] mt7530 mdio-bus:1f: configuring for fixed/rgmii link mode
[ 5.373826] DSA: tree 0 setup
[ 5.380199] mt7530 mdio-bus:1f: Link is Up - 1Gbps/Full - flow control off
[ 5.381345] UBI: auto-attach mtd5
[ 5.400526] ubi0: attaching mtd5
[ 7.967018] ubi0: scanning is finished
[ 7.993716] ubi0: attached mtd5 (name "ubi", size 247 MiB)
[ 8.004666] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[ 8.018357] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
[ 8.031871] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
[ 8.045746] ubi0: good PEBs: 1982, bad PEBs: 0, corrupted PEBs: 0
[ 8.057881] ubi0: user volume: 2, internal volumes: 1, max. volumes count: 128
[ 8.072269] ubi0: max/mean erase counter: 2/0, WL threshold: 4096, image sequence number: 231599107
[ 8.090289] ubi0: available PEBs: 0, total reserved PEBs: 1982, PEBs reserved for bad PEB handling: 40
[ 8.108849] ubi0: background thread "ubi_bgt0d" started, PID 480
[ 8.111258] block ubiblock0_0: created from ubi0:0(rootfs)
[ 8.131782] ubiblock: device ubiblock0_0 (rootfs) set to be root filesystem
[ 8.145658] hctosys: unable to open rtc device (rtc0)
[ 8.163435] VFS: Mounted root (squashfs filesystem) readonly on device 254:0.
[ 8.181987] Freeing unused kernel memory: 1252K
[ 8.191032] This architecture does not have kernel memory protection.
[ 8.203856] Run /sbin/init as init process
[ 8.682869] init: Console is alive
[ 8.689902] init: - watchdog -
[ 8.904196] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[ 9.001394] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
[ 9.021446] init: - preinit -
[ 9.693086] mtk_soc_eth 1e100000.ethernet dsa: configuring for fixed/rgmii link mode
[ 9.709069] mtk_soc_eth 1e100000.ethernet dsa: Link is Up - 1Gbps/Full - flow control rx/tx
[ 9.725759] IPv6: ADDRCONF(NETDEV_CHANGE): dsa: link becomes ready
[ 9.883838] random: jshn: uninitialized urandom read (4 bytes read)
[ 9.954270] random: jshn: uninitialized urandom read (4 bytes read)
[ 9.999580] random: jshn: uninitialized urandom read (4 bytes read)
[ 10.308223] device dsa entered promiscuous mode
[ 10.317845] mt7530 mdio-bus:1f eth1: configuring for phy/gmii link mode
[ 10.331434] 8021q: adding VLAN 0 to HW filter on device eth1
[ 14.562377] UBIFS (ubi0:1): Mounting in unauthenticated mode
[ 14.574081] UBIFS (ubi0:1): background thread "ubifs_bgt0_1" started, PID 587
[ 14.616572] urandom_read: 6 callbacks suppressed
[ 14.616584] random: procd: uninitialized urandom read (4 bytes read)
[ 14.658895] UBIFS (ubi0:1): recovery needed
[ 14.833294] UBIFS (ubi0:1): recovery completed
[ 14.842330] UBIFS (ubi0:1): UBIFS: mounted UBI device 0, volume 1, name "rootfs_data"
[ 14.857938] UBIFS (ubi0:1): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
[ 14.877699] UBIFS (ubi0:1): FS size: 236429312 bytes (225 MiB, 1862 LEBs), journal size 11808768 bytes (11 MiB, 93 LEBs)
[ 14.899365] UBIFS (ubi0:1): reserved for root: 4952683 bytes (4836 KiB)
[ 14.912550] UBIFS (ubi0:1): media format: w5/r0 (latest is w5/r0), UUID 787684E8-A245-4FE7-9437-3D9F0B3BD798, small LPT model
[ 14.941199] mount_root: switching to ubifs overlay
[ 14.969476] urandom-seed: Seeding with /etc/urandom.seed
[ 15.073478] device dsa left promiscuous mode
[ 15.091161] procd: - early -
[ 15.097018] procd: - watchdog -
[ 15.649297] procd: - watchdog -
[ 15.659598] procd: - ubus -
[ 15.720914] procd: - init -
[ 16.294778] kmodloader: loading kernel modules from /etc/modules.d/*
[ 16.320888] i2c /dev entries driver
[ 16.330637] pca953x 0-0025: 0-0025 supply vcc not found, using dummy regulator
[ 16.345230] pca953x 0-0025: using no AI
[ 16.421511] pca953x 0-0025: interrupt support not compiled in
[ 16.465255] sfp sfp_eth5: Host maximum power 1.0W
[ 16.480258] urngd: v1.0.2 started.
[ 16.501156] sfp sfp_eth5: No tx_disable pin: SFP modules will always be emitting.
[ 16.524577] xt_time: kernel timezone is -0000
[ 16.549198] PPP generic driver version 2.4.2
[ 16.559155] NET: Registered protocol family 24
[ 16.571409] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
[ 16.587065] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
[ 16.616800] kmodloader: done loading kernel modules from /etc/modules.d/*
[ 16.627960] crng init done
[ 21.588816] mtk_soc_eth 1e100000.ethernet dsa: Link is Down
[ 21.608948] mtk_soc_eth 1e100000.ethernet dsa: configuring for fixed/rgmii link mode
[ 21.624843] mtk_soc_eth 1e100000.ethernet dsa: Link is Up - 1Gbps/Full - flow control rx/tx
[ 21.630145] mt7530 mdio-bus:1f eth0: configuring for phy/gmii link mode
[ 21.655323] 8021q: adding VLAN 0 to HW filter on device eth0
[ 21.669925] IPv6: ADDRCONF(NETDEV_CHANGE): dsa: link becomes ready
[ 21.683342] switch: port 1(eth0) entered blocking state
[ 21.693825] switch: port 1(eth0) entered disabled state
[ 21.705836] device eth0 entered promiscuous mode
[ 21.715085] device dsa entered promiscuous mode
[ 21.798255] mt7530 mdio-bus:1f eth1: configuring for phy/gmii link mode
[ 21.812164] 8021q: adding VLAN 0 to HW filter on device eth1
[ 21.827686] switch: port 2(eth1) entered blocking state
[ 21.838224] switch: port 2(eth1) entered disabled state
[ 21.850382] device eth1 entered promiscuous mode
[ 21.876743] mt7530 mdio-bus:1f eth2: configuring for phy/gmii link mode
[ 21.890510] 8021q: adding VLAN 0 to HW filter on device eth2
[ 21.905871] switch: port 3(eth2) entered blocking state
[ 21.916366] switch: port 3(eth2) entered disabled state
[ 21.928552] device eth2 entered promiscuous mode
[ 21.959473] mt7530 mdio-bus:1f eth3: configuring for phy/gmii link mode
[ 21.973384] 8021q: adding VLAN 0 to HW filter on device eth3
[ 21.988082] switch: port 4(eth3) entered blocking state
[ 21.998586] switch: port 4(eth3) entered disabled state
[ 22.010746] device eth3 entered promiscuous mode
[ 22.041595] mt7530 mdio-bus:1f eth4: configuring for phy/gmii link mode
[ 22.055465] 8021q: adding VLAN 0 to HW filter on device eth4
[ 22.070479] switch: port 5(eth4) entered blocking state
[ 22.080938] switch: port 5(eth4) entered disabled state
[ 22.093369] device eth4 entered promiscuous mode
[ 44.201668] mt7530 mdio-bus:1f eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 44.216647] switch: port 1(eth0) entered blocking state
[ 44.227108] switch: port 1(eth0) entered forwarding state
[ 44.238996] IPv6: ADDRCONF(NETDEV_CHANGE): switch: link becomes ready
[ 44.252780] IPv6: ADDRCONF(NETDEV_CHANGE): switch.1: link becomes ready
[ 44.266682] IPv6: ADDRCONF(NETDEV_CHANGE): switch.2: link becomes ready
[ 44.280599] IPv6: ADDRCONF(NETDEV_CHANGE): switch.3: link becomes ready
[ 44.294436] IPv6: ADDRCONF(NETDEV_CHANGE): switch.8: link becomes ready
[ 53.674078] mt7530 mdio-bus:1f eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[ 53.689068] switch: port 3(eth2) entered blocking state
[ 53.699526] switch: port 3(eth2) entered forwarding state
[ 55.786325] mt7530 mdio-bus:1f eth3: Link is Up - 1Gbps/Full - flow control rx/tx
[ 55.801371] switch: port 4(eth3) entered blocking state
[ 55.811823] switch: port 4(eth3) entered forwarding state
[ 59.946845] mt7530 mdio-bus:1f eth4: Link is Up - 1Gbps/Full - flow control rx/tx
[ 59.961897] switch: port 5(eth4) entered blocking state
[ 59.972332] switch: port 5(eth4) entered forwarding state
```
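Scanning a boot log of this length by eye is easy to get wrong. A rough sketch of a keyword filter that could be piped behind `dmesg` on the router; the function name `kernel_problems` and the keyword list are our own assumptions, and an empty result only means none of these words appeared, not that the kernel is healthy.

```shell
# kernel_problems: print kernel log lines (read from stdin) that contain
# common failure keywords, ignoring case. The keyword list is an assumption
# and deliberately short; extend it for the hardware at hand.
kernel_problems() {
  grep -iE 'error|fail|oops|panic|call trace|soft lockup|blocked for more than|timed out'
}

# Example on gw-core01:
#   dmesg | kernel_problems
```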
`logread` was full of messages from `dnsmasq-dhcp`, which I am not going to share publicly.
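One way to share the gist of such a log without exposing client MAC addresses, IPs or hostnames is to count the `dnsmasq-dhcp` messages by type. A sketch, assuming dnsmasq's usual `DHCPDISCOVER(<interface>) ...` message format; the function name `dhcp_summary` is hypothetical.

```shell
# dhcp_summary: read syslog lines (e.g. from logread) on stdin and print how
# often each dnsmasq-dhcp message type (DHCPDISCOVER, DHCPOFFER, ...) occurs,
# most frequent first. Addresses and hostnames are dropped by the sed step,
# which keeps only the "DHCP<TYPE>" word that precedes "(<interface>)".
dhcp_summary() {
  grep 'dnsmasq-dhcp' |
    sed -n 's/.*\(DHCP[A-Z][A-Z]*\)(.*/\1/p' |
    sort | uniq -c | sort -rn
}

# Example on gw-core01:
#   logread | dhcp_summary
```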
**issue**:
Unknown. Probably faulty hardware or a bug in OpenWrt.
**solution**:
Because the "original" `gw-core01` (see incident `012` for details) had been far more stable, I replaced `gw-core01` with the old node again.
**impact**:
- no routing into the internet
- no routing, DHCP, or DNS during the specified timeframe