From 870ec2ec936f23076ab9f35ae9dcdf44fe26b703 Mon Sep 17 00:00:00 2001
From: Pingfan Liu
Date: Thu, 5 Jan 2023 10:17:50 +0800
Subject: [PATCH] ppc64: tackle SRCU hang issue

Resolves: bz2158296
Upstream: RHEL-only

On PowerPC platform, the following hang is witnessed:

Welcome to Red Hat Enterprise Linux 9.2 Beta (Plow) dracut-057-13.git20220816.el9 (Initramfs) !

[ 1.631210] systemd[1]: Hostname set to .
[-- MARK -- Mon Sep 26 01:45:00 2022]
[ 243.681283] INFO: task systemd:1 blocked for more than 122 seconds.
[ 243.681303] Not tainted 5.14.0-167.el9.ppc64le #1
[ 243.681315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.681329] task:systemd state:D stack: 0 pid: 1 ppid: 0 flags:0x00042000
[ 243.681349] Call Trace:
[ 243.681356] [c00000001a603640] [c00000004f990100] 0xc00000004f990100 (unreliable)
[ 243.681378] [c00000001a603830] [c00000001001e9cc] __switch_to+0x12c/0x220
[ 243.681400] [c00000001a603890] [c000000010ec5b40] __schedule+0x230/0x720
[ 243.681418] [c00000001a603950] [c000000010ec6090] schedule+0x60/0x110
[ 243.681435] [c00000001a603980] [c000000010ecd948] schedule_timeout+0x168/0x1c0
[ 243.681454] [c00000001a603a60] [c000000010ec7214] __wait_for_common+0x134/0x360
[ 243.681473] [c00000001a603b00] [c00000001017c98c] __flush_work.isra.0+0x1dc/0x3d0
[ 243.681493] [c00000001a603ba0] [c0000000105cbd88] fsnotify_wait_marks_destroyed+0x28/0x40
[ 243.681512] [c00000001a603bc0] [c0000000105cb800] fsnotify_destroy_group+0x60/0x150
[ 243.681531] [c00000001a603c30] [c0000000105cf640] inotify_release+0x30/0xa0
[ 243.681548] [c00000001a603ca0] [c00000001054fad8] __fput+0xc8/0x350
[ 243.681565] [c00000001a603cf0] [c000000010183174] task_work_run+0xe4/0x160
[ 243.681583] [c00000001a603d40] [c000000010021874] do_notify_resume+0x134/0x140
[ 243.681602] [c00000001a603d70] [c000000010030168] interrupt_exit_user_prepare_main+0x198/0x270
[ 243.681622] [c00000001a603de0] [c0000000100305ac] syscall_exit_prepare+0x6c/0x180
[ 243.681641] [c00000001a603e10] [c00000001000bff4] system_call_vectored_common+0xf4/0x278
[ 243.681661] --- interrupt: 3000 at 0x7fffb3015ba4
[ 243.681673] NIP: 00007fffb3015ba4 LR: 0000000000000000 CTR: 0000000000000000
[ 243.681687] REGS: c00000001a603e80 TRAP: 3000 Not tainted (5.14.0-167.el9.ppc64le)
[ 243.681703] MSR: 800000000000d033 CR: 42044440 XER: 00000000
[ 243.681737] IRQMASK: 0
[ 243.681737] GPR00: 0000000000000006 00007fffd24a31a0 00007fffb3127200 0000000000000000
[ 243.681737] GPR04: 0000000000000002 000000000000000a 0000000000000000 0000000000000000
[ 243.681737] GPR08: 0000010009ea2d40 0000000000000000 0000000000000000 0000000000000000
[ 243.681737] GPR12: 0000000000000000 00007fffb3834bc0 0000000000000000 0000000000000000
[ 243.681737] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 243.681737] GPR20: 000000012c74ddf0 000000000000000e 000000000017cd3f 0000000000000000
[ 243.681737] GPR24: 00007fffd24a3570 0000000000000005 0000010009eb5490 0000010009ea24e0
[ 243.681737] GPR28: 0000010009ea2900 0000010009eb4850 0000010009ea2d70 00007fffb382dd98
[ 243.681896] NIP [00007fffb3015ba4] 0x7fffb3015ba4
[ 243.681907] LR [0000000000000000] 0x0
[ 243.681917] --- interrupt: 3000
[ 243.681928] INFO: task kworker/u16:1:34 blocked for more than 122 seconds.
[ 243.681941] Not tainted 5.14.0-167.el9.ppc64le #1
[ 243.681951] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.681964] task:kworker/u16:1 state:D stack: 0 pid: 34 ppid: 2 flags:0x00000800
[ 243.681982] Workqueue: events_unbound fsnotify_mark_destroy_workfn
[ 243.681998] Call Trace:
[ 243.682005] [c00000001a9336d0] [c00000004f990100] 0xc00000004f990100 (unreliable)
[ 243.682023] [c00000001a9338c0] [c00000001001e9cc] __switch_to+0x12c/0x220
[ 243.682042] [c00000001a933920] [c000000010ec5b40] __schedule+0x230/0x720
[ 243.682059] [c00000001a9339e0] [c000000010ec6090] schedule+0x60/0x110
[ 243.682075] [c00000001a933a10] [c000000010ecd948] schedule_timeout+0x168/0x1c0
[ 243.682094] [c00000001a933af0] [c000000010ec7214] __wait_for_common+0x134/0x360
[ 243.682113] [c00000001a933b90] [c000000010213370] __synchronize_srcu.part.0+0xa0/0xe0
[ 243.682132] [c00000001a933c00] [c0000000105cc154] fsnotify_mark_destroy_workfn+0xc4/0x1a0
[ 243.682151] [c00000001a933c70] [c00000001017acb8] process_one_work+0x298/0x580
[ 243.682169] [c00000001a933d10] [c00000001017b048] worker_thread+0xa8/0x630
[ 243.682185] [c00000001a933da0] [c000000010188348] kthread+0x1b8/0x1c0
[ 243.682203] [c00000001a933e10] [c00000001000cd64] ret_from_kernel_thread+0x5c/0x64
[ 366.561279] INFO: task systemd:1 blocked for more than 245 seconds.

The right fix belongs in the kernel, but since the SRCU patch [1] will
not be merged into mainline in the near future, a userspace workaround
is needed to get past this test blocker.

The workaround is to pass the kernel parameter "srcutree.big_cpu_lim=0"
to the kdump kernel, so that SRCU always uses the srcu_node array.

[1]: https://lore.kernel.org/rcu/20221026032716.78674-1-kernelfans@gmail.com/T/#m6534975507c2abca497a94d81c7abbfea1d0978d

Signed-off-by: Pingfan Liu
---
 kdump.sysconfig.ppc64   | 2 +-
 kdump.sysconfig.ppc64le | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kdump.sysconfig.ppc64 b/kdump.sysconfig.ppc64
index 1b0cdc7..7d9df72 100644
--- a/kdump.sysconfig.ppc64
+++ b/kdump.sysconfig.ppc64
@@ -21,7 +21,7 @@ KDUMP_COMMANDLINE_REMOVE="hugepages hugepagesz slub_debug quiet log_buf_len swio
 
 # This variable lets us append arguments to the current kdump commandline
 # after processed by KDUMP_COMMANDLINE_REMOVE
-KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1 noirqdistrib reset_devices cgroup_disable=memory numa=off udev.children-max=2 ehea.use_mcs=0 panic=10 kvm_cma_resv_ratio=0 transparent_hugepage=never novmcoredd hugetlb_cma=0"
+KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1 noirqdistrib reset_devices cgroup_disable=memory numa=off udev.children-max=2 ehea.use_mcs=0 panic=10 kvm_cma_resv_ratio=0 transparent_hugepage=never novmcoredd hugetlb_cma=0 srcutree.big_cpu_lim=0"
 
 # Any additional kexec arguments required. In most situations, this should
 # be left empty
diff --git a/kdump.sysconfig.ppc64le b/kdump.sysconfig.ppc64le
index d951def..789661f 100644
--- a/kdump.sysconfig.ppc64le
+++ b/kdump.sysconfig.ppc64le
@@ -21,7 +21,7 @@ KDUMP_COMMANDLINE_REMOVE="hugepages hugepagesz slub_debug quiet log_buf_len swio
 
 # This variable lets us append arguments to the current kdump commandline
 # after processed by KDUMP_COMMANDLINE_REMOVE
-KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1 noirqdistrib reset_devices cgroup_disable=memory numa=off udev.children-max=2 ehea.use_mcs=0 panic=10 kvm_cma_resv_ratio=0 transparent_hugepage=never novmcoredd hugetlb_cma=0"
+KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1 noirqdistrib reset_devices cgroup_disable=memory numa=off udev.children-max=2 ehea.use_mcs=0 panic=10 kvm_cma_resv_ratio=0 transparent_hugepage=never novmcoredd hugetlb_cma=0 srcutree.big_cpu_lim=0"
 
 # Any additional kexec arguments required. In most situations, this should
 # be left empty