| CVE |
Vendors |
Products |
Updated |
CVSS v2 |
CVSS v3 |
In the Linux kernel, the following vulnerability has been resolved:
batman-adv: bypass empty buckets in batadv_purge_orig_ref()
Many syzbot reports are pointing to soft lockups in
batadv_purge_orig_ref() [1]
Root cause is unknown, but we can avoid spending too much
time there and perhaps get more interesting reports.
[1]
watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [kworker/u4:6:621]
Modules linked in:
irq event stamp: 6182794
hardirqs last enabled at (6182793): [<ffff8000801dae10>] ...
In the Linux kernel, the following vulnerability has been resolved:
batman-adv: bypass empty buckets in batadv_purge_orig_ref()
Many syzbot reports are pointing to soft lockups in
batadv_purge_orig_ref() [1]
Root cause is unknown, but we can avoid spending too much
time there and perhaps get more interesting reports.
[1]
watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [kworker/u4:6:621]
Modules linked in:
irq event stamp: 6182794
hardirqs last enabled at (6182793): [<ffff8000801dae10>] __local_bh_enable_ip+0x224/0x44c kernel/softirq.c:386
hardirqs last disabled at (6182794): [<ffff80008ad66a78>] __el1_irq arch/arm64/kernel/entry-common.c:533 [inline]
hardirqs last disabled at (6182794): [<ffff80008ad66a78>] el1_interrupt+0x24/0x68 arch/arm64/kernel/entry-common.c:551
softirqs last enabled at (6182792): [<ffff80008aab71c4>] spin_unlock_bh include/linux/spinlock.h:396 [inline]
softirqs last enabled at (6182792): [<ffff80008aab71c4>] batadv_purge_orig_ref+0x114c/0x1228 net/batman-adv/originator.c:1287
softirqs last disabled at (6182790): [<ffff80008aab61dc>] spin_lock_bh include/linux/spinlock.h:356 [inline]
softirqs last disabled at (6182790): [<ffff80008aab61dc>] batadv_purge_orig_ref+0x164/0x1228 net/batman-adv/originator.c:1271
CPU: 0 PID: 621 Comm: kworker/u4:6 Not tainted 6.8.0-rc7-syzkaller-g707081b61156 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Workqueue: bat_events batadv_purge_orig
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : should_resched arch/arm64/include/asm/preempt.h:79 [inline]
pc : __local_bh_enable_ip+0x228/0x44c kernel/softirq.c:388
lr : __local_bh_enable_ip+0x224/0x44c kernel/softirq.c:386
sp : ffff800099007970
x29: ffff800099007980 x28: 1fffe00018fce1bd x27: dfff800000000000
x26: ffff0000d2620008 x25: ffff0000c7e70de8 x24: 0000000000000001
x23: 1fffe00018e57781 x22: dfff800000000000 x21: ffff80008aab71c4
x20: ffff0001b40136c0 x19: ffff0000c72bbc08 x18: 1fffe0001a817bb0
x17: ffff800125414000 x16: ffff80008032116c x15: 0000000000000001
x14: 1fffe0001ee9d610 x13: 0000000000000000 x12: 0000000000000003
x11: 0000000000000000 x10: 0000000000ff0100 x9 : 0000000000000000
x8 : 00000000005e5789 x7 : ffff80008aab61dc x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000000006 x1 : 0000000000000080 x0 : ffff800125414000
Call trace:
__daif_local_irq_enable arch/arm64/include/asm/irqflags.h:27 [inline]
arch_local_irq_enable arch/arm64/include/asm/irqflags.h:49 [inline]
__local_bh_enable_ip+0x228/0x44c kernel/softirq.c:386
__raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline]
_raw_spin_unlock_bh+0x3c/0x4c kernel/locking/spinlock.c:210
spin_unlock_bh include/linux/spinlock.h:396 [inline]
batadv_purge_orig_ref+0x114c/0x1228 net/batman-adv/originator.c:1287
batadv_purge_orig+0x20/0x70 net/batman-adv/originator.c:1300
process_one_work+0x694/0x1204 kernel/workqueue.c:2633
process_scheduled_works kernel/workqueue.c:2706 [inline]
worker_thread+0x938/0xef4 kernel/workqueue.c:2787
kthread+0x288/0x310 kernel/kthread.c:388
ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-syzkaller-g707081b61156 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : arch_local_irq_enable+0x8/0xc arch/arm64/include/asm/irqflags.h:51
lr : default_idle_call+0xf8/0x128 kernel/sched/idle.c:103
sp : ffff800093a17d30
x29: ffff800093a17d30 x28: dfff800000000000 x27: 1ffff00012742fb4
x26: ffff80008ec9d000 x25: 0000000000000000 x24: 0000000000000002
x23: 1ffff00011d93a74 x22: ffff80008ec9d3a0 x21: 0000000000000000
x20: ffff0000c19dbc00 x19: ffff8000802d0fd8 x18: 1fffe00036804396
x17: ffff80008ec9d000 x16: ffff8000802d089c x15: 0000000000000001
---truncated---
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
drop_monitor: replace spin_lock by raw_spin_lock
trace_drop_common() is called with preemption disabled, and it acquires
a spin_lock. This is problematic for RT kernels because spin_locks are
sleeping locks in this configuration, which causes the following splat:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 449, name: rcuc/47
preem ...
In the Linux kernel, the following vulnerability has been resolved:
drop_monitor: replace spin_lock by raw_spin_lock
trace_drop_common() is called with preemption disabled, and it acquires
a spin_lock. This is problematic for RT kernels because spin_locks are
sleeping locks in this configuration, which causes the following splat:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 449, name: rcuc/47
preempt_count: 1, expected: 0
RCU nest depth: 2, expected: 2
5 locks held by rcuc/47/449:
#0: ff1100086ec30a60 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip+0x105/0x210
#1: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0xbf/0x130
#2: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip+0x11c/0x210
#3: ffffffffb394a160 (rcu_callback){....}-{0:0}, at: rcu_do_batch+0x360/0xc70
#4: ff1100086ee07520 (&data->lock){+.+.}-{2:2}, at: trace_drop_common.constprop.0+0xb5/0x290
irq event stamp: 139909
hardirqs last enabled at (139908): [<ffffffffb1df2b33>] _raw_spin_unlock_irqrestore+0x63/0x80
hardirqs last disabled at (139909): [<ffffffffb19bd03d>] trace_drop_common.constprop.0+0x26d/0x290
softirqs last enabled at (139892): [<ffffffffb07a1083>] __local_bh_enable_ip+0x103/0x170
softirqs last disabled at (139898): [<ffffffffb0909b33>] rcu_cpu_kthread+0x93/0x1f0
Preemption disabled at:
[<ffffffffb1de786b>] rt_mutex_slowunlock+0xab/0x2e0
CPU: 47 PID: 449 Comm: rcuc/47 Not tainted 6.9.0-rc2-rt1+ #7
Hardware name: Dell Inc. PowerEdge R650/0Y2G81, BIOS 1.6.5 04/15/2022
Call Trace:
<TASK>
dump_stack_lvl+0x8c/0xd0
dump_stack+0x14/0x20
__might_resched+0x21e/0x2f0
rt_spin_lock+0x5e/0x130
? trace_drop_common.constprop.0+0xb5/0x290
? skb_queue_purge_reason.part.0+0x1bf/0x230
trace_drop_common.constprop.0+0xb5/0x290
? preempt_count_sub+0x1c/0xd0
? _raw_spin_unlock_irqrestore+0x4a/0x80
? __pfx_trace_drop_common.constprop.0+0x10/0x10
? rt_mutex_slowunlock+0x26a/0x2e0
? skb_queue_purge_reason.part.0+0x1bf/0x230
? __pfx_rt_mutex_slowunlock+0x10/0x10
? skb_queue_purge_reason.part.0+0x1bf/0x230
trace_kfree_skb_hit+0x15/0x20
trace_kfree_skb+0xe9/0x150
kfree_skb_reason+0x7b/0x110
skb_queue_purge_reason.part.0+0x1bf/0x230
? __pfx_skb_queue_purge_reason.part.0+0x10/0x10
? mark_lock.part.0+0x8a/0x520
...
trace_drop_common() also disables interrupts, but this is a minor issue
because we could easily replace it with a local_lock.
Replace the spin_lock with raw_spin_lock to avoid sleeping in atomic
context.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
wifi: mt76: mt7921s: fix potential hung tasks during chip recovery
During chip recovery (e.g. chip reset), there is a possible situation that
kernel worker reset_work is holding the lock and waiting for kernel thread
stat_worker to be parked, while stat_worker is waiting for the release of
the same lock.
It causes a deadlock resulting in the dumping of hung tasks messages and
possible rebooting of the device.
This patch preve ...
In the Linux kernel, the following vulnerability has been resolved:
wifi: mt76: mt7921s: fix potential hung tasks during chip recovery
During chip recovery (e.g. chip reset), there is a possible situation that
kernel worker reset_work is holding the lock and waiting for kernel thread
stat_worker to be parked, while stat_worker is waiting for the release of
the same lock.
It causes a deadlock resulting in the dumping of hung tasks messages and
possible rebooting of the device.
This patch prevents the execution of stat_worker during the chip recovery.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
ext4: do not create EA inode under buffer lock
ext4_xattr_set_entry() creates new EA inodes while holding buffer lock
on the external xattr block. This is problematic as it nests all the
allocation locking (which acquires locks on other buffers) under the
buffer lock. This can even deadlock when the filesystem is corrupted and
e.g. quota file is setup to contain xattr block as data block. Move the
allocation of EA inode out of ...
In the Linux kernel, the following vulnerability has been resolved:
ext4: do not create EA inode under buffer lock
ext4_xattr_set_entry() creates new EA inodes while holding buffer lock
on the external xattr block. This is problematic as it nests all the
allocation locking (which acquires locks on other buffers) under the
buffer lock. This can even deadlock when the filesystem is corrupted and
e.g. quota file is setup to contain xattr block as data block. Move the
allocation of EA inode out of ext4_xattr_set_entry() into the callers.
Show More
|
|
In the Linux kernel, the following vulnerability has been resolved:
serial: imx: Introduce timeout when waiting on transmitter empty
By waiting at most 1 second for USR2_TXDC to be set, we avoid a potential
deadlock.
In case of the timeout, there is not much we can do, so we simply ignore
the transmitter state and optimistically try to continue.
|
In the Linux kernel, the following vulnerability has been resolved:
riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.
This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be calle ...
In the Linux kernel, the following vulnerability has been resolved:
riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.
This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c
Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
wifi: mac80211: Fix deadlock in ieee80211_sta_ps_deliver_wakeup()
The ieee80211_sta_ps_deliver_wakeup() function takes sta->ps_lock to
synchronizes with ieee80211_tx_h_unicast_ps_buf() which is called from
softirq context. However using only spin_lock() to get sta->ps_lock in
ieee80211_sta_ps_deliver_wakeup() does not prevent softirq to execute
on this same CPU, to run ieee80211_tx_h_unicast_ps_buf() and try to
take this same ...
In the Linux kernel, the following vulnerability has been resolved:
wifi: mac80211: Fix deadlock in ieee80211_sta_ps_deliver_wakeup()
The ieee80211_sta_ps_deliver_wakeup() function takes sta->ps_lock to
synchronizes with ieee80211_tx_h_unicast_ps_buf() which is called from
softirq context. However using only spin_lock() to get sta->ps_lock in
ieee80211_sta_ps_deliver_wakeup() does not prevent softirq to execute
on this same CPU, to run ieee80211_tx_h_unicast_ps_buf() and try to
take this same lock ending in deadlock. Below is an example of rcu stall
that arises in such situation.
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 2-....: (42413413 ticks this GP) idle=b154/1/0x4000000000000000 softirq=1763/1765 fqs=21206996
rcu: (t=42586894 jiffies g=2057 q=362405 ncpus=4)
CPU: 2 PID: 719 Comm: wpa_supplicant Tainted: G W 6.4.0-02158-g1b062f552873 #742
Hardware name: RPT (r1) (DT)
pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : queued_spin_lock_slowpath+0x58/0x2d0
lr : invoke_tx_handlers_early+0x5b4/0x5c0
sp : ffff00001ef64660
x29: ffff00001ef64660 x28: ffff000009bc1070 x27: ffff000009bc0ad8
x26: ffff000009bc0900 x25: ffff00001ef647a8 x24: 0000000000000000
x23: ffff000009bc0900 x22: ffff000009bc0900 x21: ffff00000ac0e000
x20: ffff00000a279e00 x19: ffff00001ef646e8 x18: 0000000000000000
x17: ffff800016468000 x16: ffff00001ef608c0 x15: 0010533c93f64f80
x14: 0010395c9faa3946 x13: 0000000000000000 x12: 00000000fa83b2da
x11: 000000012edeceea x10: ffff0000010fbe00 x9 : 0000000000895440
x8 : 000000000010533c x7 : ffff00000ad8b740 x6 : ffff00000c350880
x5 : 0000000000000007 x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000000000001 x0 : ffff00000ac0e0e8
Call trace:
queued_spin_lock_slowpath+0x58/0x2d0
ieee80211_tx+0x80/0x12c
ieee80211_tx_pending+0x110/0x278
tasklet_action_common.constprop.0+0x10c/0x144
tasklet_action+0x20/0x28
_stext+0x11c/0x284
____do_softirq+0xc/0x14
call_on_irq_stack+0x24/0x34
do_softirq_own_stack+0x18/0x20
do_softirq+0x74/0x7c
__local_bh_enable_ip+0xa0/0xa4
_ieee80211_wake_txqs+0x3b0/0x4b8
__ieee80211_wake_queue+0x12c/0x168
ieee80211_add_pending_skbs+0xec/0x138
ieee80211_sta_ps_deliver_wakeup+0x2a4/0x480
ieee80211_mps_sta_status_update.part.0+0xd8/0x11c
ieee80211_mps_sta_status_update+0x18/0x24
sta_apply_parameters+0x3bc/0x4c0
ieee80211_change_station+0x1b8/0x2dc
nl80211_set_station+0x444/0x49c
genl_family_rcv_msg_doit.isra.0+0xa4/0xfc
genl_rcv_msg+0x1b0/0x244
netlink_rcv_skb+0x38/0x10c
genl_rcv+0x34/0x48
netlink_unicast+0x254/0x2bc
netlink_sendmsg+0x190/0x3b4
____sys_sendmsg+0x1e8/0x218
___sys_sendmsg+0x68/0x8c
__sys_sendmsg+0x44/0x84
__arm64_sys_sendmsg+0x20/0x28
do_el0_svc+0x6c/0xe8
el0_svc+0x14/0x48
el0t_64_sync_handler+0xb0/0xb4
el0t_64_sync+0x14c/0x150
Using spin_lock_bh()/spin_unlock_bh() instead prevents softirq to raise
on the same CPU that is holding the lock.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
net: fec: remove .ndo_poll_controller to avoid deadlocks
There is a deadlock issue found in sungem driver, please refer to the
commit ac0a230f719b ("eth: sungem: remove .ndo_poll_controller to avoid
deadlocks"). The root cause of the issue is that netpoll is in atomic
context and disable_irq() is called by .ndo_poll_controller interface
of sungem driver, however, disable_irq() might sleep. After analyzing
the implementation of ...
In the Linux kernel, the following vulnerability has been resolved:
net: fec: remove .ndo_poll_controller to avoid deadlocks
There is a deadlock issue found in sungem driver, please refer to the
commit ac0a230f719b ("eth: sungem: remove .ndo_poll_controller to avoid
deadlocks"). The root cause of the issue is that netpoll is in atomic
context and disable_irq() is called by .ndo_poll_controller interface
of sungem driver, however, disable_irq() might sleep. After analyzing
the implementation of fec_poll_controller(), the fec driver should have
the same issue. Due to the fec driver uses NAPI for TX completions, the
.ndo_poll_controller is unnecessary to be implemented in the fec driver,
so fec_poll_controller() can be safely removed.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
media: usbtv: Remove useless locks in usbtv_video_free()
Remove locks calls in usbtv_video_free() because
are useless and may led to a deadlock as reported here:
https://syzkaller.appspot.com/x/bisect.txt?x=166dc872180000
Also remove usbtv_stop() call since it will be called when
unregistering the device.
Before 'c838530d230b' this issue would only be noticed if you
disconnect while streaming and now it is noticeable even whe ...
In the Linux kernel, the following vulnerability has been resolved:
media: usbtv: Remove useless locks in usbtv_video_free()
Remove locks calls in usbtv_video_free() because
are useless and may led to a deadlock as reported here:
https://syzkaller.appspot.com/x/bisect.txt?x=166dc872180000
Also remove usbtv_stop() call since it will be called when
unregistering the device.
Before 'c838530d230b' this issue would only be noticed if you
disconnect while streaming and now it is noticeable even when
disconnecting while not streaming.
[hverkuil: fix minor spelling mistake in log message]
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
tty: xilinx_uartps: split sysrq handling
lockdep detects the following circular locking dependency:
CPU 0 CPU 1
========================== ============================
cdns_uart_isr() printk()
uart_port_lock(port) console_lock()
cdns_uart_console_write()
if (!port->sysrq)
uart_port_lock(port)
uart_handle_break()
...
In the Linux kernel, the following vulnerability has been resolved:
tty: xilinx_uartps: split sysrq handling
lockdep detects the following circular locking dependency:
CPU 0 CPU 1
========================== ============================
cdns_uart_isr() printk()
uart_port_lock(port) console_lock()
cdns_uart_console_write()
if (!port->sysrq)
uart_port_lock(port)
uart_handle_break()
port->sysrq = ...
uart_handle_sysrq_char()
printk()
console_lock()
The fixed commit attempts to avoid this situation by only taking the
port lock in cdns_uart_console_write if port->sysrq unset. However, if
(as shown above) cdns_uart_console_write runs before port->sysrq is set,
then it will try to take the port lock anyway. This may result in a
deadlock.
Fix this by splitting sysrq handling into two parts. We use the prepare
helper under the port lock and defer handling until we release the lock.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
irqchip/gic-v3-its: Don't enable interrupts in its_irq_set_vcpu_affinity()
The following call-chain leads to enabling interrupts in a nested interrupt
disabled section:
irq_set_vcpu_affinity()
irq_get_desc_lock()
raw_spin_lock_irqsave() <--- Disable interrupts
its_irq_set_vcpu_affinity()
guard(raw_spinlock_irq) <--- Enables interrupts when leaving the guard()
irq_put_desc_unlock() <--- Warns because ...
In the Linux kernel, the following vulnerability has been resolved:
irqchip/gic-v3-its: Don't enable interrupts in its_irq_set_vcpu_affinity()
The following call-chain leads to enabling interrupts in a nested interrupt
disabled section:
irq_set_vcpu_affinity()
irq_get_desc_lock()
raw_spin_lock_irqsave() <--- Disable interrupts
its_irq_set_vcpu_affinity()
guard(raw_spinlock_irq) <--- Enables interrupts when leaving the guard()
irq_put_desc_unlock() <--- Warns because interrupts are enabled
This was broken in commit b97e8a2f7130, which replaced the original
raw_spin_[un]lock() pair with guard(raw_spinlock_irq).
Fix the issue by using guard(raw_spinlock).
[ tglx: Massaged change log ]
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
virtio-blk: don't keep queue frozen during system suspend
Commit 4ce6e2db00de ("virtio-blk: Ensure no requests in virtqueues before
deleting vqs.") replaces queue quiesce with queue freeze in virtio-blk's
PM callbacks. And the motivation is to drain inflight IOs before suspending.
block layer's queue freeze looks very handy, but it is also easy to cause
deadlock, such as, any attempt to call into bio_queue_enter() may run int ...
In the Linux kernel, the following vulnerability has been resolved:
virtio-blk: don't keep queue frozen during system suspend
Commit 4ce6e2db00de ("virtio-blk: Ensure no requests in virtqueues before
deleting vqs.") replaces queue quiesce with queue freeze in virtio-blk's
PM callbacks. And the motivation is to drain inflight IOs before suspending.
block layer's queue freeze looks very handy, but it is also easy to cause
deadlock, such as, any attempt to call into bio_queue_enter() may run into
deadlock if the queue is frozen in current context. There are all kinds
of ->suspend() called in suspend context, so keeping queue frozen in the
whole suspend context isn't one good idea. And Marek reported lockdep
warning[1] caused by virtio-blk's freeze queue in virtblk_freeze().
[1] https://lore.kernel.org/linux-block/[email protected]/
Given the motivation is to drain in-flight IOs, it can be done by calling
freeze & unfreeze, meantime restore to previous behavior by keeping queue
quiesced during suspend.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
net: restrict SO_REUSEPORT to inet sockets
After blamed commit, crypto sockets could accidentally be destroyed
from RCU call back, as spotted by zyzbot [1].
Trying to acquire a mutex in RCU callback is not allowed.
Restrict SO_REUSEPORT socket option to inet sockets.
v1 of this patch supported TCP, UDP and SCTP sockets,
but fcnal-test.sh test needed RAW and ICMP support.
[1]
BUG: sleeping function called from invalid conte ...
In the Linux kernel, the following vulnerability has been resolved:
net: restrict SO_REUSEPORT to inet sockets
After blamed commit, crypto sockets could accidentally be destroyed
from RCU call back, as spotted by zyzbot [1].
Trying to acquire a mutex in RCU callback is not allowed.
Restrict SO_REUSEPORT socket option to inet sockets.
v1 of this patch supported TCP, UDP and SCTP sockets,
but fcnal-test.sh test needed RAW and ICMP support.
[1]
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:562
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 24, name: ksoftirqd/1
preempt_count: 100, expected: 0
RCU nest depth: 0, expected: 0
1 lock held by ksoftirqd/1/24:
#0: ffffffff8e937ba0 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
#0: ffffffff8e937ba0 (rcu_callback){....}-{0:0}, at: rcu_do_batch kernel/rcu/tree.c:2561 [inline]
#0: ffffffff8e937ba0 (rcu_callback){....}-{0:0}, at: rcu_core+0xa37/0x17a0 kernel/rcu/tree.c:2823
Preemption disabled at:
[<ffffffff8161c8c8>] softirq_handle_begin kernel/softirq.c:402 [inline]
[<ffffffff8161c8c8>] handle_softirqs+0x128/0x9b0 kernel/softirq.c:537
CPU: 1 UID: 0 PID: 24 Comm: ksoftirqd/1 Not tainted 6.13.0-rc3-syzkaller-00174-ga024e377efed #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
__might_resched+0x5d4/0x780 kernel/sched/core.c:8758
__mutex_lock_common kernel/locking/mutex.c:562 [inline]
__mutex_lock+0x131/0xee0 kernel/locking/mutex.c:735
crypto_put_default_null_skcipher+0x18/0x70 crypto/crypto_null.c:179
aead_release+0x3d/0x50 crypto/algif_aead.c:489
alg_do_release crypto/af_alg.c:118 [inline]
alg_sock_destruct+0x86/0xc0 crypto/af_alg.c:502
__sk_destruct+0x58/0x5f0 net/core/sock.c:2260
rcu_do_batch kernel/rcu/tree.c:2567 [inline]
rcu_core+0xaaa/0x17a0 kernel/rcu/tree.c:2823
handle_softirqs+0x2d4/0x9b0 kernel/softirq.c:561
run_ksoftirqd+0xca/0x130 kernel/softirq.c:950
smpboot_thread_fn+0x544/0xa30 kernel/smpboot.c:164
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
pinctrl: mcp23s08: Fix sleeping in atomic context due to regmap locking
If a device uses MCP23xxx IO expander to receive IRQs, the following
bug can happen:
BUG: sleeping function called from invalid context
at kernel/locking/mutex.c:283
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, ...
preempt_count: 1, expected: 0
...
Call Trace:
...
__might_resched+0x104/0x10e
__might_sleep+0x3e/0x62
mutex_lock+0x ...
In the Linux kernel, the following vulnerability has been resolved:
pinctrl: mcp23s08: Fix sleeping in atomic context due to regmap locking
If a device uses MCP23xxx IO expander to receive IRQs, the following
bug can happen:
BUG: sleeping function called from invalid context
at kernel/locking/mutex.c:283
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, ...
preempt_count: 1, expected: 0
...
Call Trace:
...
__might_resched+0x104/0x10e
__might_sleep+0x3e/0x62
mutex_lock+0x20/0x4c
regmap_lock_mutex+0x10/0x18
regmap_update_bits_base+0x2c/0x66
mcp23s08_irq_set_type+0x1ae/0x1d6
__irq_set_trigger+0x56/0x172
__setup_irq+0x1e6/0x646
request_threaded_irq+0xb6/0x160
...
We observed the problem while experimenting with a touchscreen driver which
used MCP23017 IO expander (I2C).
The regmap in the pinctrl-mcp23s08 driver uses a mutex for protection from
concurrent accesses, which is the default for regmaps without .fast_io,
.disable_locking, etc.
mcp23s08_irq_set_type() calls regmap_update_bits_base(), and the latter
locks the mutex.
However, __setup_irq() locks desc->lock spinlock before calling these
functions. As a result, the system tries to lock the mutex whole holding
the spinlock.
It seems, the internal regmap locks are not needed in this driver at all.
mcp->lock seems to protect the regmap from concurrent accesses already,
except, probably, in mcp_pinconf_get/set.
mcp23s08_irq_set_type() and mcp23s08_irq_mask/unmask() are called under
chip_bus_lock(), which calls mcp23s08_irq_bus_lock(). The latter takes
mcp->lock and enables regmap caching, so that the potentially slow I2C
accesses are deferred until chip_bus_unlock().
The accesses to the regmap from mcp23s08_probe_one() do not need additional
locking.
In all remaining places where the regmap is accessed, except
mcp_pinconf_get/set(), the driver already takes mcp->lock.
This patch adds locking in mcp_pinconf_get/set() and disables internal
locking in the regmap config. Among other things, it fixes the sleeping
in atomic context described above.
Show More
|
|
In the Linux kernel, the following vulnerability has been resolved:
scsi: megaraid_sas: Fix for a potential deadlock
This fixes a 'possible circular locking dependency detected' warning
CPU0 CPU1
---- ----
lock(&instance->reset_mutex);
lock(&shost->scan_mutex);
lock(&instance->reset_mutex);
lock(&shost->scan_mutex);
Fix this by temporarily releasing the reset_mutex.
|
In the Linux kernel, the following vulnerability has been resolved:
bpf: fix recursive lock when verdict program return SK_PASS
When the stream_verdict program returns SK_PASS, it places the received skb
into its own receive queue, but a recursive lock eventually occurs, leading
to an operating system deadlock. This issue has been present since v6.9.
'''
sk_psock_strp_data_ready
write_lock_bh(&sk->sk_callback_lock)
strp_data_ready
strp_read_sock
read_sock -> tcp_read_soc ...
In the Linux kernel, the following vulnerability has been resolved:
bpf: fix recursive lock when verdict program return SK_PASS
When the stream_verdict program returns SK_PASS, it places the received skb
into its own receive queue, but a recursive lock eventually occurs, leading
to an operating system deadlock. This issue has been present since v6.9.
'''
sk_psock_strp_data_ready
write_lock_bh(&sk->sk_callback_lock)
strp_data_ready
strp_read_sock
read_sock -> tcp_read_sock
strp_recv
cb.rcv_msg -> sk_psock_strp_read
# now stream_verdict return SK_PASS without peer sock assign
__SK_PASS = sk_psock_map_verd(SK_PASS, NULL)
sk_psock_verdict_apply
sk_psock_skb_ingress_self
sk_psock_skb_ingress_enqueue
sk_psock_data_ready
read_lock_bh(&sk->sk_callback_lock) <= dead lock
'''
This topic has been discussed before, but it has not been fixed.
Previous discussion:
https://lore.kernel.org/all/[email protected]
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
usb: musb: Fix hardware lockup on first Rx endpoint request
There is a possibility that a request's callback could be invoked from
usb_ep_queue() (call trace below, supplemented with missing calls):
req->complete from usb_gadget_giveback_request
(drivers/usb/gadget/udc/core.c:999)
usb_gadget_giveback_request from musb_g_giveback
(drivers/usb/musb/musb_gadget.c:147)
musb_g_giveback from rxstate
(drivers/usb/musb/musb_gadget ...
In the Linux kernel, the following vulnerability has been resolved:
usb: musb: Fix hardware lockup on first Rx endpoint request
There is a possibility that a request's callback could be invoked from
usb_ep_queue() (call trace below, supplemented with missing calls):
req->complete from usb_gadget_giveback_request
(drivers/usb/gadget/udc/core.c:999)
usb_gadget_giveback_request from musb_g_giveback
(drivers/usb/musb/musb_gadget.c:147)
musb_g_giveback from rxstate
(drivers/usb/musb/musb_gadget.c:784)
rxstate from musb_ep_restart
(drivers/usb/musb/musb_gadget.c:1169)
musb_ep_restart from musb_ep_restart_resume_work
(drivers/usb/musb/musb_gadget.c:1176)
musb_ep_restart_resume_work from musb_queue_resume_work
(drivers/usb/musb/musb_core.c:2279)
musb_queue_resume_work from musb_gadget_queue
(drivers/usb/musb/musb_gadget.c:1241)
musb_gadget_queue from usb_ep_queue
(drivers/usb/gadget/udc/core.c:300)
According to the docstring of usb_ep_queue(), this should not happen:
"Note that @req's ->complete() callback must never be called from within
usb_ep_queue() as that can create deadlock situations."
In fact, a hardware lockup might occur in the following sequence:
1. The gadget is initialized using musb_gadget_enable().
2. Meanwhile, a packet arrives, and the RXPKTRDY flag is set, raising an
interrupt.
3. If IRQs are enabled, the interrupt is handled, but musb_g_rx() finds an
empty queue (next_request() returns NULL). The interrupt flag has
already been cleared by the glue layer handler, but the RXPKTRDY flag
remains set.
4. The first request is enqueued using usb_ep_queue(), leading to the call
of req->complete(), as shown in the call trace above.
5. If the callback enables IRQs and another packet is waiting, step (3)
repeats. The request queue is empty because usb_g_giveback() removes the
request before invoking the callback.
6. The endpoint remains locked up, as the interrupt triggered by hardware
setting the RXPKTRDY flag has been handled, but the flag itself remains
set.
For this scenario to occur, it is only necessary for IRQs to be enabled at
some point during the complete callback. This happens with the USB Ethernet
gadget, whose rx_complete() callback calls netif_rx(). If called in the
task context, netif_rx() disables the bottom halves (BHs). When the BHs are
re-enabled, IRQs are also enabled to allow soft IRQs to be processed. The
gadget itself is initialized at module load (or at boot if built-in), but
the first request is enqueued when the network interface is brought up,
triggering rx_complete() in the task context via ioctl(). If a packet
arrives while the interface is down, it can prevent the interface from
receiving any further packets from the USB host.
The situation is quite complicated with many parties involved. This
particular issue can be resolved in several possible ways:
1. Ensure that callbacks never enable IRQs. This would be difficult to
enforce, as discovering how netif_rx() interacts with interrupts was
already quite challenging and u_ether is not the only function driver.
Similar "bugs" could be hidden in other drivers as well.
2. Disable MUSB interrupts in musb_g_giveback() before calling the callback
and re-enable them afterwars (by calling musb_{dis,en}able_interrupts(),
for example). This would ensure that MUSB interrupts are not handled
during the callback, even if IRQs are enabled. In fact, it would allow
IRQs to be enabled when releasing the lock. However, this feels like an
inelegant hack.
3. Modify the interrupt handler to clear the RXPKTRDY flag if the request
queue is empty. While this approach also feels like a hack, it wastes
CPU time by attempting to handle incoming packets when the software is
not ready to process them.
4. Flush the Rx FIFO instead of calling rxstate() in musb_ep_restart().
This ensures that the hardware can receive packets when there is at
least one request in the queue. Once I
---truncated---
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
LoongArch: Fix sleeping in atomic context for PREEMPT_RT
Commit bab1c299f3945ffe79 ("LoongArch: Fix sleeping in atomic context in
setup_tlb_handler()") changes the gfp flag from GFP_KERNEL to GFP_ATOMIC
for alloc_pages_node(). However, for PREEMPT_RT kernels we can still get
a "sleeping in atomic context" error:
[ 0.372259] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[ 0.372266] ...
In the Linux kernel, the following vulnerability has been resolved:
LoongArch: Fix sleeping in atomic context for PREEMPT_RT
Commit bab1c299f3945ffe79 ("LoongArch: Fix sleeping in atomic context in
setup_tlb_handler()") changes the gfp flag from GFP_KERNEL to GFP_ATOMIC
for alloc_pages_node(). However, for PREEMPT_RT kernels we can still get
a "sleeping in atomic context" error:
[ 0.372259] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[ 0.372266] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
[ 0.372268] preempt_count: 1, expected: 0
[ 0.372270] RCU nest depth: 1, expected: 1
[ 0.372272] 3 locks held by swapper/1/0:
[ 0.372274] #0: 900000000c9f5e60 (&pcp->lock){+.+.}-{3:3}, at: get_page_from_freelist+0x524/0x1c60
[ 0.372294] #1: 90000000087013b8 (rcu_read_lock){....}-{1:3}, at: rt_spin_trylock+0x50/0x140
[ 0.372305] #2: 900000047fffd388 (&zone->lock){+.+.}-{3:3}, at: __rmqueue_pcplist+0x30c/0xea0
[ 0.372314] irq event stamp: 0
[ 0.372316] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 0.372322] hardirqs last disabled at (0): [<9000000005947320>] copy_process+0x9c0/0x26e0
[ 0.372329] softirqs last enabled at (0): [<9000000005947320>] copy_process+0x9c0/0x26e0
[ 0.372335] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 0.372341] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.12.0-rc7+ #1891
[ 0.372346] Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.0-prebeta9 10/21/2022
[ 0.372349] Stack : 0000000000000089 9000000005a0db9c 90000000071519c8 9000000100388000
[ 0.372486] 900000010038b890 0000000000000000 900000010038b898 9000000007e53788
[ 0.372492] 900000000815bcc8 900000000815bcc0 900000010038b700 0000000000000001
[ 0.372498] 0000000000000001 4b031894b9d6b725 00000000055ec000 9000000100338fc0
[ 0.372503] 00000000000000c4 0000000000000001 000000000000002d 0000000000000003
[ 0.372509] 0000000000000030 0000000000000003 00000000055ec000 0000000000000003
[ 0.372515] 900000000806d000 9000000007e53788 00000000000000b0 0000000000000004
[ 0.372521] 0000000000000000 0000000000000000 900000000c9f5f10 0000000000000000
[ 0.372526] 90000000076f12d8 9000000007e53788 9000000005924778 0000000000000000
[ 0.372532] 00000000000000b0 0000000000000004 0000000000000000 0000000000070000
[ 0.372537] ...
[ 0.372540] Call Trace:
[ 0.372542] [<9000000005924778>] show_stack+0x38/0x180
[ 0.372548] [<90000000071519c4>] dump_stack_lvl+0x94/0xe4
[ 0.372555] [<900000000599b880>] __might_resched+0x1a0/0x260
[ 0.372561] [<90000000071675cc>] rt_spin_lock+0x4c/0x140
[ 0.372565] [<9000000005cbb768>] __rmqueue_pcplist+0x308/0xea0
[ 0.372570] [<9000000005cbed84>] get_page_from_freelist+0x564/0x1c60
[ 0.372575] [<9000000005cc0d98>] __alloc_pages_noprof+0x218/0x1820
[ 0.372580] [<900000000593b36c>] tlb_init+0x1ac/0x298
[ 0.372585] [<9000000005924b74>] per_cpu_trap_init+0x114/0x140
[ 0.372589] [<9000000005921964>] cpu_probe+0x4e4/0xa60
[ 0.372592] [<9000000005934874>] start_secondary+0x34/0xc0
[ 0.372599] [<900000000715615c>] smpboot_entry+0x64/0x6c
This is because in PREEMPT_RT kernels normal spinlocks are replaced by
rt spinlocks and rt_spin_lock() will cause sleeping. Fix it by disabling
NUMA optimization completely for PREEMPT_RT kernels.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
ALSA: usx2y: Use snd_card_free_when_closed() at disconnection
The USB disconnect callback is supposed to be short and not too-long
waiting. OTOH, the current code uses snd_card_free() at
disconnection, but this waits for the close of all used fds, hence it
can take long. It eventually blocks the upper layer USB ioctls, which
may trigger a soft lockup.
An easy workaround is to replace snd_card_free() with
snd_card_free_when_ ...
In the Linux kernel, the following vulnerability has been resolved:
ALSA: usx2y: Use snd_card_free_when_closed() at disconnection
The USB disconnect callback is supposed to be short and not too-long
waiting. OTOH, the current code uses snd_card_free() at
disconnection, but this waits for the close of all used fds, hence it
can take long. It eventually blocks the upper layer USB ioctls, which
may trigger a soft lockup.
An easy workaround is to replace snd_card_free() with
snd_card_free_when_closed(). This variant returns immediately while
the release of resources is done asynchronously by the card device
release at the last close.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
ALSA: us122l: Use snd_card_free_when_closed() at disconnection
The USB disconnect callback is supposed to be short and not too-long
waiting. OTOH, the current code uses snd_card_free() at
disconnection, but this waits for the close of all used fds, hence it
can take long. It eventually blocks the upper layer USB ioctls, which
may trigger a soft lockup.
An easy workaround is to replace snd_card_free() with
snd_card_free_when ...
In the Linux kernel, the following vulnerability has been resolved:
ALSA: us122l: Use snd_card_free_when_closed() at disconnection
The USB disconnect callback is supposed to be short and not too-long
waiting. OTOH, the current code uses snd_card_free() at
disconnection, but this waits for the close of all used fds, hence it
can take long. It eventually blocks the upper layer USB ioctls, which
may trigger a soft lockup.
An easy workaround is to replace snd_card_free() with
snd_card_free_when_closed(). This variant returns immediately while
the release of resources is done asynchronously by the card device
release at the last close.
The loop of us122l->mmap_count check is dropped as well. The check is
useless for the asynchronous operation with *_when_closed().
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
ALSA: caiaq: Use snd_card_free_when_closed() at disconnection
The USB disconnect callback is supposed to be short and not too-long
waiting. OTOH, the current code uses snd_card_free() at
disconnection, but this waits for the close of all used fds, hence it
can take long. It eventually blocks the upper layer USB ioctls, which
may trigger a soft lockup.
An easy workaround is to replace snd_card_free() with
snd_card_free_when_ ...
In the Linux kernel, the following vulnerability has been resolved:
ALSA: caiaq: Use snd_card_free_when_closed() at disconnection
The USB disconnect callback is supposed to be short and not too-long
waiting. OTOH, the current code uses snd_card_free() at
disconnection, but this waits for the close of all used fds, hence it
can take long. It eventually blocks the upper layer USB ioctls, which
may trigger a soft lockup.
An easy workaround is to replace snd_card_free() with
snd_card_free_when_closed(). This variant returns immediately while
the release of resources is done asynchronously by the card device
release at the last close.
This patch also splits the code to the disconnect and the free phases;
the former is called immediately at the USB disconnect callback while
the latter is called from the card destructor.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: MGMT: Fix possible deadlocks
This fixes possible deadlocks like the following caused by
hci_cmd_sync_dequeue causing the destroy function to run:
INFO: task kworker/u19:0:143 blocked for more than 120 seconds.
Tainted: G W O 6.8.0-2024-03-19-intel-next-iLS-24ww14 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u19:0 state:D stack:0 pid:143 t ...
In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: MGMT: Fix possible deadlocks
This fixes possible deadlocks like the following caused by
hci_cmd_sync_dequeue causing the destroy function to run:
INFO: task kworker/u19:0:143 blocked for more than 120 seconds.
Tainted: G W O 6.8.0-2024-03-19-intel-next-iLS-24ww14 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u19:0 state:D stack:0 pid:143 tgid:143 ppid:2 flags:0x00004000
Workqueue: hci0 hci_cmd_sync_work [bluetooth]
Call Trace:
<TASK>
__schedule+0x374/0xaf0
schedule+0x3c/0xf0
schedule_preempt_disabled+0x1c/0x30
__mutex_lock.constprop.0+0x3ef/0x7a0
__mutex_lock_slowpath+0x13/0x20
mutex_lock+0x3c/0x50
mgmt_set_connectable_complete+0xa4/0x150 [bluetooth]
? kfree+0x211/0x2a0
hci_cmd_sync_dequeue+0xae/0x130 [bluetooth]
? __pfx_cmd_complete_rsp+0x10/0x10 [bluetooth]
cmd_complete_rsp+0x26/0x80 [bluetooth]
mgmt_pending_foreach+0x4d/0x70 [bluetooth]
__mgmt_power_off+0x8d/0x180 [bluetooth]
? _raw_spin_unlock_irq+0x23/0x40
hci_dev_close_sync+0x445/0x5b0 [bluetooth]
hci_set_powered_sync+0x149/0x250 [bluetooth]
set_powered_sync+0x24/0x60 [bluetooth]
hci_cmd_sync_work+0x90/0x150 [bluetooth]
process_one_work+0x13e/0x300
worker_thread+0x2f7/0x420
? __pfx_worker_thread+0x10/0x10
kthread+0x107/0x140
? __pfx_kthread+0x10/0x10
ret_from_fork+0x3d/0x60
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
dma-debug: fix a possible deadlock on radix_lock
radix_lock() shouldn't be held while holding dma_hash_entry[idx].lock
otherwise, there's a possible deadlock scenario when
dma debug API is called holding rq_lock():
CPU0 CPU1 CPU2
dma_free_attrs()
check_unmap() add_dma_entry() __schedule() //out
(A) rq_lock()
get_hash_ ...
In the Linux kernel, the following vulnerability has been resolved:
dma-debug: fix a possible deadlock on radix_lock
radix_lock() shouldn't be held while holding dma_hash_entry[idx].lock
otherwise, there's a possible deadlock scenario when
dma debug API is called holding rq_lock():
CPU0 CPU1 CPU2
dma_free_attrs()
check_unmap() add_dma_entry() __schedule() //out
(A) rq_lock()
get_hash_bucket()
(A) dma_entry_hash
check_sync()
(A) radix_lock() (W) dma_entry_hash
dma_entry_free()
(W) radix_lock()
// CPU2's one
(W) rq_lock()
CPU1 situation can happen when it extending radix tree and
it tries to wake up kswapd via wake_all_kswapd().
CPU2 situation can happen while perf_event_task_sched_out()
(i.e. dma sync operation is called while deleting perf_event using
etm and etr tmc which are Arm Coresight hwtracing driver backends).
To remove this possible situation, call dma_entry_free() after
put_hash_bucket() in check_unmap().
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
i3c: Use i3cdev->desc->info instead of calling i3c_device_get_info() to avoid deadlock
A deadlock may happen since the i3c_master_register() acquires
&i3cbus->lock twice. See the log below.
Use i3cdev->desc->info instead of calling i3c_device_info() to
avoid acquiring the lock twice.
v2:
- Modified the title and commit message
============================================
WARNING: possible recursive locking detected
6.11.0- ...
In the Linux kernel, the following vulnerability has been resolved:
i3c: Use i3cdev->desc->info instead of calling i3c_device_get_info() to avoid deadlock
A deadlock may happen since the i3c_master_register() acquires
&i3cbus->lock twice. See the log below.
Use i3cdev->desc->info instead of calling i3c_device_info() to
avoid acquiring the lock twice.
v2:
- Modified the title and commit message
============================================
WARNING: possible recursive locking detected
6.11.0-mainline
--------------------------------------------
init/1 is trying to acquire lock:
f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_bus_normaluse_lock
but task is already holding lock:
f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_master_register
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&i3cbus->lock);
lock(&i3cbus->lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by init/1:
#0: fcffff809b6798f8 (&dev->mutex){....}-{3:3}, at: __driver_attach
#1: f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_master_register
stack backtrace:
CPU: 6 UID: 0 PID: 1 Comm: init
Call trace:
dump_backtrace+0xfc/0x17c
show_stack+0x18/0x28
dump_stack_lvl+0x40/0xc0
dump_stack+0x18/0x24
print_deadlock_bug+0x388/0x390
__lock_acquire+0x18bc/0x32ec
lock_acquire+0x134/0x2b0
down_read+0x50/0x19c
i3c_bus_normaluse_lock+0x14/0x24
i3c_device_get_info+0x24/0x58
i3c_device_uevent+0x34/0xa4
dev_uevent+0x310/0x384
kobject_uevent_env+0x244/0x414
kobject_uevent+0x14/0x20
device_add+0x278/0x460
device_register+0x20/0x34
i3c_master_register_new_i3c_devs+0x78/0x154
i3c_master_register+0x6a0/0x6d4
mtk_i3c_master_probe+0x3b8/0x4d8
platform_probe+0xa0/0xe0
really_probe+0x114/0x454
__driver_probe_device+0xa0/0x15c
driver_probe_device+0x3c/0x1ac
__driver_attach+0xc4/0x1f0
bus_for_each_dev+0x104/0x160
driver_attach+0x24/0x34
bus_add_driver+0x14c/0x294
driver_register+0x68/0x104
__platform_driver_register+0x20/0x30
init_module+0x20/0xfe4
do_one_initcall+0x184/0x464
do_init_module+0x58/0x1ec
load_module+0xefc/0x10c8
__arm64_sys_finit_module+0x238/0x33c
invoke_syscall+0x58/0x10c
el0_svc_common+0xa8/0xdc
do_el0_svc+0x1c/0x28
el0_svc+0x50/0xac
el0t_64_sync_handler+0x70/0xbc
el0t_64_sync+0x1a8/0x1ac
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
exfat: fix potential deadlock on __exfat_get_dentry_set
When accessing a file with more entries than ES_MAX_ENTRY_NUM, the bh-array
is allocated in __exfat_get_entry_set. The problem is that the bh-array is
allocated with GFP_KERNEL. It does not make sense. In the following cases,
a deadlock for sbi->s_lock between the two processes may occur.
CPU0 CPU1
---- ----
kswapd
balance ...
In the Linux kernel, the following vulnerability has been resolved:
exfat: fix potential deadlock on __exfat_get_dentry_set
When accessing a file with more entries than ES_MAX_ENTRY_NUM, the bh-array
is allocated in __exfat_get_entry_set. The problem is that the bh-array is
allocated with GFP_KERNEL. It does not make sense. In the following cases,
a deadlock for sbi->s_lock between the two processes may occur.
CPU0 CPU1
---- ----
kswapd
balance_pgdat
lock(fs_reclaim)
exfat_iterate
lock(&sbi->s_lock)
exfat_readdir
exfat_get_uniname_from_ext_entry
exfat_get_dentry_set
__exfat_get_dentry_set
kmalloc_array
...
lock(fs_reclaim)
...
evict
exfat_evict_inode
lock(&sbi->s_lock)
To fix this, let's allocate bh-array with GFP_NOFS.
Show More
|
|
In the Linux kernel, the following vulnerability has been resolved:
RDMA/hns: Fix deadlock on SRQ async events.
xa_lock for SRQ table may be required in AEQ. Use xa_store_irq()/
xa_erase_irq() to avoid deadlock.
|
In the Linux kernel, the following vulnerability has been resolved:
soc: qcom: pdr: Fix the potential deadlock
When some client process A call pdr_add_lookup() to add the look up for
the service and does schedule locator work, later a process B got a new
server packet indicating locator is up and call pdr_locator_new_server()
which eventually sets pdr->locator_init_complete to true which process A
sees and takes list lock and queries domain list but it will timeout due
to deadlock as the respo ...
In the Linux kernel, the following vulnerability has been resolved:
soc: qcom: pdr: Fix the potential deadlock
When some client process A call pdr_add_lookup() to add the look up for
the service and does schedule locator work, later a process B got a new
server packet indicating locator is up and call pdr_locator_new_server()
which eventually sets pdr->locator_init_complete to true which process A
sees and takes list lock and queries domain list but it will timeout due
to deadlock as the response will queued to the same qmi->wq and it is
ordered workqueue and process B is not able to complete new server
request work due to deadlock on list lock.
Fix it by removing the unnecessary list iteration as the list iteration
is already being done inside locator work, so avoid it here and just
call schedule_work() here.
Process A Process B
process_scheduled_works()
pdr_add_lookup() qmi_data_ready_work()
process_scheduled_works() pdr_locator_new_server()
pdr->locator_init_complete=true;
pdr_locator_work()
mutex_lock(&pdr->list_lock);
pdr_locate_service() mutex_lock(&pdr->list_lock);
pdr_get_domain_list()
pr_err("PDR: %s get domain list
txn wait failed: %d\n",
req->service_name,
ret);
Timeout error log due to deadlock:
"
PDR: tms/servreg get domain list txn wait failed: -110
PDR: service lookup for msm/adsp/sensor_pd:tms/servreg failed: -110
"
Thanks to Bjorn and Johan for letting me know that this commit also fixes
an audio regression when using the in-kernel pd-mapper as that makes it
easier to hit this race. [1]
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
RDMA/hns: Fix soft lockup during bt pages loop
Driver runs a for-loop when allocating bt pages and mapping them with
buffer pages. When a large buffer (e.g. MR over 100GB) is being allocated,
it may require a considerable loop count. This will lead to soft lockup:
watchdog: BUG: soft lockup - CPU#27 stuck for 22s!
...
Call trace:
hem_list_alloc_mid_bt+0x124/0x394 [hns_roce_hw_v2]
hns_ ...
In the Linux kernel, the following vulnerability has been resolved:
RDMA/hns: Fix soft lockup during bt pages loop
Driver runs a for-loop when allocating bt pages and mapping them with
buffer pages. When a large buffer (e.g. MR over 100GB) is being allocated,
it may require a considerable loop count. This will lead to soft lockup:
watchdog: BUG: soft lockup - CPU#27 stuck for 22s!
...
Call trace:
hem_list_alloc_mid_bt+0x124/0x394 [hns_roce_hw_v2]
hns_roce_hem_list_request+0xf8/0x160 [hns_roce_hw_v2]
hns_roce_mtr_create+0x2e4/0x360 [hns_roce_hw_v2]
alloc_mr_pbl+0xd4/0x17c [hns_roce_hw_v2]
hns_roce_reg_user_mr+0xf8/0x190 [hns_roce_hw_v2]
ib_uverbs_reg_mr+0x118/0x290
watchdog: BUG: soft lockup - CPU#35 stuck for 23s!
...
Call trace:
hns_roce_hem_list_find_mtt+0x7c/0xb0 [hns_roce_hw_v2]
mtr_map_bufs+0xc4/0x204 [hns_roce_hw_v2]
hns_roce_mtr_create+0x31c/0x3c4 [hns_roce_hw_v2]
alloc_mr_pbl+0xb0/0x160 [hns_roce_hw_v2]
hns_roce_reg_user_mr+0x108/0x1c0 [hns_roce_hw_v2]
ib_uverbs_reg_mr+0x120/0x2bc
Add a cond_resched() to fix soft lockup during these loops. In order not
to affect the allocation performance of normal-size buffer, set the loop
count of a 100GB MR as the threshold to call cond_resched().
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
net: switchdev: Convert blocking notification chain to a raw one
A blocking notification chain uses a read-write semaphore to protect the
integrity of the chain. The semaphore is acquired for writing when
adding / removing notifiers to / from the chain and acquired for reading
when traversing the chain and informing notifiers about an event.
In case of the blocking switchdev notification chain, recursive
notifications are pos ...
In the Linux kernel, the following vulnerability has been resolved:
net: switchdev: Convert blocking notification chain to a raw one
A blocking notification chain uses a read-write semaphore to protect the
integrity of the chain. The semaphore is acquired for writing when
adding / removing notifiers to / from the chain and acquired for reading
when traversing the chain and informing notifiers about an event.
In case of the blocking switchdev notification chain, recursive
notifications are possible which leads to the semaphore being acquired
twice for reading and to lockdep warnings being generated [1].
Specifically, this can happen when the bridge driver processes a
SWITCHDEV_BRPORT_UNOFFLOADED event which causes it to emit notifications
about deferred events when calling switchdev_deferred_process().
Fix this by converting the notification chain to a raw notification
chain in a similar fashion to the netdev notification chain. Protect
the chain using the RTNL mutex by acquiring it when modifying the chain.
Events are always informed under the RTNL mutex, but add an assertion in
call_switchdev_blocking_notifiers() to make sure this is not violated in
the future.
Maintain the "blocking" prefix as events are always emitted from process
context and listeners are allowed to block.
[1]:
WARNING: possible recursive locking detected
6.14.0-rc4-custom-g079270089484 #1 Not tainted
--------------------------------------------
ip/52731 is trying to acquire lock:
ffffffff850918d8 ((switchdev_blocking_notif_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x58/0xa0
but task is already holding lock:
ffffffff850918d8 ((switchdev_blocking_notif_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x58/0xa0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock((switchdev_blocking_notif_chain).rwsem);
lock((switchdev_blocking_notif_chain).rwsem);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by ip/52731:
#0: ffffffff84f795b0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x727/0x1dc0
#1: ffffffff8731f628 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x790/0x1dc0
#2: ffffffff850918d8 ((switchdev_blocking_notif_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x58/0xa0
stack backtrace:
...
? __pfx_down_read+0x10/0x10
? __pfx_mark_lock+0x10/0x10
? __pfx_switchdev_port_attr_set_deferred+0x10/0x10
blocking_notifier_call_chain+0x58/0xa0
switchdev_port_attr_notify.constprop.0+0xb3/0x1b0
? __pfx_switchdev_port_attr_notify.constprop.0+0x10/0x10
? mark_held_locks+0x94/0xe0
? switchdev_deferred_process+0x11a/0x340
switchdev_port_attr_set_deferred+0x27/0xd0
switchdev_deferred_process+0x164/0x340
br_switchdev_port_unoffload+0xc8/0x100 [bridge]
br_switchdev_blocking_event+0x29f/0x580 [bridge]
notifier_call_chain+0xa2/0x440
blocking_notifier_call_chain+0x6e/0xa0
switchdev_bridge_port_unoffload+0xde/0x1a0
...
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
bus: mhi: host: pci_generic: Use pci_try_reset_function() to avoid deadlock
There are multiple places from where the recovery work gets scheduled
asynchronously. Also, there are multiple places where the caller waits
synchronously for the recovery to be completed. One such place is during
the PM shutdown() callback.
If the device is not alive during recovery_work, it will try to reset the
device using pci_reset_function(). Th ...
In the Linux kernel, the following vulnerability has been resolved:
bus: mhi: host: pci_generic: Use pci_try_reset_function() to avoid deadlock
There are multiple places from where the recovery work gets scheduled
asynchronously. Also, there are multiple places where the caller waits
synchronously for the recovery to be completed. One such place is during
the PM shutdown() callback.
If the device is not alive during recovery_work, it will try to reset the
device using pci_reset_function(). This function internally will take the
device_lock() first before resetting the device. By this time, if the lock
has already been acquired, then recovery_work will get stalled while
waiting for the lock. And if the lock was already acquired by the caller
which waits for the recovery_work to be completed, it will lead to
deadlock.
This is what happened on the X1E80100 CRD device when the device died
before shutdown() callback. Driver core calls the driver's shutdown()
callback while holding the device_lock() leading to deadlock.
And this deadlock scenario can occur on other paths as well, like during
the PM suspend() callback, where the driver core would hold the
device_lock() before calling driver's suspend() callback. And if the
recovery_work was already started, it could lead to deadlock. This is also
observed on the X1E80100 CRD.
So to fix both issues, use pci_try_reset_function() in recovery_work. This
function first checks for the availability of the device_lock() before
trying to reset the device. If the lock is available, it will acquire it
and reset the device. Otherwise, it will return -EAGAIN. If that happens,
recovery_work will fail with the error message "Recovery failed" as not
much could be done.
Show More
|
|
In the Linux kernel, the following vulnerability has been resolved:
ksmbd: fix bug on trap in smb2_lock
If lock count is greater than 1, flags could be old value.
It should be checked with flags of smb_lock, not flags.
It will cause bug-on trap from locks_free_lock in error handling
routine.
|
In the Linux kernel, the following vulnerability has been resolved:
hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio
Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to
be offlined) add page poison checks in do_migrate_range in order to make
offline hwpoisoned page possible by introducing isolate_lru_page and
try_to_unmap for hwpoisoned page. However folio lock must be held before
calling try_to_unmap. Add it to fix this problem.
Warning will be produ ...
In the Linux kernel, the following vulnerability has been resolved:
hwpoison, memory_hotplug: lock folio before unmap hwpoisoned folio
Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to
be offlined) add page poison checks in do_migrate_range in order to make
offline hwpoisoned page possible by introducing isolate_lru_page and
try_to_unmap for hwpoisoned page. However folio lock must be held before
calling try_to_unmap. Add it to fix this problem.
Warning will be produced if folio is not locked during unmap:
------------[ cut here ]------------
kernel BUG at ./include/linux/swapops.h:400!
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G W 6.13.0-rc1-00016-g3c434c7ee82a-dirty #41
Tainted: [W]=WARN
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : try_to_unmap_one+0xb08/0xd3c
lr : try_to_unmap_one+0x3dc/0xd3c
Call trace:
try_to_unmap_one+0xb08/0xd3c (P)
try_to_unmap_one+0x3dc/0xd3c (L)
rmap_walk_anon+0xdc/0x1f8
rmap_walk+0x3c/0x58
try_to_unmap+0x88/0x90
unmap_poisoned_folio+0x30/0xa8
do_migrate_range+0x4a0/0x568
offline_pages+0x5a4/0x670
memory_block_action+0x17c/0x374
memory_subsys_offline+0x3c/0x78
device_offline+0xa4/0xd0
state_store+0x8c/0xf0
dev_attr_store+0x18/0x2c
sysfs_kf_write+0x44/0x54
kernfs_fop_write_iter+0x118/0x1a8
vfs_write+0x3a8/0x4bc
ksys_write+0x6c/0xf8
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x44/0x100
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x30/0xd0
el0t_64_sync_handler+0xc8/0xcc
el0t_64_sync+0x198/0x19c
Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000)
---[ end trace 0000000000000000 ]---
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
gpio: rcar: Use raw_spinlock to protect register access
Use raw_spinlock in order to fix spurious messages about invalid context
when spinlock debugging is enabled. The lock is only used to serialize
register access.
[ 4.239592] =============================
[ 4.239595] [ BUG: Invalid wait context ]
[ 4.239599] 6.13.0-rc7-arm64-renesas-05496-gd088502a519f #35 Not tainted
[ 4.239603] --------------- ...
In the Linux kernel, the following vulnerability has been resolved:
gpio: rcar: Use raw_spinlock to protect register access
Use raw_spinlock in order to fix spurious messages about invalid context
when spinlock debugging is enabled. The lock is only used to serialize
register access.
[ 4.239592] =============================
[ 4.239595] [ BUG: Invalid wait context ]
[ 4.239599] 6.13.0-rc7-arm64-renesas-05496-gd088502a519f #35 Not tainted
[ 4.239603] -----------------------------
[ 4.239606] kworker/u8:5/76 is trying to lock:
[ 4.239609] ffff0000091898a0 (&p->lock){....}-{3:3}, at: gpio_rcar_config_interrupt_input_mode+0x34/0x164
[ 4.239641] other info that might help us debug this:
[ 4.239643] context-{5:5}
[ 4.239646] 5 locks held by kworker/u8:5/76:
[ 4.239651] #0: ffff0000080fb148 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0x190/0x62c
[ 4.250180] OF: /soc/sound@ec500000/ports/port@0/endpoint: Read of boolean property 'frame-master' with a value.
[ 4.254094] #1: ffff80008299bd80 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x1b8/0x62c
[ 4.254109] #2: ffff00000920c8f8
[ 4.258345] OF: /soc/sound@ec500000/ports/port@1/endpoint: Read of boolean property 'bitclock-master' with a value.
[ 4.264803] (&dev->mutex){....}-{4:4}, at: __device_attach_async_helper+0x3c/0xdc
[ 4.264820] #3: ffff00000a50ca40 (request_class#2){+.+.}-{4:4}, at: __setup_irq+0xa0/0x690
[ 4.264840] #4:
[ 4.268872] OF: /soc/sound@ec500000/ports/port@1/endpoint: Read of boolean property 'frame-master' with a value.
[ 4.273275] ffff00000a50c8c8 (lock_class){....}-{2:2}, at: __setup_irq+0xc4/0x690
[ 4.296130] renesas_sdhi_internal_dmac ee100000.mmc: mmc1 base at 0x00000000ee100000, max clock rate 200 MHz
[ 4.304082] stack backtrace:
[ 4.304086] CPU: 1 UID: 0 PID: 76 Comm: kworker/u8:5 Not tainted 6.13.0-rc7-arm64-renesas-05496-gd088502a519f #35
[ 4.304092] Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
[ 4.304097] Workqueue: async async_run_entry_fn
[ 4.304106] Call trace:
[ 4.304110] show_stack+0x14/0x20 (C)
[ 4.304122] dump_stack_lvl+0x6c/0x90
[ 4.304131] dump_stack+0x14/0x1c
[ 4.304138] __lock_acquire+0xdfc/0x1584
[ 4.426274] lock_acquire+0x1c4/0x33c
[ 4.429942] _raw_spin_lock_irqsave+0x5c/0x80
[ 4.434307] gpio_rcar_config_interrupt_input_mode+0x34/0x164
[ 4.440061] gpio_rcar_irq_set_type+0xd4/0xd8
[ 4.444422] __irq_set_trigger+0x5c/0x178
[ 4.448435] __setup_irq+0x2e4/0x690
[ 4.452012] request_threaded_irq+0xc4/0x190
[ 4.456285] devm_request_threaded_irq+0x7c/0xf4
[ 4.459398] ata1: link resume succeeded after 1 retries
[ 4.460902] mmc_gpiod_request_cd_irq+0x68/0xe0
[ 4.470660] mmc_start_host+0x50/0xac
[ 4.474327] mmc_add_host+0x80/0xe4
[ 4.477817] tmio_mmc_host_probe+0x2b0/0x440
[ 4.482094] renesas_sdhi_probe+0x488/0x6f4
[ 4.486281] renesas_sdhi_internal_dmac_probe+0x60/0x78
[ 4.491509] platform_probe+0x64/0xd8
[ 4.495178] really_probe+0xb8/0x2a8
[ 4.498756] __driver_probe_device+0x74/0x118
[ 4.503116] driver_probe_device+0x3c/0x154
[ 4.507303] __device_attach_driver+0xd4/0x160
[ 4.511750] bus_for_each_drv+0x84/0xe0
[ 4.515588] __device_attach_async_helper+0xb0/0xdc
[ 4.520470] async_run_entry_fn+0x30/0xd8
[ 4.524481] process_one_work+0x210/0x62c
[ 4.528494] worker_thread+0x1ac/0x340
[ 4.532245] kthread+0x10c/0x110
[ 4.535476] ret_from_fork+0x10/0x20
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
i2c: npcm: disable interrupt enable bit before devm_request_irq
The customer reports that there is a soft lockup issue related to
the i2c driver. After checking, the i2c module was doing a tx transfer
and the bmc machine reboots in the middle of the i2c transaction, the i2c
module keeps the status without being reset.
Due to such an i2c module status, the i2c irq handler keeps getting
triggered since the i2c irq handler is re ...
In the Linux kernel, the following vulnerability has been resolved:
i2c: npcm: disable interrupt enable bit before devm_request_irq
The customer reports that there is a soft lockup issue related to
the i2c driver. After checking, the i2c module was doing a tx transfer
and the bmc machine reboots in the middle of the i2c transaction, the i2c
module keeps the status without being reset.
Due to such an i2c module status, the i2c irq handler keeps getting
triggered since the i2c irq handler is registered in the kernel booting
process after the bmc machine is doing a warm rebooting.
The continuous triggering is stopped by the soft lockup watchdog timer.
Disable the interrupt enable bit in the i2c module before calling
devm_request_irq to fix this issue since the i2c relative status bit
is read-only.
Here is the soft lockup log.
[ 28.176395] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [swapper/0:1]
[ 28.183351] Modules linked in:
[ 28.186407] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.120-yocto-s-dirty-bbebc78 #1
[ 28.201174] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 28.208128] pc : __do_softirq+0xb0/0x368
[ 28.212055] lr : __do_softirq+0x70/0x368
[ 28.215972] sp : ffffff8035ebca00
[ 28.219278] x29: ffffff8035ebca00 x28: 0000000000000002 x27: ffffff80071a3780
[ 28.226412] x26: ffffffc008bdc000 x25: ffffffc008bcc640 x24: ffffffc008be50c0
[ 28.233546] x23: ffffffc00800200c x22: 0000000000000000 x21: 000000000000001b
[ 28.240679] x20: 0000000000000000 x19: ffffff80001c3200 x18: ffffffffffffffff
[ 28.247812] x17: ffffffc02d2e0000 x16: ffffff8035eb8b40 x15: 00001e8480000000
[ 28.254945] x14: 02c3647e37dbfcb6 x13: 02c364f2ab14200c x12: 0000000002c364f2
[ 28.262078] x11: 00000000fa83b2da x10: 000000000000b67e x9 : ffffffc008010250
[ 28.269211] x8 : 000000009d983d00 x7 : 7fffffffffffffff x6 : 0000036d74732434
[ 28.276344] x5 : 00ffffffffffffff x4 : 0000000000000015 x3 : 0000000000000198
[ 28.283476] x2 : ffffffc02d2e0000 x1 : 00000000000000e0 x0 : ffffffc008bdcb40
[ 28.290611] Call trace:
[ 28.293052] __do_softirq+0xb0/0x368
[ 28.296625] __irq_exit_rcu+0xe0/0x100
[ 28.300374] irq_exit+0x14/0x20
[ 28.303513] handle_domain_irq+0x68/0x90
[ 28.307440] gic_handle_irq+0x78/0xb0
[ 28.311098] call_on_irq_stack+0x20/0x38
[ 28.315019] do_interrupt_handler+0x54/0x5c
[ 28.319199] el1_interrupt+0x2c/0x4c
[ 28.322777] el1h_64_irq_handler+0x14/0x20
[ 28.326872] el1h_64_irq+0x74/0x78
[ 28.330269] __setup_irq+0x454/0x780
[ 28.333841] request_threaded_irq+0xd0/0x1b4
[ 28.338107] devm_request_threaded_irq+0x84/0x100
[ 28.342809] npcm_i2c_probe_bus+0x188/0x3d0
[ 28.346990] platform_probe+0x6c/0xc4
[ 28.350653] really_probe+0xcc/0x45c
[ 28.354227] __driver_probe_device+0x8c/0x160
[ 28.358578] driver_probe_device+0x44/0xe0
[ 28.362670] __driver_attach+0x124/0x1d0
[ 28.366589] bus_for_each_dev+0x7c/0xe0
[ 28.370426] driver_attach+0x28/0x30
[ 28.373997] bus_add_driver+0x124/0x240
[ 28.377830] driver_register+0x7c/0x124
[ 28.381662] __platform_driver_register+0x2c/0x34
[ 28.386362] npcm_i2c_init+0x3c/0x5c
[ 28.389937] do_one_initcall+0x74/0x230
[ 28.393768] kernel_init_freeable+0x24c/0x2b4
[ 28.398126] kernel_init+0x28/0x130
[ 28.401614] ret_from_fork+0x10/0x20
[ 28.405189] Kernel panic - not syncing: softlockup: hung tasks
[ 28.411011] SMP: stopping secondary CPUs
[ 28.414933] Kernel Offset: disabled
[ 28.418412] CPU features: 0x00000000,00000802
[ 28.427644] Rebooting in 20 seconds..
Show More
|
|
In the Linux kernel, the following vulnerability has been resolved:
USB: gadget: f_midi: f_midi_complete to call queue_work
When using USB MIDI, a lock is attempted to be acquired twice through a
re-entrant call to f_midi_transmit, causing a deadlock.
Fix it by using queue_work() to schedule the inner f_midi_transmit() via
a high priority work queue from the completion handler.
|
In the Linux kernel, the following vulnerability has been resolved:
clocksource: Use migrate_disable() to avoid calling get_random_u32() in atomic context
The following bug report happened with a PREEMPT_RT kernel:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2012, name: kwatchdog
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
get_random_u32+0x4f/0x110
clocksource_verify_c ...
In the Linux kernel, the following vulnerability has been resolved:
clocksource: Use migrate_disable() to avoid calling get_random_u32() in atomic context
The following bug report happened with a PREEMPT_RT kernel:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2012, name: kwatchdog
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
get_random_u32+0x4f/0x110
clocksource_verify_choose_cpus+0xab/0x1a0
clocksource_verify_percpu.part.0+0x6b/0x330
clocksource_watchdog_kthread+0x193/0x1a0
It is due to the fact that clocksource_verify_choose_cpus() is invoked with
preemption disabled. This function invokes get_random_u32() to obtain
random numbers for choosing CPUs. The batched_entropy_32 local lock and/or
the base_crng.lock spinlock in driver/char/random.c will be acquired during
the call. In PREEMPT_RT kernel, they are both sleeping locks and so cannot
be acquired in atomic context.
Fix this problem by using migrate_disable() to allow smp_processor_id() to
be reliably used without introducing atomic context. preempt_disable() is
then called after clocksource_verify_choose_cpus() but before the
clocksource measurement is being run to avoid introducing unexpected
latency.
Show More
|
|
In the Linux kernel, the following vulnerability has been resolved:
net: rose: lock the socket in rose_bind()
syzbot reported a soft lockup in rose_loopback_timer(),
with a repro calling bind() from multiple threads.
rose_bind() must lock the socket to avoid this issue.
|
In the Linux kernel, the following vulnerability has been resolved:
gpio: xilinx: Convert gpio_lock to raw spinlock
irq_chip functions may be called in raw spinlock context. Therefore, we
must also use a raw spinlock for our own internal locking.
This fixes the following lockdep splat:
[ 5.349336] =============================
[ 5.353349] [ BUG: Invalid wait context ]
[ 5.357361] 6.13.0-rc5+ #69 Tainted: G W
[ 5.363031] -----------------------------
[ 5.367045] kworker/ ...
In the Linux kernel, the following vulnerability has been resolved:
gpio: xilinx: Convert gpio_lock to raw spinlock
irq_chip functions may be called in raw spinlock context. Therefore, we
must also use a raw spinlock for our own internal locking.
This fixes the following lockdep splat:
[ 5.349336] =============================
[ 5.353349] [ BUG: Invalid wait context ]
[ 5.357361] 6.13.0-rc5+ #69 Tainted: G W
[ 5.363031] -----------------------------
[ 5.367045] kworker/u17:1/44 is trying to lock:
[ 5.371587] ffffff88018b02c0 (&chip->gpio_lock){....}-{3:3}, at: xgpio_irq_unmask (drivers/gpio/gpio-xilinx.c:433 (discriminator 8))
[ 5.380079] other info that might help us debug this:
[ 5.385138] context-{5:5}
[ 5.387762] 5 locks held by kworker/u17:1/44:
[ 5.392123] #0: ffffff8800014958 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work (kernel/workqueue.c:3204)
[ 5.402260] #1: ffffffc082fcbdd8 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work (kernel/workqueue.c:3205)
[ 5.411528] #2: ffffff880172c900 (&dev->mutex){....}-{4:4}, at: __device_attach (drivers/base/dd.c:1006)
[ 5.419929] #3: ffffff88039c8268 (request_class#2){+.+.}-{4:4}, at: __setup_irq (kernel/irq/internals.h:156 kernel/irq/manage.c:1596)
[ 5.428331] #4: ffffff88039c80c8 (lock_class#2){....}-{2:2}, at: __setup_irq (kernel/irq/manage.c:1614)
[ 5.436472] stack backtrace:
[ 5.439359] CPU: 2 UID: 0 PID: 44 Comm: kworker/u17:1 Tainted: G W 6.13.0-rc5+ #69
[ 5.448690] Tainted: [W]=WARN
[ 5.451656] Hardware name: xlnx,zynqmp (DT)
[ 5.455845] Workqueue: events_unbound deferred_probe_work_func
[ 5.461699] Call trace:
[ 5.464147] show_stack+0x18/0x24 C
[ 5.467821] dump_stack_lvl (lib/dump_stack.c:123)
[ 5.471501] dump_stack (lib/dump_stack.c:130)
[ 5.474824] __lock_acquire (kernel/locking/lockdep.c:4828 kernel/locking/lockdep.c:4898 kernel/locking/lockdep.c:5176)
[ 5.478758] lock_acquire (arch/arm64/include/asm/percpu.h:40 kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5851 kernel/locking/lockdep.c:5814)
[ 5.482429] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 5.486797] xgpio_irq_unmask (drivers/gpio/gpio-xilinx.c:433 (discriminator 8))
[ 5.490737] irq_enable (kernel/irq/internals.h:236 kernel/irq/chip.c:170 kernel/irq/chip.c:439 kernel/irq/chip.c:432 kernel/irq/chip.c:345)
[ 5.494060] __irq_startup (kernel/irq/internals.h:241 kernel/irq/chip.c:180 kernel/irq/chip.c:250)
[ 5.497645] irq_startup (kernel/irq/chip.c:270)
[ 5.501143] __setup_irq (kernel/irq/manage.c:1807)
[ 5.504728] request_threaded_irq (kernel/irq/manage.c:2208)
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
team: prevent adding a device which is already a team device lower
Prevent adding a device which is already a team device lower,
e.g. adding veth0 if vlan1 was already added and veth0 is a lower of
vlan1.
This is not useful in practice and can lead to recursive locking:
$ ip link add veth0 type veth peer name veth1
$ ip link set veth0 up
$ ip link set veth1 up
$ ip link add link veth0 name veth0.1 type vlan protocol 802.1Q i ...
In the Linux kernel, the following vulnerability has been resolved:
team: prevent adding a device which is already a team device lower
Prevent adding a device which is already a team device lower,
e.g. adding veth0 if vlan1 was already added and veth0 is a lower of
vlan1.
This is not useful in practice and can lead to recursive locking:
$ ip link add veth0 type veth peer name veth1
$ ip link set veth0 up
$ ip link set veth1 up
$ ip link add link veth0 name veth0.1 type vlan protocol 802.1Q id 1
$ ip link add team0 type team
$ ip link set veth0.1 down
$ ip link set veth0.1 master team0
team0: Port device veth0.1 added
$ ip link set veth0 down
$ ip link set veth0 master team0
============================================
WARNING: possible recursive locking detected
6.13.0-rc2-virtme-00441-ga14a429069bb #46 Not tainted
--------------------------------------------
ip/7684 is trying to acquire lock:
ffff888016848e00 (team->team_lock_key){+.+.}-{4:4}, at: team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
but task is already holding lock:
ffff888016848e00 (team->team_lock_key){+.+.}-{4:4}, at: team_add_slave (drivers/net/team/team_core.c:1147 drivers/net/team/team_core.c:1977)
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(team->team_lock_key);
lock(team->team_lock_key);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by ip/7684:
stack backtrace:
CPU: 3 UID: 0 PID: 7684 Comm: ip Not tainted 6.13.0-rc2-virtme-00441-ga14a429069bb #46
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:122)
print_deadlock_bug.cold (kernel/locking/lockdep.c:3040)
__lock_acquire (kernel/locking/lockdep.c:3893 kernel/locking/lockdep.c:5226)
? netlink_broadcast_filtered (net/netlink/af_netlink.c:1548)
lock_acquire.part.0 (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5851)
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
? trace_lock_acquire (./include/trace/events/lock.h:24 (discriminator 2))
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
? lock_acquire (kernel/locking/lockdep.c:5822)
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
__mutex_lock (kernel/locking/mutex.c:587 kernel/locking/mutex.c:735)
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
? fib_sync_up (net/ipv4/fib_semantics.c:2167)
? team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
team_device_event (drivers/net/team/team_core.c:2928 drivers/net/team/team_core.c:2951 drivers/net/team/team_core.c:2973)
notifier_call_chain (kernel/notifier.c:85)
call_netdevice_notifiers_info (net/core/dev.c:1996)
__dev_notify_flags (net/core/dev.c:8993)
? __dev_change_flags (net/core/dev.c:8975)
dev_change_flags (net/core/dev.c:9027)
vlan_device_event (net/8021q/vlan.c:85 net/8021q/vlan.c:470)
? br_device_event (net/bridge/br.c:143)
notifier_call_chain (kernel/notifier.c:85)
call_netdevice_notifiers_info (net/core/dev.c:1996)
dev_open (net/core/dev.c:1519 net/core/dev.c:1505)
team_add_slave (drivers/net/team/team_core.c:1219 drivers/net/team/team_core.c:1977)
? __pfx_team_add_slave (drivers/net/team/team_core.c:1972)
do_set_master (net/core/rtnetlink.c:2917)
do_setlink.isra.0 (net/core/rtnetlink.c:3117)
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
memcg: fix soft lockup in the OOM process
A soft lockup issue was found in the product with about 56,000 tasks were
in the OOM cgroup, it was traversing them when the soft lockup was
triggered.
watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
Hardware name: Huawei Cloud OpenStack Nova, BIOS
RIP: 0010:console_unlock+0x343/0x540
RSP: 0000:ffffb751 ...
In the Linux kernel, the following vulnerability has been resolved:
memcg: fix soft lockup in the OOM process
A soft lockup issue was found in the product with about 56,000 tasks were
in the OOM cgroup, it was traversing them when the soft lockup was
triggered.
watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
Hardware name: Huawei Cloud OpenStack Nova, BIOS
RIP: 0010:console_unlock+0x343/0x540
RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
vprintk_emit+0x193/0x280
printk+0x52/0x6e
dump_task+0x114/0x130
mem_cgroup_scan_tasks+0x76/0x100
dump_header+0x1fe/0x210
oom_kill_process+0xd1/0x100
out_of_memory+0x125/0x570
mem_cgroup_out_of_memory+0xb5/0xd0
try_charge+0x720/0x770
mem_cgroup_try_charge+0x86/0x180
mem_cgroup_try_charge_delay+0x1c/0x40
do_anonymous_page+0xb5/0x390
handle_mm_fault+0xc4/0x1f0
This is because thousands of processes are in the OOM cgroup, it takes a
long time to traverse all of them. As a result, this lead to soft lockup
in the OOM process.
To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
function per 1000 iterations. For global OOM, call
'touch_softlockup_watchdog' per 1000 iterations to avoid this issue.
Show More
|