| CVE |
Vendors |
Products |
Updated |
CVSS v2 |
CVSS v3 |
In the Linux kernel, the following vulnerability has been resolved:
x86/efistub: Use 1:1 file:memory mapping for PE/COFF .compat section
The .compat section is a dummy PE section that contains the address of
the 32-bit entrypoint of the 64-bit kernel image if it is bootable from
32-bit firmware (i.e., CONFIG_EFI_MIXED=y)
This section is only 8 bytes in size and is only referenced from the
loader, and so it is placed at the end of the memory view of the image,
to avoid the need for padding it ...
In the Linux kernel, the following vulnerability has been resolved:
x86/efistub: Use 1:1 file:memory mapping for PE/COFF .compat section
The .compat section is a dummy PE section that contains the address of
the 32-bit entrypoint of the 64-bit kernel image if it is bootable from
32-bit firmware (i.e., CONFIG_EFI_MIXED=y)
This section is only 8 bytes in size and is only referenced from the
loader, and so it is placed at the end of the memory view of the image,
to avoid the need for padding it to 4k, which is required for sections
appearing in the middle of the image.
Unfortunately, this violates the PE/COFF spec, and even if most EFI
loaders will work correctly (including the Tianocore reference
implementation), PE loaders do exist that reject such images, on the
basis that both the file and memory views of the file contents should be
described by the section headers in a monotonically increasing manner
without leaving any gaps.
So reorganize the sections to avoid this issue. This results in a slight
padding overhead (< 4k) which can be avoided if desired by disabling
CONFIG_EFI_MIXED (which is only needed in rare cases these days)
Show More
|
|
In the Linux kernel, the following vulnerability has been resolved:
inet: read sk->sk_family once in inet_recv_error()
inet_recv_error() is called without holding the socket lock.
IPv6 socket could mutate to IPv4 with IPV6_ADDRFORM
socket option and trigger a KCSAN warning.
|
In the Linux kernel, the following vulnerability has been resolved:
net: atlantic: Fix DMA mapping for PTP hwts ring
Function aq_ring_hwts_rx_alloc() maps extra AQ_CFG_RXDS_DEF bytes
for PTP HWTS ring but then generic aq_ring_free() does not take this
into account.
Create and use a specific function to free HWTS ring to fix this
issue.
Trace:
[ 215.351607] ------------[ cut here ]------------
[ 215.351612] DMA-API: atlantic 0000:4b:00.0: device driver frees DMA memory with different size [d ...
In the Linux kernel, the following vulnerability has been resolved:
net: atlantic: Fix DMA mapping for PTP hwts ring
Function aq_ring_hwts_rx_alloc() maps extra AQ_CFG_RXDS_DEF bytes
for PTP HWTS ring but then generic aq_ring_free() does not take this
into account.
Create and use a specific function to free HWTS ring to fix this
issue.
Trace:
[ 215.351607] ------------[ cut here ]------------
[ 215.351612] DMA-API: atlantic 0000:4b:00.0: device driver frees DMA memory with different size [device address=0x00000000fbdd0000] [map size=34816 bytes] [unmap size=32768 bytes]
[ 215.351635] WARNING: CPU: 33 PID: 10759 at kernel/dma/debug.c:988 check_unmap+0xa6f/0x2360
...
[ 215.581176] Call Trace:
[ 215.583632] <TASK>
[ 215.585745] ? show_trace_log_lvl+0x1c4/0x2df
[ 215.590114] ? show_trace_log_lvl+0x1c4/0x2df
[ 215.594497] ? debug_dma_free_coherent+0x196/0x210
[ 215.599305] ? check_unmap+0xa6f/0x2360
[ 215.603147] ? __warn+0xca/0x1d0
[ 215.606391] ? check_unmap+0xa6f/0x2360
[ 215.610237] ? report_bug+0x1ef/0x370
[ 215.613921] ? handle_bug+0x3c/0x70
[ 215.617423] ? exc_invalid_op+0x14/0x50
[ 215.621269] ? asm_exc_invalid_op+0x16/0x20
[ 215.625480] ? check_unmap+0xa6f/0x2360
[ 215.629331] ? mark_lock.part.0+0xca/0xa40
[ 215.633445] debug_dma_free_coherent+0x196/0x210
[ 215.638079] ? __pfx_debug_dma_free_coherent+0x10/0x10
[ 215.643242] ? slab_free_freelist_hook+0x11d/0x1d0
[ 215.648060] dma_free_attrs+0x6d/0x130
[ 215.651834] aq_ring_free+0x193/0x290 [atlantic]
[ 215.656487] aq_ptp_ring_free+0x67/0x110 [atlantic]
...
[ 216.127540] ---[ end trace 6467e5964dd2640b ]---
[ 216.132160] DMA-API: Mapped at:
[ 216.132162] debug_dma_alloc_coherent+0x66/0x2f0
[ 216.132165] dma_alloc_attrs+0xf5/0x1b0
[ 216.132168] aq_ring_hwts_rx_alloc+0x150/0x1f0 [atlantic]
[ 216.132193] aq_ptp_ring_alloc+0x1bb/0x540 [atlantic]
[ 216.132213] aq_nic_init+0x4a1/0x760 [atlantic]
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
netdevsim: avoid potential loop in nsim_dev_trap_report_work()
Many syzbot reports include the following trace [1]
If nsim_dev_trap_report_work() can not grab the mutex,
it should rearm itself at least one jiffie later.
[1]
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 32383 Comm: kworker/0:2 Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
Hardware name: Google Google Compute Engine/Google Comp ...
In the Linux kernel, the following vulnerability has been resolved:
netdevsim: avoid potential loop in nsim_dev_trap_report_work()
Many syzbot reports include the following trace [1]
If nsim_dev_trap_report_work() can not grab the mutex,
it should rearm itself at least one jiffie later.
[1]
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 32383 Comm: kworker/0:2 Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Workqueue: events nsim_dev_trap_report_work
RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:89 [inline]
RIP: 0010:memory_is_nonzero mm/kasan/generic.c:104 [inline]
RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:129 [inline]
RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline]
RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
RIP: 0010:kasan_check_range+0x101/0x190 mm/kasan/generic.c:189
Code: 07 49 39 d1 75 0a 45 3a 11 b8 01 00 00 00 7c 0b 44 89 c2 e8 21 ed ff ff 83 f0 01 5b 5d 41 5c c3 48 85 d2 74 4f 48 01 ea eb 09 <48> 83 c0 01 48 39 d0 74 41 80 38 00 74 f2 eb b6 41 bc 08 00 00 00
RSP: 0018:ffffc90012dcf998 EFLAGS: 00000046
RAX: fffffbfff258af1e RBX: fffffbfff258af1f RCX: ffffffff8168eda3
RDX: fffffbfff258af1f RSI: 0000000000000004 RDI: ffffffff92c578f0
RBP: fffffbfff258af1e R08: 0000000000000000 R09: fffffbfff258af1e
R10: ffffffff92c578f3 R11: ffffffff8acbcbc0 R12: 0000000000000002
R13: ffff88806db38400 R14: 1ffff920025b9f42 R15: ffffffff92c578e8
FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c00994e078 CR3: 000000002c250000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<NMI>
</NMI>
<TASK>
instrument_atomic_read include/linux/instrumented.h:68 [inline]
atomic_read include/linux/atomic/atomic-instrumented.h:32 [inline]
queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline]
debug_spin_unlock kernel/locking/spinlock_debug.c:101 [inline]
do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:141
__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:150 [inline]
_raw_spin_unlock_irqrestore+0x22/0x70 kernel/locking/spinlock.c:194
debug_object_activate+0x349/0x540 lib/debugobjects.c:726
debug_work_activate kernel/workqueue.c:578 [inline]
insert_work+0x30/0x230 kernel/workqueue.c:1650
__queue_work+0x62e/0x11d0 kernel/workqueue.c:1802
__queue_delayed_work+0x1bf/0x270 kernel/workqueue.c:1953
queue_delayed_work_on+0x106/0x130 kernel/workqueue.c:1989
queue_delayed_work include/linux/workqueue.h:563 [inline]
schedule_delayed_work include/linux/workqueue.h:677 [inline]
nsim_dev_trap_report_work+0x9c0/0xc80 drivers/net/netdevsim/dev.c:842
process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
process_scheduled_works kernel/workqueue.c:2706 [inline]
worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
kthread+0x2c6/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
wifi: mac80211: improve CSA/ECSA connection refusal
As mentioned in the previous commit, we pretty quickly found
that some APs have ECSA elements stuck in their probe response,
so using that to not attempt to connect while CSA is happening
we never connect to such an AP.
Improve this situation by checking more carefully and ignoring
the ECSA if cfg80211 has previously detected the ECSA element
being stuck in the probe respons ...
In the Linux kernel, the following vulnerability has been resolved:
wifi: mac80211: improve CSA/ECSA connection refusal
As mentioned in the previous commit, we pretty quickly found
that some APs have ECSA elements stuck in their probe response,
so using that to not attempt to connect while CSA is happening
we never connect to such an AP.
Improve this situation by checking more carefully and ignoring
the ECSA if cfg80211 has previously detected the ECSA element
being stuck in the probe response.
Additionally, allow connecting to an AP that's switching to a
channel it's already using, unless it's using quiet mode. In
this case, we may just have to adjust bandwidth later. If it's
actually switching channels, it's better not to try to connect
in the middle of that.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
wifi: cfg80211: detect stuck ECSA element in probe resp
We recently added some validation that we don't try to
connect to an AP that is currently in a channel switch
process, since that might want the channel to be quiet
or we might not be able to connect in time to hear the
switching in a beacon. This was in commit c09c4f31998b
("wifi: mac80211: don't connect to an AP while it's in
a CSA process").
However, we promptly got a ...
In the Linux kernel, the following vulnerability has been resolved:
wifi: cfg80211: detect stuck ECSA element in probe resp
We recently added some validation that we don't try to
connect to an AP that is currently in a channel switch
process, since that might want the channel to be quiet
or we might not be able to connect in time to hear the
switching in a beacon. This was in commit c09c4f31998b
("wifi: mac80211: don't connect to an AP while it's in
a CSA process").
However, we promptly got a report that this caused new
connection failures, and it turns out that the AP that
we now cannot connect to is permanently advertising an
extended channel switch announcement, even with quiet.
The AP in question was an Asus RT-AC53, with firmware
3.0.0.4.380_10760-g21a5898.
As a first step, attempt to detect that we're dealing
with such a situation, so mac80211 can use this later.
Show More
|
|
In the Linux kernel, the following vulnerability has been resolved:
net: stmmac: xgmac: fix handling of DPP safety error for DMA channels
Commit 56e58d6c8a56 ("net: stmmac: Implement Safety Features in
XGMAC core") checks and reports safety errors, but leaves the
Data Path Parity Errors for each channel in DMA unhandled at all, lead to
a storm of interrupt.
Fix it by checking and clearing the DMA_DPP_Interrupt_Status register.
|
In the Linux kernel, the following vulnerability has been resolved:
KVM: s390: vsie: fix race during shadow creation
Right now it is possible to see gmap->private being zero in
kvm_s390_vsie_gmap_notifier resulting in a crash. This is due to the
fact that we add gmap->private == kvm after creation:
static int acquire_gmap_shadow(struct kvm_vcpu *vcpu,
struct vsie_page *vsie_page)
{
[...]
gmap = gmap_shadow(vcpu->arch.gmap, asce, edat);
if (IS_ER ...
In the Linux kernel, the following vulnerability has been resolved:
KVM: s390: vsie: fix race during shadow creation
Right now it is possible to see gmap->private being zero in
kvm_s390_vsie_gmap_notifier resulting in a crash. This is due to the
fact that we add gmap->private == kvm after creation:
static int acquire_gmap_shadow(struct kvm_vcpu *vcpu,
struct vsie_page *vsie_page)
{
[...]
gmap = gmap_shadow(vcpu->arch.gmap, asce, edat);
if (IS_ERR(gmap))
return PTR_ERR(gmap);
gmap->private = vcpu->kvm;
Let children inherit the private field of the parent.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
xen/events: close evtchn after mapping cleanup
shutdown_pirq and startup_pirq are not taking the
irq_mapping_update_lock because they can't due to lock inversion. Both
are called with the irq_desc->lock being taking. The lock order,
however, is first irq_mapping_update_lock and then irq_desc->lock.
This opens multiple races:
- shutdown_pirq can be interrupted by a function that allocates an event
channel:
CPU0 ...
In the Linux kernel, the following vulnerability has been resolved:
xen/events: close evtchn after mapping cleanup
shutdown_pirq and startup_pirq are not taking the
irq_mapping_update_lock because they can't due to lock inversion. Both
are called with the irq_desc->lock being taking. The lock order,
however, is first irq_mapping_update_lock and then irq_desc->lock.
This opens multiple races:
- shutdown_pirq can be interrupted by a function that allocates an event
channel:
CPU0 CPU1
shutdown_pirq {
xen_evtchn_close(e)
__startup_pirq {
EVTCHNOP_bind_pirq
-> returns just freed evtchn e
set_evtchn_to_irq(e, irq)
}
xen_irq_info_cleanup() {
set_evtchn_to_irq(e, -1)
}
}
Assume here event channel e refers here to the same event channel
number.
After this race the evtchn_to_irq mapping for e is invalid (-1).
- __startup_pirq races with __unbind_from_irq in a similar way. Because
__startup_pirq doesn't take irq_mapping_update_lock it can grab the
evtchn that __unbind_from_irq is currently freeing and cleaning up. In
this case even though the event channel is allocated, its mapping can
be unset in evtchn_to_irq.
The fix is to first cleanup the mappings and then close the event
channel. In this way, when an event channel gets allocated it's
potential previous evtchn_to_irq mappings are guaranteed to be unset already.
This is also the reverse order of the allocation where first the event
channel is allocated and then the mappings are setup.
On a 5.10 kernel prior to commit 3fcdaf3d7634 ("xen/events: modify internal
[un]bind interfaces"), we hit a BUG like the following during probing of NVMe
devices. The issue is that during nvme_setup_io_queues, pci_free_irq
is called for every device which results in a call to shutdown_pirq.
With many nvme devices it's therefore likely to hit this race during
boot because there will be multiple calls to shutdown_pirq and
startup_pirq are running potentially in parallel.
------------[ cut here ]------------
blkfront: xvda: barrier or flush: disabled; persistent grants: enabled; indirect descriptors: enabled; bounce buffer: enabled
kernel BUG at drivers/xen/events/events_base.c:499!
invalid opcode: 0000 [#1] SMP PTI
CPU: 44 PID: 375 Comm: kworker/u257:23 Not tainted 5.10.201-191.748.amzn2.x86_64 #1
Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
Workqueue: nvme-reset-wq nvme_reset_work
RIP: 0010:bind_evtchn_to_cpu+0xdf/0xf0
Code: 5d 41 5e c3 cc cc cc cc 44 89 f7 e8 2b 55 ad ff 49 89 c5 48 85 c0 0f 84 64 ff ff ff 4c 8b 68 30 41 83 fe ff 0f 85 60 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00
RSP: 0000:ffffc9000d533b08 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
RDX: 0000000000000028 RSI: 00000000ffffffff RDI: 00000000ffffffff
RBP: ffff888107419680 R08: 0000000000000000 R09: ffffffff82d72b00
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000001ed
R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000002
FS: 0000000000000000(0000) GS:ffff88bc8b500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000002610001 CR4: 00000000001706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
? show_trace_log_lvl+0x1c1/0x2d9
? show_trace_log_lvl+0x1c1/0x2d9
? set_affinity_irq+0xdc/0x1c0
? __die_body.cold+0x8/0xd
? die+0x2b/0x50
? do_trap+0x90/0x110
? bind_evtchn_to_cpu+0xdf/0xf0
? do_error_trap+0x65/0x80
? bind_evtchn_to_cpu+0xdf/0xf0
? exc_invalid_op+0x4e/0x70
? bind_evtchn_to_cpu+0xdf/0xf0
? asm_exc_invalid_op+0x12/0x20
? bind_evtchn_to_cpu+0xdf/0x
---truncated---
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
smb: Fix regression in writes when non-standard maximum write size negotiated
The conversion to netfs in the 6.3 kernel caused a regression when
maximum write size is set by the server to an unexpected value which is
not a multiple of 4096 (similarly if the user overrides the maximum
write size by setting mount parm "wsize", but sets it to a value that
is not a multiple of 4096). When negotiated write size is not a
multiple o ...
In the Linux kernel, the following vulnerability has been resolved:
smb: Fix regression in writes when non-standard maximum write size negotiated
The conversion to netfs in the 6.3 kernel caused a regression when
maximum write size is set by the server to an unexpected value which is
not a multiple of 4096 (similarly if the user overrides the maximum
write size by setting mount parm "wsize", but sets it to a value that
is not a multiple of 4096). When negotiated write size is not a
multiple of 4096 the netfs code can skip the end of the final
page when doing large sequential writes, causing data corruption.
This section of code is being rewritten/removed due to a large
netfs change, but until that point (ie for the 6.3 kernel until now)
we can not support non-standard maximum write sizes.
Add a warning if a user specifies a wsize on mount that is not
a multiple of 4096 (and round down), also add a change where we
round down the maximum write size if the server negotiates a value
that is not a multiple of 4096 (we also have to check to make sure that
we do not round it down to zero).
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
wifi: iwlwifi: mvm: fix a crash when we run out of stations
A DoS tool that injects loads of authentication frames made our AP
crash. The iwl_mvm_is_dup() function couldn't find the per-queue
dup_data which was not allocated.
The root cause for that is that we ran out of stations in the firmware
and we didn't really add the station to the firmware, yet we didn't
return an error to mac80211.
Mac80211 was thinking that we have ...
In the Linux kernel, the following vulnerability has been resolved:
wifi: iwlwifi: mvm: fix a crash when we run out of stations
A DoS tool that injects loads of authentication frames made our AP
crash. The iwl_mvm_is_dup() function couldn't find the per-queue
dup_data which was not allocated.
The root cause for that is that we ran out of stations in the firmware
and we didn't really add the station to the firmware, yet we didn't
return an error to mac80211.
Mac80211 was thinking that we have the station and because of that,
sta_info::uploaded was set to 1. This allowed
ieee80211_find_sta_by_ifaddr() to return a valid station object, but
that ieee80211_sta didn't have any iwl_mvm_sta object initialized and
that caused the crash mentioned earlier when we got Rx on that station.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
nilfs2: fix data corruption in dsync block recovery for small block sizes
The helper function nilfs_recovery_copy_block() of
nilfs_recovery_dsync_blocks(), which recovers data from logs created by
data sync writes during a mount after an unclean shutdown, incorrectly
calculates the on-page offset when copying repair data to the file's page
cache. In environments where the block size is smaller than the page
size, this flaw ca ...
In the Linux kernel, the following vulnerability has been resolved:
nilfs2: fix data corruption in dsync block recovery for small block sizes
The helper function nilfs_recovery_copy_block() of
nilfs_recovery_dsync_blocks(), which recovers data from logs created by
data sync writes during a mount after an unclean shutdown, incorrectly
calculates the on-page offset when copying repair data to the file's page
cache. In environments where the block size is smaller than the page
size, this flaw can cause data corruption and leak uninitialized memory
bytes during the recovery process.
Fix these issues by correcting this byte offset calculation on the page.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
Syzbot reported a hang issue in migrate_pages_batch() called by mbind()
and nilfs_lookup_dirty_data_buffers() called in the log writer of nilfs2.
While migrate_pages_batch() locks a folio and waits for the writeback to
complete, the log writer thread that should bring the writeback to
completion picks up the folio being written back in
nilfs_lookup_dirty_data_buffers() tha ...
In the Linux kernel, the following vulnerability has been resolved:
nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
Syzbot reported a hang issue in migrate_pages_batch() called by mbind()
and nilfs_lookup_dirty_data_buffers() called in the log writer of nilfs2.
While migrate_pages_batch() locks a folio and waits for the writeback to
complete, the log writer thread that should bring the writeback to
completion picks up the folio being written back in
nilfs_lookup_dirty_data_buffers() that it calls for subsequent log
creation and was trying to lock the folio. Thus causing a deadlock.
In the first place, it is unexpected that folios/pages in the middle of
writeback will be updated and become dirty. Nilfs2 adds a checksum to
verify the validity of the log being written and uses it for recovery at
mount, so data changes during writeback are suppressed. Since this is
broken, an unclean shutdown could potentially cause recovery to fail.
Investigation revealed that the root cause is that the wait for writeback
completion in nilfs_page_mkwrite() is conditional, and if the backing
device does not require stable writes, data may be modified without
waiting.
Fix these issues by making nilfs_page_mkwrite() wait for writeback to
finish regardless of the stable write requirement of the backing device.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
hv_netvsc: Fix race condition between netvsc_probe and netvsc_remove
In commit ac5047671758 ("hv_netvsc: Disable NAPI before closing the
VMBus channel"), napi_disable was getting called for all channels,
including all subchannels without confirming if they are enabled or not.
This caused hv_netvsc getting hung at napi_disable, when netvsc_probe()
has finished running but nvdev->subchan_work has not started yet.
netvsc_subchan ...
In the Linux kernel, the following vulnerability has been resolved:
hv_netvsc: Fix race condition between netvsc_probe and netvsc_remove
In commit ac5047671758 ("hv_netvsc: Disable NAPI before closing the
VMBus channel"), napi_disable was getting called for all channels,
including all subchannels without confirming if they are enabled or not.
This caused hv_netvsc getting hung at napi_disable, when netvsc_probe()
has finished running but nvdev->subchan_work has not started yet.
netvsc_subchan_work() -> rndis_set_subchannel() has not created the
sub-channels and because of that netvsc_sc_open() is not running.
netvsc_remove() calls cancel_work_sync(&nvdev->subchan_work), for which
netvsc_subchan_work did not run.
netif_napi_add() sets the bit NAPI_STATE_SCHED because it ensures NAPI
cannot be scheduled. Then netvsc_sc_open() -> napi_enable will clear the
NAPIF_STATE_SCHED bit, so it can be scheduled. napi_disable() does the
opposite.
Now during netvsc_device_remove(), when napi_disable is called for those
subchannels, napi_disable gets stuck on infinite msleep.
This fix addresses this problem by ensuring that napi_disable() is not
getting called for non-enabled NAPI struct.
But netif_napi_del() is still necessary for these non-enabled NAPI struct
for cleanup purpose.
Call trace:
[ 654.559417] task:modprobe state:D stack: 0 pid: 2321 ppid: 1091 flags:0x00004002
[ 654.568030] Call Trace:
[ 654.571221] <TASK>
[ 654.573790] __schedule+0x2d6/0x960
[ 654.577733] schedule+0x69/0xf0
[ 654.581214] schedule_timeout+0x87/0x140
[ 654.585463] ? __bpf_trace_tick_stop+0x20/0x20
[ 654.590291] msleep+0x2d/0x40
[ 654.593625] napi_disable+0x2b/0x80
[ 654.597437] netvsc_device_remove+0x8a/0x1f0 [hv_netvsc]
[ 654.603935] rndis_filter_device_remove+0x194/0x1c0 [hv_netvsc]
[ 654.611101] ? do_wait_intr+0xb0/0xb0
[ 654.615753] netvsc_remove+0x7c/0x120 [hv_netvsc]
[ 654.621675] vmbus_remove+0x27/0x40 [hv_vmbus]
Show More
|
|
In the Linux kernel, the following vulnerability has been resolved:
parisc: BTLB: Fix crash when setting up BTLB at CPU bringup
When using hotplug and bringing up a 32-bit CPU, ask the firmware about the
BTLB information to set up the static (block) TLB entries.
For that write access to the static btlb_info struct is needed, but
since it is marked __ro_after_init the kernel segfaults with missing
write permissions.
Fix the crash by dropping the __ro_after_init annotation.
|
In the Linux kernel, the following vulnerability has been resolved:
parisc: Fix random data corruption from exception handler
The current exception handler implementation, which assists when accessing
user space memory, may exhibit random data corruption if the compiler decides
to use a different register than the specified register %r29 (defined in
ASM_EXCEPTIONTABLE_REG) for the error code. If the compiler choose another
register, the fault handler will nevertheless store -EFAULT into %r29 a ...
In the Linux kernel, the following vulnerability has been resolved:
parisc: Fix random data corruption from exception handler
The current exception handler implementation, which assists when accessing
user space memory, may exhibit random data corruption if the compiler decides
to use a different register than the specified register %r29 (defined in
ASM_EXCEPTIONTABLE_REG) for the error code. If the compiler choose another
register, the fault handler will nevertheless store -EFAULT into %r29 and thus
trash whatever this register is used for.
Looking at the assembly I found that this happens sometimes in emulate_ldd().
To solve the issue, the easiest solution would be if it somehow is
possible to tell the fault handler which register is used to hold the error
code. Using %0 or %1 in the inline assembly is not posssible as it will show
up as e.g. %r29 (with the "%r" prefix), which the GNU assembler can not
convert to an integer.
This patch takes another, better and more flexible approach:
We extend the __ex_table (which is out of the execution path) by one 32-word.
In this word we tell the compiler to insert the assembler instruction
"or %r0,%r0,%reg", where %reg references the register which the compiler
choosed for the error return code.
In case of an access failure, the fault handler finds the __ex_table entry and
can examine the opcode. The used register is encoded in the lowest 5 bits, and
the fault handler can then store -EFAULT into this register.
Since we extend the __ex_table to 3 words we can't use the BUILDTIME_TABLE_SORT
config option any longer.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
net: hsr: remove WARN_ONCE() in send_hsr_supervision_frame()
Syzkaller reported [1] hitting a warning after failing to allocate
resources for skb in hsr_init_skb(). Since a WARN_ONCE() call will
not help much in this case, it might be prudent to switch to
netdev_warn_once(). At the very least it will suppress syzkaller
reports such as [1].
Just in case, use netdev_warn_once() in send_prp_supervision_frame()
for similar reason ...
In the Linux kernel, the following vulnerability has been resolved:
net: hsr: remove WARN_ONCE() in send_hsr_supervision_frame()
Syzkaller reported [1] hitting a warning after failing to allocate
resources for skb in hsr_init_skb(). Since a WARN_ONCE() call will
not help much in this case, it might be prudent to switch to
netdev_warn_once(). At the very least it will suppress syzkaller
reports such as [1].
Just in case, use netdev_warn_once() in send_prp_supervision_frame()
for similar reasons.
[1]
HSR: Could not send supervision frame
WARNING: CPU: 1 PID: 85 at net/hsr/hsr_device.c:294 send_hsr_supervision_frame+0x60a/0x810 net/hsr/hsr_device.c:294
RIP: 0010:send_hsr_supervision_frame+0x60a/0x810 net/hsr/hsr_device.c:294
...
Call Trace:
<IRQ>
hsr_announce+0x114/0x370 net/hsr/hsr_device.c:382
call_timer_fn+0x193/0x590 kernel/time/timer.c:1700
expire_timers kernel/time/timer.c:1751 [inline]
__run_timers+0x764/0xb20 kernel/time/timer.c:2022
run_timer_softirq+0x58/0xd0 kernel/time/timer.c:2035
__do_softirq+0x21a/0x8de kernel/softirq.c:553
invoke_softirq kernel/softirq.c:427 [inline]
__irq_exit_rcu kernel/softirq.c:632 [inline]
irq_exit_rcu+0xb7/0x120 kernel/softirq.c:644
sysvec_apic_timer_interrupt+0x95/0xb0 arch/x86/kernel/apic/apic.c:1076
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:649
...
This issue is also found in older kernels (at least up to 5.10).
Show More
|
|
In the Linux kernel, the following vulnerability has been resolved:
interconnect: qcom: sc8180x: Mark CO0 BCM keepalive
The CO0 BCM needs to be up at all times, otherwise some hardware (like
the UFS controller) loses its connection to the rest of the SoC,
resulting in a hang of the platform, accompanied by a spectacular
logspam.
Mark it as keepalive to prevent such cases.
|
|
In the Linux kernel, the following vulnerability has been resolved:
iommu/vt-d: Fix sysfs leak in alloc_iommu()
iommu_device_sysfs_add() is called before, so is has to be cleaned on subsequent
errors.
|
In the Linux kernel, the following vulnerability has been resolved:
scsi: target: core: Avoid smp_processor_id() in preemptible code
The BUG message "BUG: using smp_processor_id() in preemptible [00000000]
code" was observed for TCMU devices with kernel config DEBUG_PREEMPT.
The message was observed when blktests block/005 was run on TCMU devices
with fileio backend or user:zbc backend [1]. The commit 1130b499b4a7
("scsi: target: tcm_loop: Use LIO wq cmd submission helper") triggered the
symp ...
In the Linux kernel, the following vulnerability has been resolved:
scsi: target: core: Avoid smp_processor_id() in preemptible code
The BUG message "BUG: using smp_processor_id() in preemptible [00000000]
code" was observed for TCMU devices with kernel config DEBUG_PREEMPT.
The message was observed when blktests block/005 was run on TCMU devices
with fileio backend or user:zbc backend [1]. The commit 1130b499b4a7
("scsi: target: tcm_loop: Use LIO wq cmd submission helper") triggered the
symptom. The commit modified work queue to handle commands and changed
'current->nr_cpu_allowed' at smp_processor_id() call.
The message was also observed at system shutdown when TCMU devices were not
cleaned up [2]. The function smp_processor_id() was called in SCSI host
work queue for abort handling, and triggered the BUG message. This symptom
was observed regardless of the commit 1130b499b4a7 ("scsi: target:
tcm_loop: Use LIO wq cmd submission helper").
To avoid the preemptible code check at smp_processor_id(), get CPU ID with
raw_smp_processor_id() instead. The CPU ID is used for performance
improvement then thread move to other CPU will not affect the code.
[1]
[ 56.468103] run blktests block/005 at 2021-05-12 14:16:38
[ 57.369473] check_preemption_disabled: 85 callbacks suppressed
[ 57.369480] BUG: using smp_processor_id() in preemptible [00000000] code: fio/1511
[ 57.369506] BUG: using smp_processor_id() in preemptible [00000000] code: fio/1510
[ 57.369512] BUG: using smp_processor_id() in preemptible [00000000] code: fio/1506
[ 57.369552] caller is __target_init_cmd+0x157/0x170 [target_core_mod]
[ 57.369606] CPU: 4 PID: 1506 Comm: fio Not tainted 5.13.0-rc1+ #34
[ 57.369613] Hardware name: System manufacturer System Product Name/PRIME Z270-A, BIOS 1302 03/15/2018
[ 57.369617] Call Trace:
[ 57.369621] BUG: using smp_processor_id() in preemptible [00000000] code: fio/1507
[ 57.369628] dump_stack+0x6d/0x89
[ 57.369642] check_preemption_disabled+0xc8/0xd0
[ 57.369628] caller is __target_init_cmd+0x157/0x170 [target_core_mod]
[ 57.369655] __target_init_cmd+0x157/0x170 [target_core_mod]
[ 57.369695] target_init_cmd+0x76/0x90 [target_core_mod]
[ 57.369732] tcm_loop_queuecommand+0x109/0x210 [tcm_loop]
[ 57.369744] scsi_queue_rq+0x38e/0xc40
[ 57.369761] __blk_mq_try_issue_directly+0x109/0x1c0
[ 57.369779] blk_mq_try_issue_directly+0x43/0x90
[ 57.369790] blk_mq_submit_bio+0x4e5/0x5d0
[ 57.369812] submit_bio_noacct+0x46e/0x4e0
[ 57.369830] __blkdev_direct_IO_simple+0x1a3/0x2d0
[ 57.369859] ? set_init_blocksize.isra.0+0x60/0x60
[ 57.369880] generic_file_read_iter+0x89/0x160
[ 57.369898] blkdev_read_iter+0x44/0x60
[ 57.369906] new_sync_read+0x102/0x170
[ 57.369929] vfs_read+0xd4/0x160
[ 57.369941] __x64_sys_pread64+0x6e/0xa0
[ 57.369946] ? lockdep_hardirqs_on+0x79/0x100
[ 57.369958] do_syscall_64+0x3a/0x70
[ 57.369965] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 57.369973] RIP: 0033:0x7f7ed4c1399f
[ 57.369979] Code: 08 89 3c 24 48 89 4c 24 18 e8 7d f3 ff ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 cd f3 ff ff 48 8b
[ 57.369983] RSP: 002b:00007ffd7918c580 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[ 57.369990] RAX: ffffffffffffffda RBX: 00000000015b4540 RCX: 00007f7ed4c1399f
[ 57.369993] RDX: 0000000000001000 RSI: 00000000015de000 RDI: 0000000000000009
[ 57.369996] RBP: 00000000015b4540 R08: 0000000000000000 R09: 0000000000000001
[ 57.369999] R10: 0000000000e5c000 R11: 0000000000000293 R12: 00007f7eb5269a70
[ 57.370002] R13: 0000000000000000 R14: 0000000000001000 R15: 00000000015b4568
[ 57.370031] CPU: 7 PID: 1507 Comm: fio Not tainted 5.13.0-rc1+ #34
[ 57.370036] Hardware name: System manufacturer System Product Name/PRIME Z270-A, BIOS 1302 03/15/2018
[ 57.370039] Call Trace:
[ 57.370045] dump_stack+0x6d/0x89
[ 57.370056] ch
---truncated---
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
tracing: Ensure visibility when inserting an element into tracing_map
Running the following two commands in parallel on a multi-processor
AArch64 machine can sporadically produce an unexpected warning about
duplicate histogram entries:
$ while true; do
echo hist:key=id.syscall:val=hitcount > \
/sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
cat /sys/kernel/debug/tracing/events/raw_syscalls/sy ...
In the Linux kernel, the following vulnerability has been resolved:
tracing: Ensure visibility when inserting an element into tracing_map
Running the following two commands in parallel on a multi-processor
AArch64 machine can sporadically produce an unexpected warning about
duplicate histogram entries:
$ while true; do
echo hist:key=id.syscall:val=hitcount > \
/sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
sleep 0.001
done
$ stress-ng --sysbadaddr $(nproc)
The warning looks as follows:
[ 2911.172474] ------------[ cut here ]------------
[ 2911.173111] Duplicates detected: 1
[ 2911.173574] WARNING: CPU: 2 PID: 12247 at kernel/trace/tracing_map.c:983 tracing_map_sort_entries+0x3e0/0x408
[ 2911.174702] Modules linked in: iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) af_packet(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ena(E) tiny_power_button(E) qemu_fw_cfg(E) button(E) fuse(E) efi_pstore(E) ip_tables(E) x_tables(E) xfs(E) libcrc32c(E) aes_ce_blk(E) aes_ce_cipher(E) crct10dif_ce(E) polyval_ce(E) polyval_generic(E) ghash_ce(E) gf128mul(E) sm4_ce_gcm(E) sm4_ce_ccm(E) sm4_ce(E) sm4_ce_cipher(E) sm4(E) sm3_ce(E) sm3(E) sha3_ce(E) sha512_ce(E) sha512_arm64(E) sha2_ce(E) sha256_arm64(E) nvme(E) sha1_ce(E) nvme_core(E) nvme_auth(E) t10_pi(E) sg(E) scsi_mod(E) scsi_common(E) efivarfs(E)
[ 2911.174738] Unloaded tainted modules: cppc_cpufreq(E):1
[ 2911.180985] CPU: 2 PID: 12247 Comm: cat Kdump: loaded Tainted: G E 6.7.0-default #2 1b58bbb22c97e4399dc09f92d309344f69c44a01
[ 2911.182398] Hardware name: Amazon EC2 c7g.8xlarge/, BIOS 1.0 11/1/2018
[ 2911.183208] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 2911.184038] pc : tracing_map_sort_entries+0x3e0/0x408
[ 2911.184667] lr : tracing_map_sort_entries+0x3e0/0x408
[ 2911.185310] sp : ffff8000a1513900
[ 2911.185750] x29: ffff8000a1513900 x28: ffff0003f272fe80 x27: 0000000000000001
[ 2911.186600] x26: ffff0003f272fe80 x25: 0000000000000030 x24: 0000000000000008
[ 2911.187458] x23: ffff0003c5788000 x22: ffff0003c16710c8 x21: ffff80008017f180
[ 2911.188310] x20: ffff80008017f000 x19: ffff80008017f180 x18: ffffffffffffffff
[ 2911.189160] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000a15134b8
[ 2911.190015] x14: 0000000000000000 x13: 205d373432323154 x12: 5b5d313131333731
[ 2911.190844] x11: 00000000fffeffff x10: 00000000fffeffff x9 : ffffd1b78274a13c
[ 2911.191716] x8 : 000000000017ffe8 x7 : c0000000fffeffff x6 : 000000000057ffa8
[ 2911.192554] x5 : ffff0012f6c24ec0 x4 : 0000000000000000 x3 : ffff2e5b72b5d000
[ 2911.193404] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0003ff254480
[ 2911.194259] Call trace:
[ 2911.194626] tracing_map_sort_entries+0x3e0/0x408
[ 2911.195220] hist_show+0x124/0x800
[ 2911.195692] seq_read_iter+0x1d4/0x4e8
[ 2911.196193] seq_read+0xe8/0x138
[ 2911.196638] vfs_read+0xc8/0x300
[ 2911.197078] ksys_read+0x70/0x108
[ 2911.197534] __arm64_sys_read+0x24/0x38
[ 2911.198046] invoke_syscall+0x78/0x108
[ 2911.198553] el0_svc_common.constprop.0+0xd0/0xf8
[ 2911.199157] do_el0_svc+0x28/0x40
[ 2911.199613] el0_svc+0x40/0x178
[ 2911.200048] el0t_64_sync_handler+0x13c/0x158
[ 2911.200621] el0t_64_sync+0x1a8/0x1b0
[ 2911.201115] ---[ end trace 0000000000000000 ]---
The problem appears to be caused by CPU reordering of writes issued from
__tracing_map_insert().
The check for the presence of an element with a given key in this
function is:
val = READ_ONCE(entry->val);
if (val && keys_match(key, val->key, map->key_size)) ...
The write of a new entry is:
elt = get_free_elt(map);
memcpy(elt->key, key, map->key_size);
entry->val = elt;
The "memcpy(elt->key, key, map->key_size);" and "entry->val = elt;"
stores may become visible in the reversed order on another CPU. This
second CPU might then incorrectly determine that a new key doesn't match
an already present val->key and subse
---truncated---
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
ext4: avoid online resizing failures due to oversized flex bg
When we online resize an ext4 filesystem with a oversized flexbg_size,
mkfs.ext4 -F -G 67108864 $dev -b 4096 100M
mount $dev $dir
resize2fs $dev 16G
the following WARN_ON is triggered:
==================================================================
WARNING: CPU: 0 PID: 427 at mm/page_alloc.c:4402 __alloc_pages+0x411/0x550
Modules linked in: sg(E) ...
In the Linux kernel, the following vulnerability has been resolved:
ext4: avoid online resizing failures due to oversized flex bg
When we online resize an ext4 filesystem with a oversized flexbg_size,
mkfs.ext4 -F -G 67108864 $dev -b 4096 100M
mount $dev $dir
resize2fs $dev 16G
the following WARN_ON is triggered:
==================================================================
WARNING: CPU: 0 PID: 427 at mm/page_alloc.c:4402 __alloc_pages+0x411/0x550
Modules linked in: sg(E)
CPU: 0 PID: 427 Comm: resize2fs Tainted: G E 6.6.0-rc5+ #314
RIP: 0010:__alloc_pages+0x411/0x550
Call Trace:
<TASK>
__kmalloc_large_node+0xa2/0x200
__kmalloc+0x16e/0x290
ext4_resize_fs+0x481/0xd80
__ext4_ioctl+0x1616/0x1d90
ext4_ioctl+0x12/0x20
__x64_sys_ioctl+0xf0/0x150
do_syscall_64+0x3b/0x90
==================================================================
This is because flexbg_size is too large and the size of the new_group_data
array to be allocated exceeds MAX_ORDER. Currently, the minimum value of
MAX_ORDER is 8, the minimum value of PAGE_SIZE is 4096, the corresponding
maximum number of groups that can be allocated is:
(PAGE_SIZE << MAX_ORDER) / sizeof(struct ext4_new_group_data) ≈ 21845
And the value that is down-aligned to the power of 2 is 16384. Therefore,
this value is defined as MAX_RESIZE_BG, and the number of groups added
each time does not exceed this value during resizing, and is added multiple
times to complete the online resizing. The difference is that the metadata
in a flex_bg may be more dispersed.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: Wake DMCUB before executing GPINT commands
[Why]
DMCUB can be in idle when we attempt to interface with the HW through
the GPINT mailbox resulting in a system hang.
[How]
Add dc_wake_and_execute_gpint() to wrap the wake, execute, sleep
sequence.
If the GPINT executes successfully then DMCUB will be put back into
sleep after the optional response is returned.
It functions similar to the inbox command interfa ...
In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: Wake DMCUB before executing GPINT commands
[Why]
DMCUB can be in idle when we attempt to interface with the HW through
the GPINT mailbox resulting in a system hang.
[How]
Add dc_wake_and_execute_gpint() to wrap the wake, execute, sleep
sequence.
If the GPINT executes successfully then DMCUB will be put back into
sleep after the optional response is returned.
It functions similar to the inbox command interface.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: Refactor DMCUB enter/exit idle interface
[Why]
We can hang in place trying to send commands when the DMCUB isn't
powered on.
[How]
We need to exit out of the idle state prior to sending a command,
but the process that performs the exit also invokes a command itself.
Fixing this issue involves the following:
1. Using a software state to track whether or not we need to start
the process to exit idle or not ...
In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: Refactor DMCUB enter/exit idle interface
[Why]
We can hang in place trying to send commands when the DMCUB isn't
powered on.
[How]
We need to exit out of the idle state prior to sending a command,
but the process that performs the exit also invokes a command itself.
Fixing this issue involves the following:
1. Using a software state to track whether or not we need to start
the process to exit idle or notify idle.
It's possible for the hardware to have exited an idle state without
driver knowledge, but entering one is always restricted to a driver
allow - which makes the SW state vs HW state mismatch issue purely one
of optimization, which should seldomly be hit, if at all.
2. Refactor any instances of exit/notify idle to use a single wrapper
that maintains this SW state.
This works simialr to dc_allow_idle_optimizations, but works at the
DMCUB level and makes sure the state is marked prior to any notify/exit
idle so we don't enter an infinite loop.
3. Make sure we exit out of idle prior to sending any commands or
waiting for DMCUB idle.
This patch takes care of 1/2. A future patch will take care of wrapping
DMCUB command submission with calls to this new interface.
Show More
|
|
In the Linux kernel, the following vulnerability has been resolved:
net/mlx5e: Fix operation precedence bug in port timestamping napi_poll context
Indirection (*) is of lower precedence than postfix increment (++). Logic
in napi_poll context would cause an out-of-bound read by first increment
the pointer address by byte address space and then dereference the value.
Rather, the intended logic was to dereference first and then increment the
underlying value.
|
In the Linux kernel, the following vulnerability has been resolved:
thermal: intel: hfi: Add syscore callbacks for system-wide PM
The kernel allocates a memory buffer and provides its location to the
hardware, which uses it to update the HFI table. This allocation occurs
during boot and remains constant throughout runtime.
When resuming from hibernation, the restore kernel allocates a second
memory buffer and reprograms the HFI hardware with the new location as
part of a normal boot. The loca ...
In the Linux kernel, the following vulnerability has been resolved:
thermal: intel: hfi: Add syscore callbacks for system-wide PM
The kernel allocates a memory buffer and provides its location to the
hardware, which uses it to update the HFI table. This allocation occurs
during boot and remains constant throughout runtime.
When resuming from hibernation, the restore kernel allocates a second
memory buffer and reprograms the HFI hardware with the new location as
part of a normal boot. The location of the second memory buffer may
differ from the one allocated by the image kernel.
When the restore kernel transfers control to the image kernel, its HFI
buffer becomes invalid, potentially leading to memory corruption if the
hardware writes to it (the hardware continues to use the buffer from the
restore kernel).
It is also possible that the hardware "forgets" the address of the memory
buffer when resuming from "deep" suspend. Memory corruption may also occur
in such a scenario.
To prevent the described memory corruption, disable HFI when preparing to
suspend or hibernate. Enable it when resuming.
Add syscore callbacks to handle the package of the boot CPU (packages of
non-boot CPUs are handled via CPU offline). Syscore ops always run on the
boot CPU. Additionally, HFI only needs to be disabled during "deep" suspend
and hibernation. Syscore ops only run in these cases.
[ rjw: Comment adjustment, subject and changelog edits ]
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
drm/amdkfd: Fix lock dependency warning with srcu
======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-yangp #2289 Not tainted
------------------------------------------------------
kworker/0:2/996 is trying to acquire lock:
(srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x1a0
but task is already holding lock:
((work_completion)(&svms->deferred ...
In the Linux kernel, the following vulnerability has been resolved:
drm/amdkfd: Fix lock dependency warning with srcu
======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-yangp #2289 Not tainted
------------------------------------------------------
kworker/0:2/996 is trying to acquire lock:
(srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x5/0x1a0
but task is already holding lock:
((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}, at:
process_one_work+0x211/0x560
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 ((work_completion)(&svms->deferred_list_work)){+.+.}-{0:0}:
__flush_work+0x88/0x4f0
svm_range_list_lock_and_flush_work+0x3d/0x110 [amdgpu]
svm_range_set_attr+0xd6/0x14c0 [amdgpu]
kfd_ioctl+0x1d1/0x630 [amdgpu]
__x64_sys_ioctl+0x88/0xc0
-> #2 (&info->lock#2){+.+.}-{3:3}:
__mutex_lock+0x99/0xc70
amdgpu_amdkfd_gpuvm_restore_process_bos+0x54/0x740 [amdgpu]
restore_process_helper+0x22/0x80 [amdgpu]
restore_process_worker+0x2d/0xa0 [amdgpu]
process_one_work+0x29b/0x560
worker_thread+0x3d/0x3d0
-> #1 ((work_completion)(&(&process->restore_work)->work)){+.+.}-{0:0}:
__flush_work+0x88/0x4f0
__cancel_work_timer+0x12c/0x1c0
kfd_process_notifier_release_internal+0x37/0x1f0 [amdgpu]
__mmu_notifier_release+0xad/0x240
exit_mmap+0x6a/0x3a0
mmput+0x6a/0x120
do_exit+0x322/0xb90
do_group_exit+0x37/0xa0
__x64_sys_exit_group+0x18/0x20
do_syscall_64+0x38/0x80
-> #0 (srcu){.+.+}-{0:0}:
__lock_acquire+0x1521/0x2510
lock_sync+0x5f/0x90
__synchronize_srcu+0x4f/0x1a0
__mmu_notifier_release+0x128/0x240
exit_mmap+0x6a/0x3a0
mmput+0x6a/0x120
svm_range_deferred_list_work+0x19f/0x350 [amdgpu]
process_one_work+0x29b/0x560
worker_thread+0x3d/0x3d0
other info that might help us debug this:
Chain exists of:
srcu --> &info->lock#2 --> (work_completion)(&svms->deferred_list_work)
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock((work_completion)(&svms->deferred_list_work));
lock(&info->lock#2);
lock((work_completion)(&svms->deferred_list_work));
sync(srcu);
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
um: time-travel: fix time corruption
In 'basic' time-travel mode (without =inf-cpu or =ext), we
still get timer interrupts. These can happen at arbitrary
points in time, i.e. while in timer_read(), which pushes
time forward just a little bit. Then, if we happen to get
the interrupt after calculating the new time to push to,
but before actually finishing that, the interrupt will set
the time to a value that's incompatible with ...
In the Linux kernel, the following vulnerability has been resolved:
um: time-travel: fix time corruption
In 'basic' time-travel mode (without =inf-cpu or =ext), we
still get timer interrupts. These can happen at arbitrary
points in time, i.e. while in timer_read(), which pushes
time forward just a little bit. Then, if we happen to get
the interrupt after calculating the new time to push to,
but before actually finishing that, the interrupt will set
the time to a value that's incompatible with the forward,
and we'll crash because time goes backwards when we do the
forwarding.
Fix this by reading the time_travel_time, calculating the
adjustment, and doing the adjustment all with interrupts
disabled.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: Fix disable_otg_wa logic
[Why]
When switching to another HDMI mode, we are unnecesarilly
disabling/enabling FIFO causing both HPO and DIG registers to be set at
the same time when only HPO is supposed to be set.
This can lead to a system hang the next time we change refresh rates as
there are cases when we don't disable OTG/FIFO but FIFO is enabled when
it isn't supposed to be.
[How]
Removing the enable/disa ...
In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: Fix disable_otg_wa logic
[Why]
When switching to another HDMI mode, we are unnecesarilly
disabling/enabling FIFO causing both HPO and DIG registers to be set at
the same time when only HPO is supposed to be set.
This can lead to a system hang the next time we change refresh rates as
there are cases when we don't disable OTG/FIFO but FIFO is enabled when
it isn't supposed to be.
[How]
Removing the enable/disable FIFO entirely.
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
PM / devfreq: Synchronize devfreq_monitor_[start/stop]
There is a chance if a frequent switch of the governor
done in a loop result in timer list corruption where
timer cancel being done from two place one from
cancel_delayed_work_sync() and followed by expire_timers()
can be seen from the traces[1].
while true
do
echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor
echo "performance" > /sys/clas ...
In the Linux kernel, the following vulnerability has been resolved:
PM / devfreq: Synchronize devfreq_monitor_[start/stop]
There is a chance if a frequent switch of the governor
done in a loop result in timer list corruption where
timer cancel being done from two place one from
cancel_delayed_work_sync() and followed by expire_timers()
can be seen from the traces[1].
while true
do
echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor
echo "performance" > /sys/class/devfreq/1d84000.ufshc/governor
done
It looks to be issue with devfreq driver where
device_monitor_[start/stop] need to synchronized so that
delayed work should get corrupted while it is either
being queued or running or being cancelled.
Let's use polling flag and devfreq lock to synchronize the
queueing the timer instance twice and work data being
corrupted.
[1]
...
..
<idle>-0 [003] 9436.209662: timer_cancel timer=0xffffff80444f0428
<idle>-0 [003] 9436.209664: timer_expire_entry timer=0xffffff80444f0428 now=0x10022da1c function=__typeid__ZTSFvP10timer_listE_global_addr baseclk=0x10022da1c
<idle>-0 [003] 9436.209718: timer_expire_exit timer=0xffffff80444f0428
kworker/u16:6-14217 [003] 9436.209863: timer_start timer=0xffffff80444f0428 function=__typeid__ZTSFvP10timer_listE_global_addr expires=0x10022da2b now=0x10022da1c flags=182452227
vendor.xxxyyy.ha-1593 [004] 9436.209888: timer_cancel timer=0xffffff80444f0428
vendor.xxxyyy.ha-1593 [004] 9436.216390: timer_init timer=0xffffff80444f0428
vendor.xxxyyy.ha-1593 [004] 9436.216392: timer_start timer=0xffffff80444f0428 function=__typeid__ZTSFvP10timer_listE_global_addr expires=0x10022da2c now=0x10022da1d flags=186646532
vendor.xxxyyy.ha-1593 [005] 9436.220992: timer_cancel timer=0xffffff80444f0428
xxxyyyTraceManag-7795 [004] 9436.261641: timer_cancel timer=0xffffff80444f0428
[2]
9436.261653][ C4] Unable to handle kernel paging request at virtual address dead00000000012a
[ 9436.261664][ C4] Mem abort info:
[ 9436.261666][ C4] ESR = 0x96000044
[ 9436.261669][ C4] EC = 0x25: DABT (current EL), IL = 32 bits
[ 9436.261671][ C4] SET = 0, FnV = 0
[ 9436.261673][ C4] EA = 0, S1PTW = 0
[ 9436.261675][ C4] Data abort info:
[ 9436.261677][ C4] ISV = 0, ISS = 0x00000044
[ 9436.261680][ C4] CM = 0, WnR = 1
[ 9436.261682][ C4] [dead00000000012a] address between user and kernel address ranges
[ 9436.261685][ C4] Internal error: Oops: 96000044 [#1] PREEMPT SMP
[ 9436.261701][ C4] Skip md ftrace buffer dump for: 0x3a982d0
...
[ 9436.262138][ C4] CPU: 4 PID: 7795 Comm: TraceManag Tainted: G S W O 5.10.149-android12-9-o-g17f915d29d0c #1
[ 9436.262141][ C4] Hardware name: Qualcomm Technologies, Inc. (DT)
[ 9436.262144][ C4] pstate: 22400085 (nzCv daIf +PAN -UAO +TCO BTYPE=--)
[ 9436.262161][ C4] pc : expire_timers+0x9c/0x438
[ 9436.262164][ C4] lr : expire_timers+0x2a4/0x438
[ 9436.262168][ C4] sp : ffffffc010023dd0
[ 9436.262171][ C4] x29: ffffffc010023df0 x28: ffffffd0636fdc18
[ 9436.262178][ C4] x27: ffffffd063569dd0 x26: ffffffd063536008
[ 9436.262182][ C4] x25: 0000000000000001 x24: ffffff88f7c69280
[ 9436.262185][ C4] x23: 00000000000000e0 x22: dead000000000122
[ 9436.262188][ C4] x21: 000000010022da29 x20: ffffff8af72b4e80
[ 9436.262191][ C4] x19: ffffffc010023e50 x18: ffffffc010025038
[ 9436.262195][ C4] x17: 0000000000000240 x16: 0000000000000201
[ 9436.262199][ C4] x15: ffffffffffffffff x14: ffffff889f3c3100
[ 9436.262203][ C4] x13: ffffff889f3c3100 x12: 00000000049f56b8
[ 9436.262207][ C4] x11: 00000000049f56b8 x10: 00000000ffffffff
[ 9436.262212][ C4] x9 : ffffffc010023e50 x8 : dead000000000122
[ 9436.262216][ C4] x7 : ffffffffffffffff x6 : ffffffc0100239d8
[ 9436.262220][ C4] x5 : 0000000000000000 x4 : 0000000000000101
[ 9436.262223][ C4] x3 : 0000000000000080 x2 : ffffff8
---truncated---
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
libceph: just wait for more data to be available on the socket
A short read may occur while reading the message footer from the
socket. Later, when the socket is ready for another read, the
messenger invokes all read_partial_*() handlers, including
read_partial_sparse_msg_data(). The expectation is that
read_partial_sparse_msg_data() would bail, allowing the messenger to
invoke read_partial() for the footer and pick up where ...
In the Linux kernel, the following vulnerability has been resolved:
libceph: just wait for more data to be available on the socket
A short read may occur while reading the message footer from the
socket. Later, when the socket is ready for another read, the
messenger invokes all read_partial_*() handlers, including
read_partial_sparse_msg_data(). The expectation is that
read_partial_sparse_msg_data() would bail, allowing the messenger to
invoke read_partial() for the footer and pick up where it left off.
However read_partial_sparse_msg_data() violates that and ends up
calling into the state machine in the OSD client. The sparse-read
state machine assumes that it's a new op and interprets some piece of
the footer as the sparse-read header and returns bogus extents/data
length, etc.
To determine whether read_partial_sparse_msg_data() should bail, let's
reuse cursor->total_resid. Because once it reaches to zero that means
all the extents and data have been successfully received in last read,
else it could break out when partially reading any of the extents and
data. And then osd_sparse_read() could continue where it left off.
[ idryomov: changelog ]
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
xhci: handle isoc Babble and Buffer Overrun events properly
xHCI 4.9 explicitly forbids assuming that the xHC has released its
ownership of a multi-TRB TD when it reports an error on one of the
early TRBs. Yet the driver makes such assumption and releases the TD,
allowing the remaining TRBs to be freed or overwritten by new TDs.
The xHC should also report completion of the final TRB due to its IOC
flag being set by us, regard ...
In the Linux kernel, the following vulnerability has been resolved:
xhci: handle isoc Babble and Buffer Overrun events properly
xHCI 4.9 explicitly forbids assuming that the xHC has released its
ownership of a multi-TRB TD when it reports an error on one of the
early TRBs. Yet the driver makes such assumption and releases the TD,
allowing the remaining TRBs to be freed or overwritten by new TDs.
The xHC should also report completion of the final TRB due to its IOC
flag being set by us, regardless of prior errors. This event cannot
be recognized if the TD has already been freed earlier, resulting in
"Transfer event TRB DMA ptr not part of current TD" error message.
Fix this by reusing the logic for processing isoc Transaction Errors.
This also handles hosts which fail to report the final completion.
Fix transfer length reporting on Babble errors. They may be caused by
device malfunction, no guarantee that the buffer has been filled.
Show More
|
|
In the Linux kernel, the following vulnerability has been resolved:
hwmon: (coretemp) Fix out-of-bounds memory access
Fix a bug that pdata->cpu_map[] is set before out-of-bounds check.
The problem might be triggered on systems with more than 128 cores per
package.
|
In the Linux kernel, the following vulnerability has been resolved:
drm/msm/dpu: check for valid hw_pp in dpu_encoder_helper_phys_cleanup
The commit 8b45a26f2ba9 ("drm/msm/dpu: reserve cdm blocks for writeback
in case of YUV output") introduced a smatch warning about another
conditional block in dpu_encoder_helper_phys_cleanup() which had assumed
hw_pp will always be valid which may not necessarily be true.
Lets fix the other conditional block by making sure hw_pp is valid
before dereferencin ...
In the Linux kernel, the following vulnerability has been resolved:
drm/msm/dpu: check for valid hw_pp in dpu_encoder_helper_phys_cleanup
The commit 8b45a26f2ba9 ("drm/msm/dpu: reserve cdm blocks for writeback
in case of YUV output") introduced a smatch warning about another
conditional block in dpu_encoder_helper_phys_cleanup() which had assumed
hw_pp will always be valid which may not necessarily be true.
Lets fix the other conditional block by making sure hw_pp is valid
before dereferencing it.
Patchwork: https://patchwork.freedesktop.org/patch/574878/
Show More
|
|
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nft_limit: reject configurations that cause integer overflow
Reject bogus configs where internal token counter wraps around.
This only occurs with very very large requests, such as 17gbyte/s.
Its better to reject this rather than having incorrect ratelimit.
|
In the Linux kernel, the following vulnerability has been resolved:
net/sched: flower: Fix chain template offload
When a qdisc is deleted from a net device the stack instructs the
underlying driver to remove its flow offload callback from the
associated filter block using the 'FLOW_BLOCK_UNBIND' command. The stack
then continues to replay the removal of the filters in the block for
this driver by iterating over the chains in the block and invoking the
'reoffload' operation of the classifier be ...
In the Linux kernel, the following vulnerability has been resolved:
net/sched: flower: Fix chain template offload
When a qdisc is deleted from a net device the stack instructs the
underlying driver to remove its flow offload callback from the
associated filter block using the 'FLOW_BLOCK_UNBIND' command. The stack
then continues to replay the removal of the filters in the block for
this driver by iterating over the chains in the block and invoking the
'reoffload' operation of the classifier being used. In turn, the
classifier in its 'reoffload' operation prepares and emits a
'FLOW_CLS_DESTROY' command for each filter.
However, the stack does not do the same for chain templates and the
underlying driver never receives a 'FLOW_CLS_TMPLT_DESTROY' command when
a qdisc is deleted. This results in a memory leak [1] which can be
reproduced using [2].
Fix by introducing a 'tmplt_reoffload' operation and have the stack
invoke it with the appropriate arguments as part of the replay.
Implement the operation in the sole classifier that supports chain
templates (flower) by emitting the 'FLOW_CLS_TMPLT_{CREATE,DESTROY}'
command based on whether a flow offload callback is being bound to a
filter block or being unbound from one.
As far as I can tell, the issue happens since cited commit which
reordered tcf_block_offload_unbind() before tcf_block_flush_all_chains()
in __tcf_block_put(). The order cannot be reversed as the filter block
is expected to be freed after flushing all the chains.
[1]
unreferenced object 0xffff888107e28800 (size 2048):
comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s)
hex dump (first 32 bytes):
b1 a6 7c 11 81 88 ff ff e0 5b b3 10 81 88 ff ff ..|......[......
01 00 00 00 00 00 00 00 e0 aa b0 84 ff ff ff ff ................
backtrace:
[<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320
[<ffffffff81ab374e>] __kmalloc+0x4e/0x90
[<ffffffff832aec6d>] mlxsw_sp_acl_ruleset_get+0x34d/0x7a0
[<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180
[<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280
[<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340
[<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0
[<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170
[<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0
[<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440
[<ffffffff83ac6270>] netlink_unicast+0x540/0x820
[<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0
[<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80
[<ffffffff8379d29a>] ___sys_sendmsg+0x13a/0x1e0
[<ffffffff8379d50c>] __sys_sendmsg+0x11c/0x1f0
[<ffffffff843b9ce0>] do_syscall_64+0x40/0xe0
unreferenced object 0xffff88816d2c0400 (size 1024):
comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s)
hex dump (first 32 bytes):
40 00 00 00 00 00 00 00 57 f6 38 be 00 00 00 00 @.......W.8.....
10 04 2c 6d 81 88 ff ff 10 04 2c 6d 81 88 ff ff ..,m......,m....
backtrace:
[<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320
[<ffffffff81ab36c1>] __kmalloc_node+0x51/0x90
[<ffffffff81a8ed96>] kvmalloc_node+0xa6/0x1f0
[<ffffffff82827d03>] bucket_table_alloc.isra.0+0x83/0x460
[<ffffffff82828d2b>] rhashtable_init+0x43b/0x7c0
[<ffffffff832aed48>] mlxsw_sp_acl_ruleset_get+0x428/0x7a0
[<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180
[<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280
[<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340
[<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0
[<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170
[<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0
[<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440
[<ffffffff83ac6270>] netlink_unicast+0x540/0x820
[<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0
[<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80
[2]
# tc qdisc add dev swp1 clsact
# tc chain add dev swp1 ingress proto ip chain 1 flower dst_ip 0.0.0.0/32
# tc qdisc del dev
---truncated---
Show More
|
In the Linux kernel, the following vulnerability has been resolved:
blk-mq: fix IO hang from sbitmap wakeup race
In blk_mq_mark_tag_wait(), __add_wait_queue() may be re-ordered
with the following blk_mq_get_driver_tag() in case of getting driver
tag failure.
Then in __sbitmap_queue_wake_up(), waitqueue_active() may not observe
the added waiter in blk_mq_mark_tag_wait() and wake up nothing, meantime
blk_mq_mark_tag_wait() can't get driver tag successfully.
This issue can be reproduced by runn ...
In the Linux kernel, the following vulnerability has been resolved:
blk-mq: fix IO hang from sbitmap wakeup race
In blk_mq_mark_tag_wait(), __add_wait_queue() may be re-ordered
with the following blk_mq_get_driver_tag() in case of getting driver
tag failure.
Then in __sbitmap_queue_wake_up(), waitqueue_active() may not observe
the added waiter in blk_mq_mark_tag_wait() and wake up nothing, meantime
blk_mq_mark_tag_wait() can't get driver tag successfully.
This issue can be reproduced by running the following test in loop, and
fio hang can be observed in < 30min when running it on my test VM
in laptop.
modprobe -r scsi_debug
modprobe scsi_debug delay=0 dev_size_mb=4096 max_queue=1 host_max_queue=1 submit_queues=4
dev=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
fio --filename=/dev/"$dev" --direct=1 --rw=randrw --bs=4k --iodepth=1 \
--runtime=100 --numjobs=40 --time_based --name=test \
--ioengine=libaio
Fix the issue by adding one explicit barrier in blk_mq_mark_tag_wait(), which
is just fine in case of running out of tag.
Show More
|
|
In the Linux kernel, the following vulnerability has been resolved:
NFS: Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce()
The value of mirror->pg_bytes_written should only be updated after a
successful attempt to flush out the requests on the list.
|
|
In the Linux kernel, the following vulnerability has been resolved:
NFS: Fix an Oopsable condition in __nfs_pageio_add_request()
Ensure that nfs_pageio_error_cleanup() resets the mirror array contents,
so that the structure reflects the fact that it is now empty.
Also change the test in nfs_pageio_do_add_request() to be more robust by
checking whether or not the list is empty rather than relying on the
value of pg_count.
|
|
In the Linux kernel, the following vulnerability has been resolved:
NFS: fix an incorrect limit in filelayout_decode_layout()
The "sizeof(struct nfs_fh)" is two bytes too large and could lead to
memory corruption. It should be NFS_MAXFHSIZE because that's the size
of the ->data[] buffer.
I reversed the size of the arguments to put the variable on the left.
|