20 Jan, 2021
3 commits
-
[ Upstream commit de7f1d9e99d8b99e4e494ad8fcd91f0c4c5c9357 ]
io_uring fds are marked O_CLOEXEC and we explicitly cancel all requests
before going through exec, so we don't want to leave the task holding
file references to io_uring instances that are no longer ours.

Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin -
[ Upstream commit d434ab6db524ab1efd0afad4ffa1ee65ca6ac097 ]
__io_req_task_submit() run by task_work can set mm and files, but
io_sq_thread() in some cases, and because __io_sq_thread_acquire_mm()
and __io_sq_thread_acquire_files() do a simple current->mm/files check
it may end up submitting IO with the mm/files of another task.

We also need to drop them at the end to release any potentially grabbed
references.

Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin -
[ Upstream commit 621fadc22365f3cf307bcd9048e3372e9ee9cdcc ]
In rare cases a task may be exiting while io_ring_exit_work() is trying
to cancel/wait on its requests. It's ok for __io_sq_thread_acquire_mm()
because of the SQPOLL check, but not for __io_sq_thread_acquire_files().
Play safe and fail for both of them.

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
17 Jan, 2021
4 commits
-
[ Upstream commit 3e2224c5867fead6c0b94b84727cc676ac6353a3 ]
alloc_fixed_file_ref_node() currently returns an ERR_PTR on failure.
io_sqe_files_unregister() expects it to return NULL and since it can only
return -ENOMEM, it makes more sense to change alloc_fixed_file_ref_node()
to behave that way.

Fixes: 1ffc54220c44 ("io_uring: fix io_sqe_files_unregister() hangs")
Reported-by: Dan Carpenter
Signed-off-by: Matthew Wilcox (Oracle)
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin -
commit 6c503150ae33ee19036255cfda0998463613352c upstream
IOPOLL skips completion locking but keeps it under uring_lock, thus
io_cqring_overflow_flush() and so io_cqring_events() need additional
locking with uring_lock in some cases for IOPOLL.

Remove __io_cqring_overflow_flush() from io_cqring_events(), introduce a
wrapper around the flush doing the needed synchronisation, and call it by hand.

Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin -
commit 89448c47b8452b67c146dc6cad6f737e004c5caf upstream
We don't need to take uring_lock for SQPOLL|IOPOLL to do
io_cqring_overflow_flush() when cq_overflow_list is empty, remove it
from the hot path.

Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin -
commit 81b6d05ccad4f3d8a9dfb091fb46ad6978ee40e4 upstream
io_req_task_submit() might be called for IOPOLL, do the fail path under
uring_lock to comply with IOPOLL synchronisation based solely on it.

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
06 Jan, 2021
7 commits
-
[ Upstream commit 9cd2be519d05ee78876d55e8e902b7125f78b74f ]
list_empty_careful() is not racy only if some conditions are met, i.e.
no re-adds after del_init. io_cqring_overflow_flush() does list_move(),
so it's actually racy.

Remove those checks; we have ->cq_check_overflow for the fast path.
Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin -
commit 65b2b213484acd89a3c20dbb524e52a2f3793b78 upstream.
syzbot reports the following issue:
INFO: task syz-executor.2:12399 can't die for more than 143 seconds.
task:syz-executor.2 state:D stack:28744 pid:12399 ppid: 8504 flags:0x00004004
Call Trace:
context_switch kernel/sched/core.c:3773 [inline]
__schedule+0x893/0x2170 kernel/sched/core.c:4522
schedule+0xcf/0x270 kernel/sched/core.c:4600
schedule_timeout+0x1d8/0x250 kernel/time/timer.c:1847
do_wait_for_common kernel/sched/completion.c:85 [inline]
__wait_for_common kernel/sched/completion.c:106 [inline]
wait_for_common kernel/sched/completion.c:117 [inline]
wait_for_completion+0x163/0x260 kernel/sched/completion.c:138
kthread_stop+0x17a/0x720 kernel/kthread.c:596
io_put_sq_data fs/io_uring.c:7193 [inline]
io_sq_thread_stop+0x452/0x570 fs/io_uring.c:7290
io_finish_async fs/io_uring.c:7297 [inline]
io_sq_offload_create fs/io_uring.c:8015 [inline]
io_uring_create fs/io_uring.c:9433 [inline]
io_uring_setup+0x19b7/0x3730 fs/io_uring.c:9507
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45deb9
Code: Unable to access opcode bytes at RIP 0x45de8f.
RSP: 002b:00007f174e51ac78 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
RAX: ffffffffffffffda RBX: 0000000000008640 RCX: 000000000045deb9
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 00000000000050e5
RBP: 000000000118bf58 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118bf2c
R13: 00007ffed9ca723f R14: 00007f174e51b9c0 R15: 000000000118bf2c
INFO: task syz-executor.2:12399 blocked for more than 143 seconds.
Not tainted 5.10.0-rc3-next-20201110-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Currently we don't have a reproducer yet, but it seems there is a race
in the current code:
=> io_put_sq_data
ctx_list is empty now. |
==> kthread_park(sqd->thread); |
| T1: sq thread is parked now.
==> kthread_stop(sqd->thread); |
KTHREAD_SHOULD_STOP is set now.|
===> kthread_unpark(k); |
| T2: sq thread is now unparked, runs again.
|
| T3: sq thread is now preempted out.
|
===> wake_up_process(k); |
|
| T4: Since sqd ctx_list is empty, needs_sched will be true,
| then sq thread sets task state to TASK_INTERRUPTIBLE,
| and schedules; now the sq thread will never be woken up.
===> wait_for_completion |

I have artificially used mdelay() to simulate the above race and got the
same stack as in this syzbot report, but to be honest, I'm not sure this
race is what triggers the syzbot report.

To fix this possible race, when the sq thread is unparked, check whether
it has been stopped.

Reported-by: syzbot+03beeb595f074db9cfd1@syzkaller.appspotmail.com
Signed-off-by: Xiaoguang Wang
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit 1ffc54220c444774b7f09e6d2121e732f8e19b94 upstream.
io_sqe_files_unregister() uninterruptibly waits for enqueued ref nodes,
however requests keeping them may never complete, e.g. because of some
userspace dependency. Make sure it's interruptible otherwise it would
hang forever.

Cc: stable@vger.kernel.org # 5.6+
Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit 1642b4450d20e31439c80c28256c8eee08684698 upstream.
Setting a new reference node for file data is not trivial; don't repeat
it, add and use a helper.

Cc: stable@vger.kernel.org # 5.6+
Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit ac0648a56c1ff66c1cbf735075ad33a26cbc50de upstream.
io_file_data_ref_zero() can be invoked from soft-irq from the RCU core,
hence we need to ensure that the file_data lock is bottom half safe. Use
the _bh() variants when grabbing this lock.

Reported-by: syzbot+1f4ba1e5520762c523c6@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit 77788775c7132a8d93c6930ab1bd84fc743c7cb7 upstream.
If we COW the identity, we assume that ->mm never changes. But this
isn't true if multiple processes end up sharing the ring. Hence treat
id->mm like any other process component when it comes to the identity
mapping. This is pretty trivial, just moving the existing grab into
io_grab_identity(), and including a check for the match.

Cc: stable@vger.kernel.org # 5.10
Fixes: 1e6fa5216a0e ("io_uring: COW io_identity on mismatch")
Reported-by: Christian Brauner
Tested-by: Christian Brauner
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit dfea9fce29fda6f2f91161677e0e0d9b671bc099 upstream.
The purpose of io_uring_cancel_files() is to wait for all requests
matching ->files to complete or be cancelled. We should first drop the
files of a request in io_req_drop_files() and only then make it
undiscoverable for io_uring_cancel_files().

First drop, then delete from the list. It's ok to leave req->id->files
dangling, because it's not dereferenced by the cancellation code, only
compared against. The waiter may go to sleep and will be awakened by the
wake_up() that follows in io_req_drop_files().

Fixes: 0f2122045b946 ("io_uring: don't rely on weak ->files references")
Cc: # 5.5+
Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman
30 Dec, 2020
11 commits
-
commit 00c18640c2430c4bafaaeede1f9dd6f7ec0e4b25 upstream.
Before IORING_SETUP_ATTACH_WQ, we could just cancel everything on the
io-wq when exiting. But that's not the case if they are shared, so
cancel for the specific ctx instead.

Cc: stable@vger.kernel.org
Fixes: 24369c2e3bb0 ("io_uring: add io-wq workqueue sharing")
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit 9faadcc8abe4b83d0263216dc3a6321d5bbd616b upstream.
Once we've created a file for the current context during setup, we should
not call io_ring_ctx_wait_and_kill() directly, as it'll be done by fput(file).

Cc: stable@vger.kernel.org # 5.10
Reported-by: syzbot+c9937dfb2303a5f18640@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov
[axboe: fix unused 'ret' for !CONFIG_UNIX]
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit a528b04ea40690ff40501f50d618a62a02b19620 upstream.
xa_store() may fail, check the result.
Cc: stable@vger.kernel.org # 5.10
Fixes: 0f2122045b946 ("io_uring: don't rely on weak ->files references")
Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit c07e6719511e77c4b289f62bfe96423eb6ea061d upstream.
io_iopoll_complete() does not hold completion_lock to complete polled io,
so in io_wq_submit_work() we cannot call io_req_complete() directly to
complete polled io, otherwise there may be concurrent access to the cqring,
defer_list, etc., which is not safe. Commit dad1b1242fd5 ("io_uring: always
let io_iopoll_complete() complete polled io") fixed this issue, but Pavel
reported that IOPOLL, apart from rw, can do buf reg/unreg requests
(IORING_OP_PROVIDE_BUFFERS or IORING_OP_REMOVE_BUFFERS), so the fix is not
good enough.

Given that io_iopoll_complete() is always called under uring_lock, here for
polled io we can also take uring_lock to fix this issue.

Fixes: dad1b1242fd5 ("io_uring: always let io_iopoll_complete() complete polled io")
Cc: # 5.5+
Signed-off-by: Xiaoguang Wang
Reviewed-by: Pavel Begunkov
[axboe: don't deref 'req' after completing it]
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit dd20166236953c8cd14f4c668bf972af32f0c6be upstream.
Doing a vectored buf-select read with 0 iovecs passed is meaningless and
utterly broken; forbid it.

Cc: # 5.7+
Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit dad1b1242fd5717af18ae4ac9d12b9f65849e13a upstream.
Abaci Fuzz reported a double-free or invalid-free BUG in io_commit_cqring():
[ 95.504842] BUG: KASAN: double-free or invalid-free in io_commit_cqring+0x3ec/0x8e0
[ 95.505921]
[ 95.506225] CPU: 0 PID: 4037 Comm: io_wqe_worker-0 Tainted: G B
W 5.10.0-rc5+ #1
[ 95.507434] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 95.508248] Call Trace:
[ 95.508683] dump_stack+0x107/0x163
[ 95.509323] ? io_commit_cqring+0x3ec/0x8e0
[ 95.509982] print_address_description.constprop.0+0x3e/0x60
[ 95.510814] ? vprintk_func+0x98/0x140
[ 95.511399] ? io_commit_cqring+0x3ec/0x8e0
[ 95.512036] ? io_commit_cqring+0x3ec/0x8e0
[ 95.512733] kasan_report_invalid_free+0x51/0x80
[ 95.513431] ? io_commit_cqring+0x3ec/0x8e0
[ 95.514047] __kasan_slab_free+0x141/0x160
[ 95.514699] kfree+0xd1/0x390
[ 95.515182] io_commit_cqring+0x3ec/0x8e0
[ 95.515799] __io_req_complete.part.0+0x64/0x90
[ 95.516483] io_wq_submit_work+0x1fa/0x260
[ 95.517117] io_worker_handle_work+0xeac/0x1c00
[ 95.517828] io_wqe_worker+0xc94/0x11a0
[ 95.518438] ? io_worker_handle_work+0x1c00/0x1c00
[ 95.519151] ? __kthread_parkme+0x11d/0x1d0
[ 95.519806] ? io_worker_handle_work+0x1c00/0x1c00
[ 95.520512] ? io_worker_handle_work+0x1c00/0x1c00
[ 95.521211] kthread+0x396/0x470
[ 95.521727] ? _raw_spin_unlock_irq+0x24/0x30
[ 95.522380] ? kthread_mod_delayed_work+0x180/0x180
[ 95.523108] ret_from_fork+0x22/0x30
[ 95.523684]
[ 95.523985] Allocated by task 4035:
[ 95.524543] kasan_save_stack+0x1b/0x40
[ 95.525136] __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 95.525882] kmem_cache_alloc_trace+0x17b/0x310
[ 95.533930] io_queue_sqe+0x225/0xcb0
[ 95.534505] io_submit_sqes+0x1768/0x25f0
[ 95.535164] __x64_sys_io_uring_enter+0x89e/0xd10
[ 95.535900] do_syscall_64+0x33/0x40
[ 95.536465] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 95.537199]
[ 95.537505] Freed by task 4035:
[ 95.538003] kasan_save_stack+0x1b/0x40
[ 95.538599] kasan_set_track+0x1c/0x30
[ 95.539177] kasan_set_free_info+0x1b/0x30
[ 95.539798] __kasan_slab_free+0x112/0x160
[ 95.540427] kfree+0xd1/0x390
[ 95.540910] io_commit_cqring+0x3ec/0x8e0
[ 95.541516] io_iopoll_complete+0x914/0x1390
[ 95.542150] io_do_iopoll+0x580/0x700
[ 95.542724] io_iopoll_try_reap_events.part.0+0x108/0x200
[ 95.543512] io_ring_ctx_wait_and_kill+0x118/0x340
[ 95.544206] io_uring_release+0x43/0x50
[ 95.544791] __fput+0x28d/0x940
[ 95.545291] task_work_run+0xea/0x1b0
[ 95.545873] do_exit+0xb6a/0x2c60
[ 95.546400] do_group_exit+0x12a/0x320
[ 95.546967] __x64_sys_exit_group+0x3f/0x50
[ 95.547605] do_syscall_64+0x33/0x40
[ 95.548155] entry_SYSCALL_64_after_hwframe+0x44/0xa9

The reason is that once we get a non-EAGAIN error in io_wq_submit_work(),
we complete the req by calling io_req_complete(), which holds completion_lock
to call io_commit_cqring(). But for polled io, io_iopoll_complete() doesn't
hold completion_lock to call io_commit_cqring(), so there may be concurrent
access to ctx->defer_list, and a double free may happen.

To fix this bug, we always let io_iopoll_complete() complete polled io.
Cc: # 5.5+
Reported-by: Abaci Fuzz
Signed-off-by: Xiaoguang Wang
Reviewed-by: Pavel Begunkov
Reviewed-by: Joseph Qi
Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit 31bff9a51b264df6d144931a6a5f1d6cc815ed4b upstream.
IOPOLL allows buffer remove/provide requests, but they don't synchronise
by the rules of IOPOLL; namely, they have to hold uring_lock.

Cc: # 5.7+
Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
commit 59850d226e4907a6f37c1d2fe5ba97546a8691a4 upstream.
Checking !list_empty(&ctx->cq_overflow_list) around noflush in
io_cqring_events() is racy, because if it fails but a request overflowed
just after that, io_cqring_overflow_flush() will still be called.

Remove the second check; it shouldn't be a problem for performance,
because there is a cq_check_overflow bit check just above.

Cc: # 5.5+
Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman -
[ Upstream commit cda286f0715c82f8117e166afd42cca068876dde ]
io_uring_cancel_task_requests() doesn't imply that the ring is going
away; it may continue to work well after that. The problem is that it
sets ->cq_overflow_flushed, effectively disabling the CQ overflow feature.

Split setting cq_overflow_flushed from the flush, and do the first one
only on exit. It's ok in terms of cancellations because there is an
io_uring->in_idle check in __io_cqring_fill_event().

It also fixes a race with setting ->cq_overflow_flushed in
io_uring_cancel_task_requests, which is not atomic and is part of a
bitmask with other flags. Though the only other flag that's not set
during init is drain_next, so it's not as bad for sane architectures.

Signed-off-by: Pavel Begunkov
Fixes: 0f2122045b946 ("io_uring: don't rely on weak ->files references")
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin -
[ Upstream commit 634578f800652035debba3098d8ab0d21af7c7a5 ]
It's not safe to call io_cqring_overflow_flush() for IOPOLL mode without
holding uring_lock, because it does synchronisation differently. Make
sure we have it.

As for io_ring_exit_work(), we don't even need it there because
io_ring_ctx_wait_and_kill() already sets the force flag, making all
overflowed requests be dropped.

Cc: # 5.5+
Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin -
[ Upstream commit df9923f96717d0aebb0a73adbcf6285fa79e38cb ]
io_uring_cancel_files() cancels all requests that match files regardless
of task. There is no real need for that; cancel only requests of the
specified task. That also handles the SQPOLL case, as it already changes
task to it.

Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
08 Dec, 2020
1 commit
-
Put the file as part of error handling when setting up the io ctx, to fix
memory leaks like the following one.

BUG: memory leak
unreferenced object 0xffff888101ea2200 (size 256):
comm "syz-executor355", pid 8470, jiffies 4294953658 (age 32.400s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
20 59 03 01 81 88 ff ff 80 87 a8 10 81 88 ff ff Y..............
backtrace:
[] kmem_cache_zalloc include/linux/slab.h:654 [inline]
[] __alloc_file+0x1f/0x130 fs/file_table.c:101
[] alloc_empty_file+0x69/0x120 fs/file_table.c:151
[] alloc_file+0x33/0x1b0 fs/file_table.c:193
[] alloc_file_pseudo+0xb2/0x140 fs/file_table.c:233
[] anon_inode_getfile fs/anon_inodes.c:91 [inline]
[] anon_inode_getfile+0xaa/0x120 fs/anon_inodes.c:74
[] io_uring_get_fd fs/io_uring.c:9198 [inline]
[] io_uring_create fs/io_uring.c:9377 [inline]
[] io_uring_setup+0x1125/0x1630 fs/io_uring.c:9411
[] do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
[] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported-by: syzbot+71c4697e27c99fddcf17@syzkaller.appspotmail.com
Fixes: 0f2122045b94 ("io_uring: don't rely on weak ->files references")
Cc: Pavel Begunkov
Signed-off-by: Hillf Danton
Signed-off-by: Jens Axboe
07 Dec, 2020
1 commit
-
After io_identity_cow() copies a work.identity, it wants to copy creds
to the newly allocated id, not the old one. Otherwise it's akin to
req->work.identity->creds = req->work.identity->creds.

Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
01 Dec, 2020
1 commit
-
__io_compat_recvmsg_copy_hdr() with REQ_F_BUFFER_SELECT reads out the iov
len but never assigns it to iov/fast_iov, leaving sr->len with garbage.
Hopefully, the following io_buffer_select() truncates it to the selected
buffer size, but the value may still be under what was specified.

Cc: # 5.7
Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
26 Nov, 2020
1 commit
-
When one task is in io_uring_cancel_files() and another is doing
io_prep_async_work(), a race may happen. That's because, after accounting
a request inflight in the first call to io_grab_identity(), it may still
fail and go to io_identity_cow(), which might briefly keep a dangling
work.identity, and not only that.

Grab files last, so io_prep_async_work() won't fail if the request did
get onto ->inflight_list.

Note: the bug shouldn't exist after making io_uring_cancel_files() not
poke into other tasks' requests.

Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe
24 Nov, 2020
2 commits
-
iov_iter::type is a bitmask that also keeps direction etc., so it
shouldn't be directly compared against ITER_*. Use the proper helper.

Fixes: ff6165b2d7f6 ("io_uring: retain iov_iter state over io_read/io_write calls")
Reported-by: David Howells
Signed-off-by: Pavel Begunkov
Cc: # 5.9
Signed-off-by: Jens Axboe -
Abaci Fuzz reported a shift-out-of-bounds BUG in io_uring_create():
[ 59.598207] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
[ 59.599665] shift exponent 64 is too large for 64-bit type 'long unsigned int'
[ 59.601230] CPU: 0 PID: 963 Comm: a.out Not tainted 5.10.0-rc4+ #3
[ 59.602502] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 59.603673] Call Trace:
[ 59.604286] dump_stack+0x107/0x163
[ 59.605237] ubsan_epilogue+0xb/0x5a
[ 59.606094] __ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e
[ 59.607335] ? lock_downgrade+0x6c0/0x6c0
[ 59.608182] ? rcu_read_lock_sched_held+0xaf/0xe0
[ 59.609166] io_uring_create.cold+0x99/0x149
[ 59.610114] io_uring_setup+0xd6/0x140
[ 59.610975] ? io_uring_create+0x2510/0x2510
[ 59.611945] ? lockdep_hardirqs_on_prepare+0x286/0x400
[ 59.613007] ? syscall_enter_from_user_mode+0x27/0x80
[ 59.614038] ? trace_hardirqs_on+0x5b/0x180
[ 59.615056] do_syscall_64+0x2d/0x40
[ 59.615940] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 59.617007] RIP: 0033:0x7f2bb8a0b239

This is caused by roundup_pow_of_two() if the input entries are large
enough, e.g. 2^32-1. For sq_entries, it will check first, and we allow
at most IORING_MAX_ENTRIES, so it is okay. But for cq_entries, we do the
round up first; that may overflow and truncate it to 0, which is not
the expected behavior. So check the cq size first and then do the round up.

Fixes: 88ec3211e463 ("io_uring: round-up cq size before comparing with rounded sq size")
Reported-by: Abaci Fuzz
Signed-off-by: Joseph Qi
Reviewed-by: Stefano Garzarella
Signed-off-by: Jens Axboe
21 Nov, 2020
1 commit
-
Pull io_uring fixes from Jens Axboe:
"Mostly regression or stable fodder:

- Disallow async path resolution of /proc/self
- Tighten constraints for segmented async buffered reads
- Fix double completion for a retry error case
- Fix for fixed file life times (Pavel)"
* tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
io_uring: order refnode recycling
io_uring: get an active ref_node from files_data
io_uring: don't double complete failed reissue request
mm: never attempt async page lock if we've transferred data already
io_uring: handle -EOPNOTSUPP on path resolution
proc: don't allow async path resolution of /proc/self components
18 Nov, 2020
3 commits
-
Don't recycle a refnode until we're done with all requests of nodes
ejected before.

Signed-off-by: Pavel Begunkov
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Jens Axboe -
An active ref_node can always be found in ctx->files_data; it's much
safer to get it this way instead of poking into files_data->ref_list.

Signed-off-by: Pavel Begunkov
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Jens Axboe -
Zorro reports that an xfstest test case is failing, and it turns out that
for the reissue path we can potentially issue a double completion on the
request for the failure path. There's an issue around the retry as well,
but for now, at least just make sure that we handle the error path
correctly.

Cc: stable@vger.kernel.org
Fixes: b63534c41e20 ("io_uring: re-issue block requests that failed because of resources")
Reported-by: Zorro Lang
Signed-off-by: Jens Axboe
15 Nov, 2020
1 commit
-
Any attempt to do path resolution on /proc/self from an async worker will
yield -EOPNOTSUPP. We can safely do that resolution from the task itself,
and without blocking, so retry it from there.

Ideally io_uring would know this upfront and not have to go through the
worker thread to find out, but that doesn't currently seem feasible.

Signed-off-by: Jens Axboe
14 Nov, 2020
1 commit
-
Pull fs freeze fix and cleanups from Darrick Wong:
"A single vfs fix for 5.10, along with two subsequent cleanups.

A very long time ago, a hack was added to the vfs fs freeze protection
code to work around lockdep complaints about XFS, which would try to
run a transaction (which requires intwrite protection) to finalize an
xfs freeze (by which time the vfs had already taken intwrite).

Fast forward a few years, and XFS fixed the recursive intwrite problem
on its own, and the hack became unnecessary. Fast forward almost a
decade, and latent bugs in the code converting this hack from freeze
flags to freeze locks combine with lockdep bugs to make this reproduce
frequently enough to notice page faults racing with freeze.

Since the hack is unnecessary and causes thread race errors, just get
rid of it completely. Making this kind of vfs change midway through a
cycle makes me nervous, but a large enough number of the usual
VFS/ext4/XFS/btrfs suspects have said this looks good and solves a
real problem vector.

And once that removal is done, __sb_start_write is now simple enough
that it becomes possible to refactor the function into smaller,
simpler static inline helpers in linux/fs.h. The cleanup is
straightforward.

Summary:

- Finally remove the "convert to trylock" weirdness in the fs freezer
  code. It was necessary 10 years ago to deal with nested
  transactions in XFS, but we've long since removed that; and now
  this is causing subtle race conditions when lockdep goes offline
  and sb_start_* aren't prepared to retry a trylock failure.

- Minor cleanups of the sb_start_* fs freeze helpers"
* tag 'vfs-5.10-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
vfs: move __sb_{start,end}_write* to fs.h
vfs: separate __sb_start_write into blocking and non-blocking helpers
vfs: remove lockdep bogosity in __sb_start_write
12 Nov, 2020
1 commit
-
If an application specifies IORING_SETUP_CQSIZE to set the CQ ring size
to a specific size, we ensure that the CQ size is at least that of the
SQ ring size. But in doing so, we compare the already rounded up to power
of two SQ size to the as-of yet unrounded CQ size. This means that if an
application passes in non power of two sizes, we can return -EINVAL when
the final value would've been fine. As an example, an application passing
in 100/100 for sq/cq size should end up with 128 for both. But since we
round the SQ size first, we compare the CQ size of 100 to 128, and return
-EINVAL as that is too small.

Cc: stable@vger.kernel.org
Fixes: 33a107f0a1b8 ("io_uring: allow application controlled CQ ring size")
Reported-by: Dan Melnic
Signed-off-by: Jens Axboe
11 Nov, 2020
1 commit
-
Break this function into two helpers so that it's obvious that the
trylock versions return a value that must be checked, and the blocking
versions don't require that. While we're at it, clean up the return type
mismatch.

Signed-off-by: Darrick J. Wong
Reviewed-by: Jan Kara
Reviewed-by: Christoph Hellwig
06 Nov, 2020
1 commit
-
We can't just go over linked requests because it may race with linked
timeouts. Take ctx->completion_lock in that case.

Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Pavel Begunkov
Signed-off-by: Jens Axboe