20 Jan, 2021

3 commits

  • [ Upstream commit de7f1d9e99d8b99e4e494ad8fcd91f0c4c5c9357 ]

    io_uring fds are marked O_CLOEXEC and we explicitly cancel all requests
    before going through exec, so we don't want to leave the task's file
    references to io_uring instances that are no longer ours.
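
    For illustration, the cleanup can be a walk over the task's tracked
    io_uring files; a sketch assuming the 5.10-era tctx->xa bookkeeping and
    the io_uring_del_task_file() helper:

        static void io_uring_remove_task_files(struct io_uring_task *tctx)
        {
                struct file *file;
                unsigned long index;

                /* drop every io_uring file reference this task still tracks */
                xa_for_each(&tctx->xa, index, file)
                        io_uring_del_task_file(file);
        }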

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     
  • [ Upstream commit d434ab6db524ab1efd0afad4ffa1ee65ca6ac097 ]

    __io_req_task_submit() run by task_work can set mm and files, and so
    can io_sq_thread() in some cases; and because
    __io_sq_thread_acquire_mm() and __io_sq_thread_acquire_files() do only
    a simple current->mm/files check, IO may end up being submitted with
    the mm/files of another task.

    We also need to drop them at the end to release any references that
    were grabbed.
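
    A sketch of the idea in io_sq_thread(); the drop helper's name is an
    assumption based on this description:

        if (current->task_works)
                task_work_run();
        /* release whatever __io_sq_thread_acquire_mm()/_files() grabbed,
         * so we never submit with a stale task's mm/files */
        io_sq_thread_drop_mm_files();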

    Cc: stable@vger.kernel.org # 5.9+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     
  • [ Upstream commit 621fadc22365f3cf307bcd9048e3372e9ee9cdcc ]

    In rare cases a task may be exiting while io_ring_exit_work() is trying
    to cancel/wait on its requests. That's ok for
    __io_sq_thread_acquire_mm() because of the SQPOLL check, but not for
    __io_sq_thread_acquire_files(). Play it safe and fail in both of them.
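
    A plausible shape of that check at the top of both acquire helpers
    (exact placement assumed):

        if (unlikely(current->flags & PF_EXITING))
                return -EFAULT;         /* task is dying, don't grab mm/files */
        /* ... existing acquisition logic follows ... */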

    Cc: stable@vger.kernel.org # 5.5+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     

17 Jan, 2021

4 commits

  • [ Upstream commit 3e2224c5867fead6c0b94b84727cc676ac6353a3 ]

    alloc_fixed_file_ref_node() currently returns an ERR_PTR on failure.
    io_sqe_files_unregister() expects it to return NULL, and since it can
    only fail with -ENOMEM, it makes more sense to change
    alloc_fixed_file_ref_node() to behave that way.
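
    As a sketch, the allocator's failure paths then return NULL directly
    (the percpu_ref release callback name is taken from a later entry in
    this log):

        ref_node = kzalloc(sizeof(*ref_node), GFP_KERNEL);
        if (!ref_node)
                return NULL;                    /* was: ERR_PTR(-ENOMEM) */
        if (percpu_ref_init(&ref_node->refs, io_file_data_ref_zero,
                            0, GFP_KERNEL)) {
                kfree(ref_node);
                return NULL;                    /* was: ERR_PTR(-ENOMEM) */
        }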

    Fixes: 1ffc54220c44 ("io_uring: fix io_sqe_files_unregister() hangs")
    Reported-by: Dan Carpenter
    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Matthew Wilcox (Oracle)
     
  • commit 6c503150ae33ee19036255cfda0998463613352c upstream

    IOPOLL skips completion locking and synchronises under uring_lock
    instead, so io_cqring_overflow_flush(), and hence io_cqring_events(),
    needs additional uring_lock protection in some IOPOLL cases.

    Remove __io_cqring_overflow_flush() from io_cqring_events(), introduce
    a wrapper around the flush that does the needed synchronisation, and
    call it by hand.
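
    For illustration, such a wrapper might look like this (the exact
    signature is assumed):

        static void io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force,
                                             struct task_struct *tsk,
                                             struct files_struct *files)
        {
                /* IOPOLL completions sync on uring_lock, not completion_lock */
                if (ctx->flags & IORING_SETUP_IOPOLL)
                        mutex_lock(&ctx->uring_lock);
                __io_cqring_overflow_flush(ctx, force, tsk, files);
                if (ctx->flags & IORING_SETUP_IOPOLL)
                        mutex_unlock(&ctx->uring_lock);
        }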

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     
  • commit 89448c47b8452b67c146dc6cad6f737e004c5caf upstream

    We don't need to take uring_lock for SQPOLL|IOPOLL to do
    io_cqring_overflow_flush() when cq_overflow_list is empty; remove it
    from the hot path.
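
    The hot path can then be gated on the overflow bit alone, e.g.:

        /* only take uring_lock and flush when an overflow is flagged */
        if (test_bit(0, &ctx->cq_check_overflow))
                io_cqring_overflow_flush(ctx, false, NULL, NULL);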

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     
  • commit 81b6d05ccad4f3d8a9dfb091fb46ad6978ee40e4 upstream

    io_req_task_submit() might be called for IOPOLL requests, so do the
    fail path under uring_lock to comply with IOPOLL synchronisation, which
    is based solely on that lock.
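
    A sketch of the resulting shape in __io_req_task_submit(), with helper
    names following other entries in this log:

        mutex_lock(&ctx->uring_lock);
        if (!__io_sq_thread_acquire_mm(ctx))
                __io_queue_sqe(req, NULL);
        else
                __io_req_task_cancel(req, -EFAULT); /* fail path now locked too */
        mutex_unlock(&ctx->uring_lock);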

    Cc: stable@vger.kernel.org # 5.5+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     

06 Jan, 2021

7 commits

  • [ Upstream commit 9cd2be519d05ee78876d55e8e902b7125f78b74f ]

    list_empty_careful() is race-free only if certain conditions are met,
    i.e. no re-adds after del_init. io_cqring_overflow_flush() does
    list_move(), so the check is actually racy.

    Remove those checks, we have ->cq_check_overflow for the fast path.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     
  • commit 65b2b213484acd89a3c20dbb524e52a2f3793b78 upstream.

    syzbot reports the following issue:
    INFO: task syz-executor.2:12399 can't die for more than 143 seconds.
    task:syz-executor.2 state:D stack:28744 pid:12399 ppid: 8504 flags:0x00004004
    Call Trace:
    context_switch kernel/sched/core.c:3773 [inline]
    __schedule+0x893/0x2170 kernel/sched/core.c:4522
    schedule+0xcf/0x270 kernel/sched/core.c:4600
    schedule_timeout+0x1d8/0x250 kernel/time/timer.c:1847
    do_wait_for_common kernel/sched/completion.c:85 [inline]
    __wait_for_common kernel/sched/completion.c:106 [inline]
    wait_for_common kernel/sched/completion.c:117 [inline]
    wait_for_completion+0x163/0x260 kernel/sched/completion.c:138
    kthread_stop+0x17a/0x720 kernel/kthread.c:596
    io_put_sq_data fs/io_uring.c:7193 [inline]
    io_sq_thread_stop+0x452/0x570 fs/io_uring.c:7290
    io_finish_async fs/io_uring.c:7297 [inline]
    io_sq_offload_create fs/io_uring.c:8015 [inline]
    io_uring_create fs/io_uring.c:9433 [inline]
    io_uring_setup+0x19b7/0x3730 fs/io_uring.c:9507
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x45deb9
    Code: Unable to access opcode bytes at RIP 0x45de8f.
    RSP: 002b:00007f174e51ac78 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
    RAX: ffffffffffffffda RBX: 0000000000008640 RCX: 000000000045deb9
    RDX: 0000000000000000 RSI: 0000000020000140 RDI: 00000000000050e5
    RBP: 000000000118bf58 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118bf2c
    R13: 00007ffed9ca723f R14: 00007f174e51b9c0 R15: 000000000118bf2c
    INFO: task syz-executor.2:12399 blocked for more than 143 seconds.
    Not tainted 5.10.0-rc3-next-20201110-syzkaller #0
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

    We don't have a reproducer yet, but there seems to be a race in the
    current code:
    => io_put_sq_data                   |
       ctx_list is empty now.          |
    ==> kthread_park(sqd->thread);     |
                                       | T1: sq thread is parked now.
    ==> kthread_stop(sqd->thread);     |
        KTHREAD_SHOULD_STOP is set now.|
    ===> kthread_unpark(k);            |
                                       | T2: sq thread is now unparked, runs again.
                                       |
                                       | T3: sq thread is now preempted out.
                                       |
    ===> wake_up_process(k);           |
                                       |
                                       | T4: Since sqd ctx_list is empty, needs_sched
                                       |     will be true; the sq thread then sets its
                                       |     task state to TASK_INTERRUPTIBLE and
                                       |     schedules, so it will never be woken up.
    ===> wait_for_completion           |

    I have artificially used mdelay() to simulate the above race and got
    the same stack as in this syzbot report, but to be honest, I'm not sure
    this race is what triggers the syzbot report.

    To fix this possible race, when the sq thread is unparked we need to
    check whether it has been stopped.
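
    A sketch of the recheck in the io_sq_thread() loop:

        if (kthread_should_park())
                kthread_parkme();
        /* kthread_stop() may have unparked us only to stop us; recheck,
         * or we may go back to sleep and never complete kthread_stop() */
        if (kthread_should_stop())
                break;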

    Reported-by: syzbot+03beeb595f074db9cfd1@syzkaller.appspotmail.com
    Signed-off-by: Xiaoguang Wang
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Xiaoguang Wang
     
  • commit 1ffc54220c444774b7f09e6d2121e732f8e19b94 upstream.

    io_sqe_files_unregister() uninterruptibly waits for enqueued ref nodes,
    but the requests holding them may never complete, e.g. because of some
    userspace dependency. Make the wait interruptible, otherwise it can
    hang forever.
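
    For illustration, the wait loop might become something like this (the
    signal-handling details are assumptions):

        do {
                ret = wait_for_completion_interruptible(&data->done);
                if (!ret)
                        break;
                /* interrupted: run task_work/signals, then retry or bail */
                ret = io_run_task_work_sig();
        } while (ret >= 0);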

    Cc: stable@vger.kernel.org # 5.6+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     
  • commit 1642b4450d20e31439c80c28256c8eee08684698 upstream.

    Setting a new reference node on file data is not trivial; don't repeat
    the sequence everywhere, add and use a helper.
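
    A sketch of such a helper; the field names are assumptions based on
    neighbouring entries in this log:

        static void io_sqe_files_set_node(struct fixed_file_data *file_data,
                                          struct fixed_file_ref_node *ref_node)
        {
                spin_lock_bh(&file_data->lock);
                file_data->node = ref_node;
                list_add_tail(&ref_node->node, &file_data->ref_list);
                spin_unlock_bh(&file_data->lock);
                percpu_ref_get(&file_data->refs);
        }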

    Cc: stable@vger.kernel.org # 5.6+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     
  • commit ac0648a56c1ff66c1cbf735075ad33a26cbc50de upstream.

    io_file_data_ref_zero() can be invoked from soft-irq from the RCU core,
    hence we need to ensure that the file_data lock is bottom half safe. Use
    the _bh() variants when grabbing this lock.
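
    For example, each process-context taker of the lock becomes:

        spin_lock_bh(&file_data->lock);         /* was: spin_lock() */
        list_add_tail(&ref_node->node, &file_data->ref_list);
        spin_unlock_bh(&file_data->lock);       /* was: spin_unlock() */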

    Reported-by: syzbot+1f4ba1e5520762c523c6@syzkaller.appspotmail.com
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jens Axboe
     
  • commit 77788775c7132a8d93c6930ab1bd84fc743c7cb7 upstream.

    If we COW the identity, we assume that ->mm never changes. But this
    isn't true if multiple processes end up sharing the ring. Hence treat
    id->mm like any other process component when it comes to the identity
    mapping. This is pretty trivial, just moving the existing grab into
    io_grab_identity(), and including a check for the match.
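
    A sketch of the moved grab inside io_grab_identity(); the flag and
    field names are assumptions:

        if (def->needs_mm) {
                if (id->mm != current->mm)
                        return false;   /* mismatch: caller COWs the identity */
                mmgrab(id->mm);
                req->work.flags |= IO_WQ_WORK_MM;
        }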

    Cc: stable@vger.kernel.org # 5.10
    Fixes: 1e6fa5216a0e ("io_uring: COW io_identity on mismatch")
    Reported-by: Christian Brauner
    Tested-by: Christian Brauner
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jens Axboe
     
  • commit dfea9fce29fda6f2f91161677e0e0d9b671bc099 upstream.

    The purpose of io_uring_cancel_files() is to wait for all requests
    matching ->files to go away or be cancelled. We should first drop the
    files of a request in io_req_drop_files() and only then make it
    undiscoverable to io_uring_cancel_files().

    First drop, then delete from the list. It's ok to leave req->id->files
    dangling, because it's not dereferenced by the cancellation code, only
    compared against. The cancellation side may go to sleep there and will
    be woken by the wake_up() that follows in io_req_drop_files().

    Fixes: 0f2122045b946 ("io_uring: don't rely on weak ->files references")
    Cc: # 5.5+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     

30 Dec, 2020

11 commits

  • commit 00c18640c2430c4bafaaeede1f9dd6f7ec0e4b25 upstream.

    Before IORING_SETUP_ATTACH_WQ, we could just cancel everything on the
    io-wq when exiting. But that's not the case if they are shared, so
    cancel for the specific ctx instead.
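
    A sketch of a per-ctx cancel callback, using io-wq's matching-cancel
    entry point:

        static bool io_cancel_ctx_cb(struct io_wq_work *work, void *data)
        {
                struct io_kiocb *req = container_of(work, struct io_kiocb, work);

                /* only cancel work that belongs to the exiting ring */
                return req->ctx == data;
        }

        /* usage: io_wq_cancel_cb(ctx->io_wq, io_cancel_ctx_cb, ctx, true); */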

    Cc: stable@vger.kernel.org
    Fixes: 24369c2e3bb0 ("io_uring: add io-wq workqueue sharing")
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jens Axboe
     
  • commit 9faadcc8abe4b83d0263216dc3a6321d5bbd616b upstream.

    Once we have created a file for the current context during setup, we
    should not call io_ring_ctx_wait_and_kill() directly, as that will be
    done by fput(file).
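
    For illustration, the setup tail then looks roughly like this (both
    helper names are assumptions):

        file = io_uring_get_file(ctx);
        if (IS_ERR(file)) {
                ret = PTR_ERR(file);
                goto err;               /* no file yet: tear down by hand */
        }
        ret = io_uring_install_fd(ctx, file);
        if (ret < 0) {
                fput(file);             /* fput() tears down the ctx */
                return ret;
        }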

    Cc: stable@vger.kernel.org # 5.10
    Reported-by: syzbot+c9937dfb2303a5f18640@syzkaller.appspotmail.com
    Signed-off-by: Pavel Begunkov
    [axboe: fix unused 'ret' for !CONFIG_UNIX]
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     
  • commit a528b04ea40690ff40501f50d618a62a02b19620 upstream.

    xa_store() may fail, check the result.
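
    For example, with xa_err() extracting the error and the file reference
    dropped on failure:

        ret = xa_err(xa_store(&tctx->xa, (unsigned long)file,
                              file, GFP_KERNEL));
        if (ret) {
                fput(file);
                return ret;
        }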

    Cc: stable@vger.kernel.org # 5.10
    Fixes: 0f2122045b946 ("io_uring: don't rely on weak ->files references")
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     
  • commit c07e6719511e77c4b289f62bfe96423eb6ea061d upstream.

    io_iopoll_complete() does not hold completion_lock to complete polled
    io, so in io_wq_submit_work() we cannot call io_req_complete() directly
    to complete polled io; otherwise there may be concurrent access to the
    cqring, defer_list, etc., which is not safe. Commit dad1b1242fd5
    ("io_uring: always let io_iopoll_complete() complete polled io") fixed
    this issue, but Pavel reported that with IOPOLL, requests other than rw
    can do buf reg/unreg (IORING_OP_PROVIDE_BUFFERS or
    IORING_OP_REMOVE_BUFFERS), so the fix is not good.

    Given that io_iopoll_complete() is always called under uring_lock, here
    for polled io we can also take uring_lock to fix this issue.
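
    A sketch of the error path in io_wq_submit_work() under this scheme:

        if (ret) {
                struct io_ring_ctx *lock_ctx = NULL;

                if (req->ctx->flags & IORING_SETUP_IOPOLL)
                        lock_ctx = req->ctx;
                /* io_iopoll_complete() runs under uring_lock, so completing
                 * a polled request here must take the same lock */
                if (lock_ctx)
                        mutex_lock(&lock_ctx->uring_lock);
                req_set_fail_links(req);
                io_req_complete(req, ret);
                if (lock_ctx)
                        mutex_unlock(&lock_ctx->uring_lock);
        }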

    Fixes: dad1b1242fd5 ("io_uring: always let io_iopoll_complete() complete polled io")
    Cc: # 5.5+
    Signed-off-by: Xiaoguang Wang
    Reviewed-by: Pavel Begunkov
    [axboe: don't deref 'req' after completing it']
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Xiaoguang Wang
     
  • commit dd20166236953c8cd14f4c668bf972af32f0c6be upstream.

    Doing a vectored buf-select read with 0 iovecs passed is meaningless
    and utterly broken; forbid it.
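
    The check is a one-liner; a sketch of the tightened condition:

        if (req->flags & REQ_F_BUFFER_SELECT) {
                /* was: iov_len > 1, which let a zero-length vector through */
                if (iov_len != 1)
                        return -EINVAL;
        }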

    Cc: # 5.7+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     
  • commit dad1b1242fd5717af18ae4ac9d12b9f65849e13a upstream.

    Abaci Fuzz reported a double-free or invalid-free BUG in io_commit_cqring():
    [ 95.504842] BUG: KASAN: double-free or invalid-free in io_commit_cqring+0x3ec/0x8e0
    [ 95.505921]
    [ 95.506225] CPU: 0 PID: 4037 Comm: io_wqe_worker-0 Tainted: G B W 5.10.0-rc5+ #1
    [ 95.507434] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    [ 95.508248] Call Trace:
    [ 95.508683] dump_stack+0x107/0x163
    [ 95.509323] ? io_commit_cqring+0x3ec/0x8e0
    [ 95.509982] print_address_description.constprop.0+0x3e/0x60
    [ 95.510814] ? vprintk_func+0x98/0x140
    [ 95.511399] ? io_commit_cqring+0x3ec/0x8e0
    [ 95.512036] ? io_commit_cqring+0x3ec/0x8e0
    [ 95.512733] kasan_report_invalid_free+0x51/0x80
    [ 95.513431] ? io_commit_cqring+0x3ec/0x8e0
    [ 95.514047] __kasan_slab_free+0x141/0x160
    [ 95.514699] kfree+0xd1/0x390
    [ 95.515182] io_commit_cqring+0x3ec/0x8e0
    [ 95.515799] __io_req_complete.part.0+0x64/0x90
    [ 95.516483] io_wq_submit_work+0x1fa/0x260
    [ 95.517117] io_worker_handle_work+0xeac/0x1c00
    [ 95.517828] io_wqe_worker+0xc94/0x11a0
    [ 95.518438] ? io_worker_handle_work+0x1c00/0x1c00
    [ 95.519151] ? __kthread_parkme+0x11d/0x1d0
    [ 95.519806] ? io_worker_handle_work+0x1c00/0x1c00
    [ 95.520512] ? io_worker_handle_work+0x1c00/0x1c00
    [ 95.521211] kthread+0x396/0x470
    [ 95.521727] ? _raw_spin_unlock_irq+0x24/0x30
    [ 95.522380] ? kthread_mod_delayed_work+0x180/0x180
    [ 95.523108] ret_from_fork+0x22/0x30
    [ 95.523684]
    [ 95.523985] Allocated by task 4035:
    [ 95.524543] kasan_save_stack+0x1b/0x40
    [ 95.525136] __kasan_kmalloc.constprop.0+0xc2/0xd0
    [ 95.525882] kmem_cache_alloc_trace+0x17b/0x310
    [ 95.533930] io_queue_sqe+0x225/0xcb0
    [ 95.534505] io_submit_sqes+0x1768/0x25f0
    [ 95.535164] __x64_sys_io_uring_enter+0x89e/0xd10
    [ 95.535900] do_syscall_64+0x33/0x40
    [ 95.536465] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 95.537199]
    [ 95.537505] Freed by task 4035:
    [ 95.538003] kasan_save_stack+0x1b/0x40
    [ 95.538599] kasan_set_track+0x1c/0x30
    [ 95.539177] kasan_set_free_info+0x1b/0x30
    [ 95.539798] __kasan_slab_free+0x112/0x160
    [ 95.540427] kfree+0xd1/0x390
    [ 95.540910] io_commit_cqring+0x3ec/0x8e0
    [ 95.541516] io_iopoll_complete+0x914/0x1390
    [ 95.542150] io_do_iopoll+0x580/0x700
    [ 95.542724] io_iopoll_try_reap_events.part.0+0x108/0x200
    [ 95.543512] io_ring_ctx_wait_and_kill+0x118/0x340
    [ 95.544206] io_uring_release+0x43/0x50
    [ 95.544791] __fput+0x28d/0x940
    [ 95.545291] task_work_run+0xea/0x1b0
    [ 95.545873] do_exit+0xb6a/0x2c60
    [ 95.546400] do_group_exit+0x12a/0x320
    [ 95.546967] __x64_sys_exit_group+0x3f/0x50
    [ 95.547605] do_syscall_64+0x33/0x40
    [ 95.548155] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    The reason is that once we get a non-EAGAIN error in io_wq_submit_work(),
    we complete the req by calling io_req_complete(), which holds
    completion_lock while calling io_commit_cqring(); but for polled io,
    io_iopoll_complete() does not hold completion_lock when calling
    io_commit_cqring(), so there may be concurrent access to
    ctx->defer_list and a double free may happen.

    To fix this bug, we always let io_iopoll_complete() complete polled io.

    Cc: # 5.5+
    Reported-by: Abaci Fuzz
    Signed-off-by: Xiaoguang Wang
    Reviewed-by: Pavel Begunkov
    Reviewed-by: Joseph Qi
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Xiaoguang Wang
     
  • commit 31bff9a51b264df6d144931a6a5f1d6cc815ed4b upstream.

    IOPOLL allows buffer remove/provide requests, but they don't
    synchronise by the rules of IOPOLL, namely that uring_lock must be
    held.
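
    For illustration, using the existing io_ring_submit_lock() helpers,
    which take uring_lock only when needed:

        io_ring_submit_lock(ctx, !force_nonblock);
        lockdep_assert_held(&ctx->uring_lock);
        /* manipulate the buffer lists while synchronised with IOPOLL */
        io_ring_submit_unlock(ctx, !force_nonblock);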

    Cc: # 5.7+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     
  • commit 59850d226e4907a6f37c1d2fe5ba97546a8691a4 upstream.

    Checking !list_empty(&ctx->cq_overflow_list) around noflush in
    io_cqring_events() is racy, because if it fails but a request
    overflowed just after that, io_cqring_overflow_flush() will still be
    called.

    Remove the second check; it shouldn't be a problem for performance,
    because there is a cq_check_overflow bit check just above.

    Cc: # 5.5+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     
  • [ Upstream commit cda286f0715c82f8117e166afd42cca068876dde ]

    io_uring_cancel_task_requests() doesn't imply that the ring is going
    away; it may continue to work well after that. The problem is that it
    sets ->cq_overflow_flushed, effectively disabling the CQ overflow
    feature.

    Split setting cq_overflow_flushed from the flush, and do the first one
    only on exit. It's ok in terms of cancellations because there is an
    io_uring->in_idle check in __io_cqring_fill_event().

    It also fixes a race with setting ->cq_overflow_flushed in
    io_uring_cancel_task_requests, which is not atomic and is part of a
    bitmask with other flags. Though, the only other flag that's not set
    during init is drain_next, so it's not as bad for sane architectures.

    Signed-off-by: Pavel Begunkov
    Fixes: 0f2122045b946 ("io_uring: don't rely on weak ->files references")
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     
  • [ Upstream commit 634578f800652035debba3098d8ab0d21af7c7a5 ]

    It's not safe to call io_cqring_overflow_flush() for IOPOLL mode
    without holding uring_lock, because it does synchronisation
    differently. Make sure we have it.

    As for io_ring_exit_work(), we don't even need it there because
    io_ring_ctx_wait_and_kill() already sets the force flag, making all
    overflowed requests be dropped.

    Cc: # 5.5+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     
  • [ Upstream commit df9923f96717d0aebb0a73adbcf6285fa79e38cb ]

    io_uring_cancel_files() cancels all requests that match files,
    regardless of task. There is no real need for that; cancel only the
    requests of the specified task. That also handles the SQPOLL case, as
    it already changes the task to it.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     

08 Dec, 2020

1 commit

    Put the file as part of error handling when setting up the io ctx, to
    fix memory leaks like the following one.

    BUG: memory leak
    unreferenced object 0xffff888101ea2200 (size 256):
    comm "syz-executor355", pid 8470, jiffies 4294953658 (age 32.400s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    20 59 03 01 81 88 ff ff 80 87 a8 10 81 88 ff ff Y..............
    backtrace:
    [] kmem_cache_zalloc include/linux/slab.h:654 [inline]
    [] __alloc_file+0x1f/0x130 fs/file_table.c:101
    [] alloc_empty_file+0x69/0x120 fs/file_table.c:151
    [] alloc_file+0x33/0x1b0 fs/file_table.c:193
    [] alloc_file_pseudo+0xb2/0x140 fs/file_table.c:233
    [] anon_inode_getfile fs/anon_inodes.c:91 [inline]
    [] anon_inode_getfile+0xaa/0x120 fs/anon_inodes.c:74
    [] io_uring_get_fd fs/io_uring.c:9198 [inline]
    [] io_uring_create fs/io_uring.c:9377 [inline]
    [] io_uring_setup+0x1125/0x1630 fs/io_uring.c:9411
    [] do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    [] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Reported-by: syzbot+71c4697e27c99fddcf17@syzkaller.appspotmail.com
    Fixes: 0f2122045b94 ("io_uring: don't rely on weak ->files references")
    Cc: Pavel Begunkov
    Signed-off-by: Hillf Danton
    Signed-off-by: Jens Axboe

    Hillf Danton
     

01 Dec, 2020

1 commit

    __io_compat_recvmsg_copy_hdr() with REQ_F_BUFFER_SELECT reads out the
    iov len but never assigns it to iov/fast_iov, leaving sr->len holding
    garbage. Hopefully, the following io_buffer_select() truncates it to
    the selected buffer size, but the value may still end up below what was
    actually specified.
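
    A sketch of the missing assignment in the compat path:

        compat_ssize_t clen;

        if (__get_user(clen, &uiov->iov_len))
                return -EFAULT;
        if (clen < 0)
                return -EINVAL;
        sr->len = clen;         /* previously left holding garbage */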

    Cc: # 5.7
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     

26 Nov, 2020

1 commit

    When one task is in io_uring_cancel_files() and another is doing
    io_prep_async_work(), a race may happen. That's because after
    accounting a request as inflight in the first call to
    io_grab_identity(), it may still fail and go to io_identity_cow(),
    which might briefly keep a dangling work.identity, and that's not the
    only problem.

    Grab files last, so io_prep_async_work() won't fail once the request
    has made it onto ->inflight_list.

    Note: the bug shouldn't exist after io_uring_cancel_files() is changed
    to not poke into other tasks' requests.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     

24 Nov, 2020

2 commits

    iov_iter::type is a bitmask that also keeps the direction etc., so it
    shouldn't be compared directly against ITER_*. Use the proper helper.
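
    For example:

        /* wrong: ->type also encodes the data direction, so a direct
         * comparison against ITER_BVEC can fail for a write iterator */
        bool bad = (iter->type == ITER_BVEC);

        /* right: the helper masks the direction off before comparing */
        bool good = iov_iter_is_bvec(iter);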

    Fixes: ff6165b2d7f6 ("io_uring: retain iov_iter state over io_read/io_write calls")
    Reported-by: David Howells
    Signed-off-by: Pavel Begunkov
    Cc: # 5.9
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
  • Abaci Fuzz reported a shift-out-of-bounds BUG in io_uring_create():

    [ 59.598207] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
    [ 59.599665] shift exponent 64 is too large for 64-bit type 'long unsigned int'
    [ 59.601230] CPU: 0 PID: 963 Comm: a.out Not tainted 5.10.0-rc4+ #3
    [ 59.602502] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    [ 59.603673] Call Trace:
    [ 59.604286] dump_stack+0x107/0x163
    [ 59.605237] ubsan_epilogue+0xb/0x5a
    [ 59.606094] __ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e
    [ 59.607335] ? lock_downgrade+0x6c0/0x6c0
    [ 59.608182] ? rcu_read_lock_sched_held+0xaf/0xe0
    [ 59.609166] io_uring_create.cold+0x99/0x149
    [ 59.610114] io_uring_setup+0xd6/0x140
    [ 59.610975] ? io_uring_create+0x2510/0x2510
    [ 59.611945] ? lockdep_hardirqs_on_prepare+0x286/0x400
    [ 59.613007] ? syscall_enter_from_user_mode+0x27/0x80
    [ 59.614038] ? trace_hardirqs_on+0x5b/0x180
    [ 59.615056] do_syscall_64+0x2d/0x40
    [ 59.615940] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 59.617007] RIP: 0033:0x7f2bb8a0b239

    This is caused by roundup_pow_of_two() if the input entries are large
    enough, e.g. 2^32-1. For sq_entries, we check first and allow at most
    IORING_MAX_ENTRIES, so it is okay. But for cq_entries, we round up
    first, which may overflow and truncate it to 0; that is not the
    expected behavior. So check the cq size first and then round up.
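
    A sketch of the reordered validation (the IORING_SETUP_CLAMP handling
    is elided):

        if (p->flags & IORING_SETUP_CQSIZE) {
                /* validate before rounding, so roundup_pow_of_two()
                 * cannot overflow and truncate the value to 0 */
                if (!p->cq_entries || p->cq_entries > IORING_MAX_CQ_ENTRIES)
                        return -EINVAL;
                p->cq_entries = roundup_pow_of_two(p->cq_entries);
                if (p->cq_entries < p->sq_entries)
                        return -EINVAL;
        }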

    Fixes: 88ec3211e463 ("io_uring: round-up cq size before comparing with rounded sq size")
    Reported-by: Abaci Fuzz
    Signed-off-by: Joseph Qi
    Reviewed-by: Stefano Garzarella
    Signed-off-by: Jens Axboe

    Joseph Qi
     

21 Nov, 2020

1 commit

  • Pull io_uring fixes from Jens Axboe:
    "Mostly regression or stable fodder:

    - Disallow async path resolution of /proc/self

    - Tighten constraints for segmented async buffered reads

    - Fix double completion for a retry error case

    - Fix for fixed file life times (Pavel)"

    * tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
    io_uring: order refnode recycling
    io_uring: get an active ref_node from files_data
    io_uring: don't double complete failed reissue request
    mm: never attempt async page lock if we've transferred data already
    io_uring: handle -EOPNOTSUPP on path resolution
    proc: don't allow async path resolution of /proc/self components

    Linus Torvalds
     

18 Nov, 2020

3 commits

  • Don't recycle a refnode until we're done with all requests of nodes
    ejected before.

    Signed-off-by: Pavel Begunkov
    Cc: stable@vger.kernel.org # v5.7+
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
    An active ref_node can always be found in ctx->files_data; it's much
    safer to get it that way instead of poking into files_data->ref_list.

    Signed-off-by: Pavel Begunkov
    Cc: stable@vger.kernel.org # v5.7+
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
  • Zorro reports that an xfstest test case is failing, and it turns out that
    for the reissue path we can potentially issue a double completion on the
    request for the failure path. There's an issue around the retry as well,
    but for now, at least just make sure that we handle the error path
    correctly.

    Cc: stable@vger.kernel.org
    Fixes: b63534c41e20 ("io_uring: re-issue block requests that failed because of resources")
    Reported-by: Zorro Lang
    Signed-off-by: Jens Axboe

    Jens Axboe
     

15 Nov, 2020

1 commit

  • Any attempt to do path resolution on /proc/self from an async worker will
    yield -EOPNOTSUPP. We can safely do that resolution from the task itself,
    and without blocking, so retry it from there.
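
    A sketch of that retry trigger in the open path:

        /* -EOPNOTSUPP from an async worker means "can't resolve here";
         * turn it into -EAGAIN so the task itself retries the open */
        if (ret == -EOPNOTSUPP && io_wq_current_is_worker())
                ret = -EAGAIN;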

    Ideally io_uring would know this upfront and not have to go through the
    worker thread to find out, but that doesn't currently seem feasible.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

14 Nov, 2020

1 commit

  • Pull fs freeze fix and cleanups from Darrick Wong:
    "A single vfs fix for 5.10, along with two subsequent cleanups.

    A very long time ago, a hack was added to the vfs fs freeze protection
    code to work around lockdep complaints about XFS, which would try to
    run a transaction (which requires intwrite protection) to finalize an
    xfs freeze (by which time the vfs had already taken intwrite).

    Fast forward a few years, and XFS fixed the recursive intwrite problem
    on its own, and the hack became unnecessary. Fast forward almost a
    decade, and latent bugs in the code converting this hack from freeze
    flags to freeze locks combine with lockdep bugs to make this reproduce
    frequently enough to notice page faults racing with freeze.

    Since the hack is unnecessary and causes thread race errors, just get
    rid of it completely. Making this kind of vfs change midway through a
    cycle makes me nervous, but a large enough number of the usual
    VFS/ext4/XFS/btrfs suspects have said this looks good and solves a
    real problem vector.

    And once that removal is done, __sb_start_write is now simple enough
    that it becomes possible to refactor the function into smaller,
    simpler static inline helpers in linux/fs.h. The cleanup is
    straightforward.

    Summary:

    - Finally remove the "convert to trylock" weirdness in the fs freezer
    code. It was necessary 10 years ago to deal with nested
    transactions in XFS, but we've long since removed that; and now
    this is causing subtle race conditions when lockdep goes offline
    and sb_start_* aren't prepared to retry a trylock failure.

    - Minor cleanups of the sb_start_* fs freeze helpers"

    * tag 'vfs-5.10-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
    vfs: move __sb_{start,end}_write* to fs.h
    vfs: separate __sb_start_write into blocking and non-blocking helpers
    vfs: remove lockdep bogosity in __sb_start_write

    Linus Torvalds
     

12 Nov, 2020

1 commit

    If an application specifies IORING_SETUP_CQSIZE to set the CQ ring size
    to a specific value, we ensure that the CQ size is at least that of the
    SQ ring size. But in doing so, we compare the SQ size, already rounded
    up to a power of two, against the as-yet-unrounded CQ size. This means
    that if an application passes in non-power-of-two sizes, we can return
    -EINVAL even when the final value would have been fine. As an example,
    an application passing in 100/100 for sq/cq size should end up with 128
    for both. But since we round the SQ size first, we compare the CQ size
    of 100 to 128, and return -EINVAL as that is too small.
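
    A sketch of the reordered comparison, with the 100/100 example worked
    through:

        if (p->flags & IORING_SETUP_CQSIZE) {
                /* round first, then compare: 100 -> 128, and 128 >= 128 */
                p->cq_entries = roundup_pow_of_two(p->cq_entries);
                if (p->cq_entries < p->sq_entries)
                        return -EINVAL;
        }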

    Cc: stable@vger.kernel.org
    Fixes: 33a107f0a1b8 ("io_uring: allow application controlled CQ ring size")
    Reported-by: Dan Melnic
    Signed-off-by: Jens Axboe

    Jens Axboe
     
