20 Jan, 2021

3 commits

  • [ Upstream commit de7f1d9e99d8b99e4e494ad8fcd91f0c4c5c9357 ]

    io_uring fds are marked O_CLOEXEC and we explicitly cancel all requests
    before going through exec, so we don't want to leave the task's file
    references to io_uring instances that are no longer ours.
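
    For illustration, the cleanup can be a walk over the task's tracked
    io_uring files; a sketch assuming the 5.10-era tctx->xa bookkeeping and
    the io_uring_del_task_file() helper:

        static void io_uring_remove_task_files(struct io_uring_task *tctx)
        {
                struct file *file;
                unsigned long index;

                /* drop every io_uring file reference this task still tracks */
                xa_for_each(&tctx->xa, index, file)
                        io_uring_del_task_file(file);
        }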

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     
  • [ Upstream commit d434ab6db524ab1efd0afad4ffa1ee65ca6ac097 ]

    __io_req_task_submit() run by task_work can set mm and files, and so
    can io_sq_thread() in some cases; and because
    __io_sq_thread_acquire_mm() and __io_sq_thread_acquire_files() do only
    a simple current->mm/files check, IO may end up being submitted with
    the mm/files of another task.

    We also need to drop them at the end to release any references that
    were grabbed.
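
    A sketch of the idea in io_sq_thread(); the drop helper's name is an
    assumption based on this description:

        if (current->task_works)
                task_work_run();
        /* release whatever __io_sq_thread_acquire_mm()/_files() grabbed,
         * so we never submit with a stale task's mm/files */
        io_sq_thread_drop_mm_files();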

    Cc: stable@vger.kernel.org # 5.9+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     
  • [ Upstream commit 621fadc22365f3cf307bcd9048e3372e9ee9cdcc ]

    In rare cases a task may be exiting while io_ring_exit_work() is trying
    to cancel/wait on its requests. That's ok for
    __io_sq_thread_acquire_mm() because of the SQPOLL check, but not for
    __io_sq_thread_acquire_files(). Play it safe and fail in both of them.
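
    A plausible shape of that check at the top of both acquire helpers
    (exact placement assumed):

        if (unlikely(current->flags & PF_EXITING))
                return -EFAULT;         /* task is dying, don't grab mm/files */
        /* ... existing acquisition logic follows ... */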

    Cc: stable@vger.kernel.org # 5.5+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     

17 Jan, 2021

4 commits

  • [ Upstream commit 3e2224c5867fead6c0b94b84727cc676ac6353a3 ]

    alloc_fixed_file_ref_node() currently returns an ERR_PTR on failure.
    io_sqe_files_unregister() expects it to return NULL, and since it can
    only fail with -ENOMEM, it makes more sense to change
    alloc_fixed_file_ref_node() to behave that way.
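
    As a sketch, the allocator's failure paths then return NULL directly
    (the percpu_ref release callback name is taken from a later entry in
    this log):

        ref_node = kzalloc(sizeof(*ref_node), GFP_KERNEL);
        if (!ref_node)
                return NULL;                    /* was: ERR_PTR(-ENOMEM) */
        if (percpu_ref_init(&ref_node->refs, io_file_data_ref_zero,
                            0, GFP_KERNEL)) {
                kfree(ref_node);
                return NULL;                    /* was: ERR_PTR(-ENOMEM) */
        }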

    Fixes: 1ffc54220c44 ("io_uring: fix io_sqe_files_unregister() hangs")
    Reported-by: Dan Carpenter
    Signed-off-by: Matthew Wilcox (Oracle)
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Matthew Wilcox (Oracle)
     
  • commit 6c503150ae33ee19036255cfda0998463613352c upstream

    IOPOLL skips completion locking and synchronises under uring_lock
    instead, so io_cqring_overflow_flush(), and hence io_cqring_events(),
    needs additional uring_lock protection in some IOPOLL cases.

    Remove __io_cqring_overflow_flush() from io_cqring_events(), introduce
    a wrapper around the flush that does the needed synchronisation, and
    call it by hand.
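
    For illustration, such a wrapper might look like this (the exact
    signature is assumed):

        static void io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force,
                                             struct task_struct *tsk,
                                             struct files_struct *files)
        {
                /* IOPOLL completions sync on uring_lock, not completion_lock */
                if (ctx->flags & IORING_SETUP_IOPOLL)
                        mutex_lock(&ctx->uring_lock);
                __io_cqring_overflow_flush(ctx, force, tsk, files);
                if (ctx->flags & IORING_SETUP_IOPOLL)
                        mutex_unlock(&ctx->uring_lock);
        }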

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     
  • commit 89448c47b8452b67c146dc6cad6f737e004c5caf upstream

    We don't need to take uring_lock for SQPOLL|IOPOLL to do
    io_cqring_overflow_flush() when cq_overflow_list is empty; remove it
    from the hot path.
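
    The hot path can then be gated on the overflow bit alone, e.g.:

        /* only take uring_lock and flush when an overflow is flagged */
        if (test_bit(0, &ctx->cq_check_overflow))
                io_cqring_overflow_flush(ctx, false, NULL, NULL);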

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     
  • commit 81b6d05ccad4f3d8a9dfb091fb46ad6978ee40e4 upstream

    io_req_task_submit() might be called for IOPOLL requests, so do the
    fail path under uring_lock to comply with IOPOLL synchronisation, which
    is based solely on that lock.
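
    A sketch of the resulting shape in __io_req_task_submit(), with helper
    names following other entries in this log:

        mutex_lock(&ctx->uring_lock);
        if (!__io_sq_thread_acquire_mm(ctx))
                __io_queue_sqe(req, NULL);
        else
                __io_req_task_cancel(req, -EFAULT); /* fail path now locked too */
        mutex_unlock(&ctx->uring_lock);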

    Cc: stable@vger.kernel.org # 5.5+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     

06 Jan, 2021

7 commits

  • [ Upstream commit 9cd2be519d05ee78876d55e8e902b7125f78b74f ]

    list_empty_careful() is race-free only if certain conditions are met,
    i.e. no re-adds after del_init. io_cqring_overflow_flush() does
    list_move(), so the check is actually racy.

    Remove those checks, we have ->cq_check_overflow for the fast path.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     
  • commit 65b2b213484acd89a3c20dbb524e52a2f3793b78 upstream.

    syzbot reports the following issue:
    INFO: task syz-executor.2:12399 can't die for more than 143 seconds.
    task:syz-executor.2 state:D stack:28744 pid:12399 ppid: 8504 flags:0x00004004
    Call Trace:
    context_switch kernel/sched/core.c:3773 [inline]
    __schedule+0x893/0x2170 kernel/sched/core.c:4522
    schedule+0xcf/0x270 kernel/sched/core.c:4600
    schedule_timeout+0x1d8/0x250 kernel/time/timer.c:1847
    do_wait_for_common kernel/sched/completion.c:85 [inline]
    __wait_for_common kernel/sched/completion.c:106 [inline]
    wait_for_common kernel/sched/completion.c:117 [inline]
    wait_for_completion+0x163/0x260 kernel/sched/completion.c:138
    kthread_stop+0x17a/0x720 kernel/kthread.c:596
    io_put_sq_data fs/io_uring.c:7193 [inline]
    io_sq_thread_stop+0x452/0x570 fs/io_uring.c:7290
    io_finish_async fs/io_uring.c:7297 [inline]
    io_sq_offload_create fs/io_uring.c:8015 [inline]
    io_uring_create fs/io_uring.c:9433 [inline]
    io_uring_setup+0x19b7/0x3730 fs/io_uring.c:9507
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x45deb9
    Code: Unable to access opcode bytes at RIP 0x45de8f.
    RSP: 002b:00007f174e51ac78 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
    RAX: ffffffffffffffda RBX: 0000000000008640 RCX: 000000000045deb9
    RDX: 0000000000000000 RSI: 0000000020000140 RDI: 00000000000050e5
    RBP: 000000000118bf58 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118bf2c
    R13: 00007ffed9ca723f R14: 00007f174e51b9c0 R15: 000000000118bf2c
    INFO: task syz-executor.2:12399 blocked for more than 143 seconds.
    Not tainted 5.10.0-rc3-next-20201110-syzkaller #0
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

    We don't have a reproducer yet, but there seems to be a race in the
    current code:
    => io_put_sq_data                   |
       ctx_list is empty now.          |
    ==> kthread_park(sqd->thread);     |
                                       | T1: sq thread is parked now.
    ==> kthread_stop(sqd->thread);     |
        KTHREAD_SHOULD_STOP is set now.|
    ===> kthread_unpark(k);            |
                                       | T2: sq thread is now unparked, runs again.
                                       |
                                       | T3: sq thread is now preempted out.
                                       |
    ===> wake_up_process(k);           |
                                       |
                                       | T4: Since sqd ctx_list is empty, needs_sched
                                       |     will be true; the sq thread then sets its
                                       |     task state to TASK_INTERRUPTIBLE and
                                       |     schedules, so it will never be woken up.
    ===> wait_for_completion           |

    I have artificially used mdelay() to simulate the above race and got
    the same stack as in this syzbot report, but to be honest, I'm not sure
    this race is what triggers the syzbot report.

    To fix this possible race, when the sq thread is unparked we need to
    check whether it has been stopped.
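
    A sketch of the recheck in the io_sq_thread() loop:

        if (kthread_should_park())
                kthread_parkme();
        /* kthread_stop() may have unparked us only to stop us; recheck,
         * or we may go back to sleep and never complete kthread_stop() */
        if (kthread_should_stop())
                break;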

    Reported-by: syzbot+03beeb595f074db9cfd1@syzkaller.appspotmail.com
    Signed-off-by: Xiaoguang Wang
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Xiaoguang Wang
     
  • commit 1ffc54220c444774b7f09e6d2121e732f8e19b94 upstream.

    io_sqe_files_unregister() uninterruptibly waits for enqueued ref nodes,
    but the requests holding them may never complete, e.g. because of some
    userspace dependency. Make the wait interruptible, otherwise it can
    hang forever.
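
    For illustration, the wait loop might become something like this (the
    signal-handling details are assumptions):

        do {
                ret = wait_for_completion_interruptible(&data->done);
                if (!ret)
                        break;
                /* interrupted: run task_work/signals, then retry or bail */
                ret = io_run_task_work_sig();
        } while (ret >= 0);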

    Cc: stable@vger.kernel.org # 5.6+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     
  • commit 1642b4450d20e31439c80c28256c8eee08684698 upstream.

    Setting a new reference node on file data is not trivial; don't repeat
    the sequence everywhere, add and use a helper.
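
    A sketch of such a helper; the field names are assumptions based on
    neighbouring entries in this log:

        static void io_sqe_files_set_node(struct fixed_file_data *file_data,
                                          struct fixed_file_ref_node *ref_node)
        {
                spin_lock_bh(&file_data->lock);
                file_data->node = ref_node;
                list_add_tail(&ref_node->node, &file_data->ref_list);
                spin_unlock_bh(&file_data->lock);
                percpu_ref_get(&file_data->refs);
        }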

    Cc: stable@vger.kernel.org # 5.6+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     
  • commit ac0648a56c1ff66c1cbf735075ad33a26cbc50de upstream.

    io_file_data_ref_zero() can be invoked from soft-irq from the RCU core,
    hence we need to ensure that the file_data lock is bottom half safe. Use
    the _bh() variants when grabbing this lock.
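
    For example, each process-context taker of the lock becomes:

        spin_lock_bh(&file_data->lock);         /* was: spin_lock() */
        list_add_tail(&ref_node->node, &file_data->ref_list);
        spin_unlock_bh(&file_data->lock);       /* was: spin_unlock() */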

    Reported-by: syzbot+1f4ba1e5520762c523c6@syzkaller.appspotmail.com
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jens Axboe
     
  • commit 77788775c7132a8d93c6930ab1bd84fc743c7cb7 upstream.

    If we COW the identity, we assume that ->mm never changes. But this
    isn't true if multiple processes end up sharing the ring. Hence treat
    id->mm like any other process component when it comes to the identity
    mapping. This is pretty trivial, just moving the existing grab into
    io_grab_identity(), and including a check for the match.
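
    A sketch of the moved grab inside io_grab_identity(); the flag and
    field names are assumptions:

        if (def->needs_mm) {
                if (id->mm != current->mm)
                        return false;   /* mismatch: caller COWs the identity */
                mmgrab(id->mm);
                req->work.flags |= IO_WQ_WORK_MM;
        }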

    Cc: stable@vger.kernel.org # 5.10
    Fixes: 1e6fa5216a0e ("io_uring: COW io_identity on mismatch")
    Reported-by: Christian Brauner
    Tested-by: Christian Brauner
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jens Axboe
     
  • commit dfea9fce29fda6f2f91161677e0e0d9b671bc099 upstream.

    The purpose of io_uring_cancel_files() is to wait for all requests
    matching ->files to go away or be cancelled. We should first drop the
    files of a request in io_req_drop_files() and only then make it
    undiscoverable to io_uring_cancel_files().

    First drop, then delete from the list. It's ok to leave req->id->files
    dangling, because it's not dereferenced by the cancellation code, only
    compared against. The cancellation side may go to sleep there and will
    be woken by the wake_up() that follows in io_req_drop_files().

    Fixes: 0f2122045b946 ("io_uring: don't rely on weak ->files references")
    Cc: # 5.5+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     

30 Dec, 2020

11 commits

  • commit 00c18640c2430c4bafaaeede1f9dd6f7ec0e4b25 upstream.

    Before IORING_SETUP_ATTACH_WQ, we could just cancel everything on the
    io-wq when exiting. But that's not the case if they are shared, so
    cancel for the specific ctx instead.
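
    A sketch of a per-ctx cancel callback, using io-wq's matching-cancel
    entry point:

        static bool io_cancel_ctx_cb(struct io_wq_work *work, void *data)
        {
                struct io_kiocb *req = container_of(work, struct io_kiocb, work);

                /* only cancel work that belongs to the exiting ring */
                return req->ctx == data;
        }

        /* usage: io_wq_cancel_cb(ctx->io_wq, io_cancel_ctx_cb, ctx, true); */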

    Cc: stable@vger.kernel.org
    Fixes: 24369c2e3bb0 ("io_uring: add io-wq workqueue sharing")
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jens Axboe
     
  • commit 9faadcc8abe4b83d0263216dc3a6321d5bbd616b upstream.

    Once we have created a file for the current context during setup, we
    should not call io_ring_ctx_wait_and_kill() directly, as that will be
    done by fput(file).
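
    For illustration, the setup tail then looks roughly like this (both
    helper names are assumptions):

        file = io_uring_get_file(ctx);
        if (IS_ERR(file)) {
                ret = PTR_ERR(file);
                goto err;               /* no file yet: tear down by hand */
        }
        ret = io_uring_install_fd(ctx, file);
        if (ret < 0) {
                fput(file);             /* fput() tears down the ctx */
                return ret;
        }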

    Cc: stable@vger.kernel.org # 5.10
    Reported-by: syzbot+c9937dfb2303a5f18640@syzkaller.appspotmail.com
    Signed-off-by: Pavel Begunkov
    [axboe: fix unused 'ret' for !CONFIG_UNIX]
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     
  • commit a528b04ea40690ff40501f50d618a62a02b19620 upstream.

    xa_store() may fail, check the result.
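
    For example, with xa_err() extracting the error and the file reference
    dropped on failure:

        ret = xa_err(xa_store(&tctx->xa, (unsigned long)file,
                              file, GFP_KERNEL));
        if (ret) {
                fput(file);
                return ret;
        }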

    Cc: stable@vger.kernel.org # 5.10
    Fixes: 0f2122045b946 ("io_uring: don't rely on weak ->files references")
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     
  • commit c07e6719511e77c4b289f62bfe96423eb6ea061d upstream.

    io_iopoll_complete() does not hold completion_lock to complete polled
    io, so in io_wq_submit_work() we cannot call io_req_complete() directly
    to complete polled io; otherwise there may be concurrent access to the
    cqring, defer_list, etc., which is not safe. Commit dad1b1242fd5
    ("io_uring: always let io_iopoll_complete() complete polled io") fixed
    this issue, but Pavel reported that with IOPOLL, requests other than rw
    can do buf reg/unreg (IORING_OP_PROVIDE_BUFFERS or
    IORING_OP_REMOVE_BUFFERS), so the fix is not good.

    Given that io_iopoll_complete() is always called under uring_lock, here
    for polled io we can also take uring_lock to fix this issue.
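
    A sketch of the error path in io_wq_submit_work() under this scheme:

        if (ret) {
                struct io_ring_ctx *lock_ctx = NULL;

                if (req->ctx->flags & IORING_SETUP_IOPOLL)
                        lock_ctx = req->ctx;
                /* io_iopoll_complete() runs under uring_lock, so completing
                 * a polled request here must take the same lock */
                if (lock_ctx)
                        mutex_lock(&lock_ctx->uring_lock);
                req_set_fail_links(req);
                io_req_complete(req, ret);
                if (lock_ctx)
                        mutex_unlock(&lock_ctx->uring_lock);
        }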

    Fixes: dad1b1242fd5 ("io_uring: always let io_iopoll_complete() complete polled io")
    Cc: # 5.5+
    Signed-off-by: Xiaoguang Wang
    Reviewed-by: Pavel Begunkov
    [axboe: don't deref 'req' after completing it']
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Xiaoguang Wang
     
  • commit dd20166236953c8cd14f4c668bf972af32f0c6be upstream.

    Doing a vectored buf-select read with 0 iovecs passed is meaningless
    and utterly broken; forbid it.
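
    The check is a one-liner; a sketch of the tightened condition:

        if (req->flags & REQ_F_BUFFER_SELECT) {
                /* was: iov_len > 1, which let a zero-length vector through */
                if (iov_len != 1)
                        return -EINVAL;
        }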

    Cc: # 5.7+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     
  • commit dad1b1242fd5717af18ae4ac9d12b9f65849e13a upstream.

    Abaci Fuzz reported a double-free or invalid-free BUG in io_commit_cqring():
    [ 95.504842] BUG: KASAN: double-free or invalid-free in io_commit_cqring+0x3ec/0x8e0
    [ 95.505921]
    [ 95.506225] CPU: 0 PID: 4037 Comm: io_wqe_worker-0 Tainted: G B W 5.10.0-rc5+ #1
    [ 95.507434] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    [ 95.508248] Call Trace:
    [ 95.508683] dump_stack+0x107/0x163
    [ 95.509323] ? io_commit_cqring+0x3ec/0x8e0
    [ 95.509982] print_address_description.constprop.0+0x3e/0x60
    [ 95.510814] ? vprintk_func+0x98/0x140
    [ 95.511399] ? io_commit_cqring+0x3ec/0x8e0
    [ 95.512036] ? io_commit_cqring+0x3ec/0x8e0
    [ 95.512733] kasan_report_invalid_free+0x51/0x80
    [ 95.513431] ? io_commit_cqring+0x3ec/0x8e0
    [ 95.514047] __kasan_slab_free+0x141/0x160
    [ 95.514699] kfree+0xd1/0x390
    [ 95.515182] io_commit_cqring+0x3ec/0x8e0
    [ 95.515799] __io_req_complete.part.0+0x64/0x90
    [ 95.516483] io_wq_submit_work+0x1fa/0x260
    [ 95.517117] io_worker_handle_work+0xeac/0x1c00
    [ 95.517828] io_wqe_worker+0xc94/0x11a0
    [ 95.518438] ? io_worker_handle_work+0x1c00/0x1c00
    [ 95.519151] ? __kthread_parkme+0x11d/0x1d0
    [ 95.519806] ? io_worker_handle_work+0x1c00/0x1c00
    [ 95.520512] ? io_worker_handle_work+0x1c00/0x1c00
    [ 95.521211] kthread+0x396/0x470
    [ 95.521727] ? _raw_spin_unlock_irq+0x24/0x30
    [ 95.522380] ? kthread_mod_delayed_work+0x180/0x180
    [ 95.523108] ret_from_fork+0x22/0x30
    [ 95.523684]
    [ 95.523985] Allocated by task 4035:
    [ 95.524543] kasan_save_stack+0x1b/0x40
    [ 95.525136] __kasan_kmalloc.constprop.0+0xc2/0xd0
    [ 95.525882] kmem_cache_alloc_trace+0x17b/0x310
    [ 95.533930] io_queue_sqe+0x225/0xcb0
    [ 95.534505] io_submit_sqes+0x1768/0x25f0
    [ 95.535164] __x64_sys_io_uring_enter+0x89e/0xd10
    [ 95.535900] do_syscall_64+0x33/0x40
    [ 95.536465] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 95.537199]
    [ 95.537505] Freed by task 4035:
    [ 95.538003] kasan_save_stack+0x1b/0x40
    [ 95.538599] kasan_set_track+0x1c/0x30
    [ 95.539177] kasan_set_free_info+0x1b/0x30
    [ 95.539798] __kasan_slab_free+0x112/0x160
    [ 95.540427] kfree+0xd1/0x390
    [ 95.540910] io_commit_cqring+0x3ec/0x8e0
    [ 95.541516] io_iopoll_complete+0x914/0x1390
    [ 95.542150] io_do_iopoll+0x580/0x700
    [ 95.542724] io_iopoll_try_reap_events.part.0+0x108/0x200
    [ 95.543512] io_ring_ctx_wait_and_kill+0x118/0x340
    [ 95.544206] io_uring_release+0x43/0x50
    [ 95.544791] __fput+0x28d/0x940
    [ 95.545291] task_work_run+0xea/0x1b0
    [ 95.545873] do_exit+0xb6a/0x2c60
    [ 95.546400] do_group_exit+0x12a/0x320
    [ 95.546967] __x64_sys_exit_group+0x3f/0x50
    [ 95.547605] do_syscall_64+0x33/0x40
    [ 95.548155] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    The reason is that once we get a non-EAGAIN error in io_wq_submit_work(),
    we complete the req by calling io_req_complete(), which holds
    completion_lock while calling io_commit_cqring(); but for polled io,
    io_iopoll_complete() does not hold completion_lock when calling
    io_commit_cqring(), so there may be concurrent access to
    ctx->defer_list and a double free may happen.

    To fix this bug, we always let io_iopoll_complete() complete polled io.

    Cc: # 5.5+
    Reported-by: Abaci Fuzz
    Signed-off-by: Xiaoguang Wang
    Reviewed-by: Pavel Begunkov
    Reviewed-by: Joseph Qi
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Xiaoguang Wang
     
  • commit 31bff9a51b264df6d144931a6a5f1d6cc815ed4b upstream.

    IOPOLL allows buffer remove/provide requests, but they don't
    synchronise by the rules of IOPOLL, namely that uring_lock must be
    held.
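
    For illustration, using the existing io_ring_submit_lock() helpers,
    which take uring_lock only when needed:

        io_ring_submit_lock(ctx, !force_nonblock);
        lockdep_assert_held(&ctx->uring_lock);
        /* manipulate the buffer lists while synchronised with IOPOLL */
        io_ring_submit_unlock(ctx, !force_nonblock);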

    Cc: # 5.7+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     
  • commit 59850d226e4907a6f37c1d2fe5ba97546a8691a4 upstream.

    Checking !list_empty(&ctx->cq_overflow_list) around noflush in
    io_cqring_events() is racy, because if it fails but a request
    overflowed just after that, io_cqring_overflow_flush() will still be
    called.

    Remove the second check; it shouldn't be a problem for performance,
    because there is a cq_check_overflow bit check just above.

    Cc: # 5.5+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     
  • [ Upstream commit cda286f0715c82f8117e166afd42cca068876dde ]

    io_uring_cancel_task_requests() doesn't imply that the ring is going
    away; it may continue to work well after that. The problem is that it
    sets ->cq_overflow_flushed, effectively disabling the CQ overflow
    feature.

    Split setting cq_overflow_flushed from the flush, and do the first one
    only on exit. It's ok in terms of cancellations because there is an
    io_uring->in_idle check in __io_cqring_fill_event().

    It also fixes a race with setting ->cq_overflow_flushed in
    io_uring_cancel_task_requests, which is not atomic and is part of a
    bitmask with other flags. Though, the only other flag that's not set
    during init is drain_next, so it's not as bad for sane architectures.

    Signed-off-by: Pavel Begunkov
    Fixes: 0f2122045b946 ("io_uring: don't rely on weak ->files references")
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     
  • [ Upstream commit 634578f800652035debba3098d8ab0d21af7c7a5 ]

    It's not safe to call io_cqring_overflow_flush() for IOPOLL mode
    without holding uring_lock, because it does synchronisation
    differently. Make sure we have it.

    As for io_ring_exit_work(), we don't even need it there because
    io_ring_ctx_wait_and_kill() already sets the force flag, making all
    overflowed requests be dropped.

    Cc: # 5.5+
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     
  • [ Upstream commit df9923f96717d0aebb0a73adbcf6285fa79e38cb ]

    io_uring_cancel_files() cancels all requests that match files,
    regardless of task. There is no real need for that; cancel only the
    requests of the specified task. That also handles the SQPOLL case, as
    it already changes the task to it.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Pavel Begunkov
     

08 Dec, 2020

1 commit

    Put the file as part of error handling when setting up the io ctx, to
    fix memory leaks like the following one.

    BUG: memory leak
    unreferenced object 0xffff888101ea2200 (size 256):
    comm "syz-executor355", pid 8470, jiffies 4294953658 (age 32.400s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    20 59 03 01 81 88 ff ff 80 87 a8 10 81 88 ff ff Y..............
    backtrace:
    [] kmem_cache_zalloc include/linux/slab.h:654 [inline]
    [] __alloc_file+0x1f/0x130 fs/file_table.c:101
    [] alloc_empty_file+0x69/0x120 fs/file_table.c:151
    [] alloc_file+0x33/0x1b0 fs/file_table.c:193
    [] alloc_file_pseudo+0xb2/0x140 fs/file_table.c:233
    [] anon_inode_getfile fs/anon_inodes.c:91 [inline]
    [] anon_inode_getfile+0xaa/0x120 fs/anon_inodes.c:74
    [] io_uring_get_fd fs/io_uring.c:9198 [inline]
    [] io_uring_create fs/io_uring.c:9377 [inline]
    [] io_uring_setup+0x1125/0x1630 fs/io_uring.c:9411
    [] do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    [] entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Reported-by: syzbot+71c4697e27c99fddcf17@syzkaller.appspotmail.com
    Fixes: 0f2122045b94 ("io_uring: don't rely on weak ->files references")
    Cc: Pavel Begunkov
    Signed-off-by: Hillf Danton
    Signed-off-by: Jens Axboe

    Hillf Danton
     

01 Dec, 2020

1 commit

    __io_compat_recvmsg_copy_hdr() with REQ_F_BUFFER_SELECT reads out the
    iov len but never assigns it to iov/fast_iov, leaving sr->len holding
    garbage. Hopefully, the following io_buffer_select() truncates it to
    the selected buffer size, but the value may still end up below what was
    actually specified.
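
    A sketch of the missing assignment in the compat path:

        compat_ssize_t clen;

        if (__get_user(clen, &uiov->iov_len))
                return -EFAULT;
        if (clen < 0)
                return -EINVAL;
        sr->len = clen;         /* previously left holding garbage */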

    Cc: # 5.7
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     

26 Nov, 2020

1 commit

    When one task is in io_uring_cancel_files() and another is doing
    io_prep_async_work(), a race may happen. That's because after
    accounting a request as inflight in the first call to
    io_grab_identity(), it may still fail and go to io_identity_cow(),
    which might briefly keep a dangling work.identity, and that's not the
    only problem.

    Grab files last, so io_prep_async_work() won't fail once the request
    has made it onto ->inflight_list.

    Note: the bug shouldn't exist after io_uring_cancel_files() is changed
    to not poke into other tasks' requests.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     

24 Nov, 2020

2 commits

    iov_iter::type is a bitmask that also keeps the direction etc., so it
    shouldn't be compared directly against ITER_*. Use the proper helper.
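
    For example:

        /* wrong: ->type also encodes the data direction, so a direct
         * comparison against ITER_BVEC can fail for a write iterator */
        bool bad = (iter->type == ITER_BVEC);

        /* right: the helper masks the direction off before comparing */
        bool good = iov_iter_is_bvec(iter);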

    Fixes: ff6165b2d7f6 ("io_uring: retain iov_iter state over io_read/io_write calls")
    Reported-by: David Howells
    Signed-off-by: Pavel Begunkov
    Cc: # 5.9
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
  • Abaci Fuzz reported a shift-out-of-bounds BUG in io_uring_create():

    [ 59.598207] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
    [ 59.599665] shift exponent 64 is too large for 64-bit type 'long unsigned int'
    [ 59.601230] CPU: 0 PID: 963 Comm: a.out Not tainted 5.10.0-rc4+ #3
    [ 59.602502] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    [ 59.603673] Call Trace:
    [ 59.604286] dump_stack+0x107/0x163
    [ 59.605237] ubsan_epilogue+0xb/0x5a
    [ 59.606094] __ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e
    [ 59.607335] ? lock_downgrade+0x6c0/0x6c0
    [ 59.608182] ? rcu_read_lock_sched_held+0xaf/0xe0
    [ 59.609166] io_uring_create.cold+0x99/0x149
    [ 59.610114] io_uring_setup+0xd6/0x140
    [ 59.610975] ? io_uring_create+0x2510/0x2510
    [ 59.611945] ? lockdep_hardirqs_on_prepare+0x286/0x400
    [ 59.613007] ? syscall_enter_from_user_mode+0x27/0x80
    [ 59.614038] ? trace_hardirqs_on+0x5b/0x180
    [ 59.615056] do_syscall_64+0x2d/0x40
    [ 59.615940] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 59.617007] RIP: 0033:0x7f2bb8a0b239

    This is caused by roundup_pow_of_two() if the input entries are large
    enough, e.g. 2^32-1. For sq_entries, we check first and allow at most
    IORING_MAX_ENTRIES, so it is okay. But for cq_entries, we round up
    first, which may overflow and truncate it to 0; that is not the
    expected behavior. So check the cq size first and then round up.
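
    A sketch of the reordered validation (the IORING_SETUP_CLAMP handling
    is elided):

        if (p->flags & IORING_SETUP_CQSIZE) {
                /* validate before rounding, so roundup_pow_of_two()
                 * cannot overflow and truncate the value to 0 */
                if (!p->cq_entries || p->cq_entries > IORING_MAX_CQ_ENTRIES)
                        return -EINVAL;
                p->cq_entries = roundup_pow_of_two(p->cq_entries);
                if (p->cq_entries < p->sq_entries)
                        return -EINVAL;
        }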

    Fixes: 88ec3211e463 ("io_uring: round-up cq size before comparing with rounded sq size")
    Reported-by: Abaci Fuzz
    Signed-off-by: Joseph Qi
    Reviewed-by: Stefano Garzarella
    Signed-off-by: Jens Axboe

    Joseph Qi
     

21 Nov, 2020

1 commit

  • Pull io_uring fixes from Jens Axboe:
    "Mostly regression or stable fodder:

    - Disallow async path resolution of /proc/self

    - Tighten constraints for segmented async buffered reads

    - Fix double completion for a retry error case

    - Fix for fixed file life times (Pavel)"

    * tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
    io_uring: order refnode recycling
    io_uring: get an active ref_node from files_data
    io_uring: don't double complete failed reissue request
    mm: never attempt async page lock if we've transferred data already
    io_uring: handle -EOPNOTSUPP on path resolution
    proc: don't allow async path resolution of /proc/self components

    Linus Torvalds
     

18 Nov, 2020

3 commits

  • Don't recycle a refnode until we're done with all requests of nodes
    ejected before.

    Signed-off-by: Pavel Begunkov
    Cc: stable@vger.kernel.org # v5.7+
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
    An active ref_node can always be found in ctx->files_data; it's much
    safer to get it that way instead of poking into files_data->ref_list.

    Signed-off-by: Pavel Begunkov
    Cc: stable@vger.kernel.org # v5.7+
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
  • Zorro reports that an xfstest test case is failing, and it turns out that
    for the reissue path we can potentially issue a double completion on the
    request for the failure path. There's an issue around the retry as well,
    but for now, at least just make sure that we handle the error path
    correctly.

    Cc: stable@vger.kernel.org
    Fixes: b63534c41e20 ("io_uring: re-issue block requests that failed because of resources")
    Reported-by: Zorro Lang
    Signed-off-by: Jens Axboe

    Jens Axboe
     

15 Nov, 2020

1 commit

  • Any attempt to do path resolution on /proc/self from an async worker will
    yield -EOPNOTSUPP. We can safely do that resolution from the task itself,
    and without blocking, so retry it from there.
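
    A sketch of that retry trigger in the open path:

        /* -EOPNOTSUPP from an async worker means "can't resolve here";
         * turn it into -EAGAIN so the task itself retries the open */
        if (ret == -EOPNOTSUPP && io_wq_current_is_worker())
                ret = -EAGAIN;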

    Ideally io_uring would know this upfront and not have to go through the
    worker thread to find out, but that doesn't currently seem feasible.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

14 Nov, 2020

1 commit

  • Pull fs freeze fix and cleanups from Darrick Wong:
    "A single vfs fix for 5.10, along with two subsequent cleanups.

    A very long time ago, a hack was added to the vfs fs freeze protection
    code to work around lockdep complaints about XFS, which would try to
    run a transaction (which requires intwrite protection) to finalize an
    xfs freeze (by which time the vfs had already taken intwrite).

    Fast forward a few years, and XFS fixed the recursive intwrite problem
    on its own, and the hack became unnecessary. Fast forward almost a
    decade, and latent bugs in the code converting this hack from freeze
    flags to freeze locks combine with lockdep bugs to make this reproduce
    frequently enough to notice page faults racing with freeze.

    Since the hack is unnecessary and causes thread race errors, just get
    rid of it completely. Making this kind of vfs change midway through a
    cycle makes me nervous, but a large enough number of the usual
    VFS/ext4/XFS/btrfs suspects have said this looks good and solves a
    real problem vector.

    And once that removal is done, __sb_start_write is now simple enough
    that it becomes possible to refactor the function into smaller,
    simpler static inline helpers in linux/fs.h. The cleanup is
    straightforward.

    Summary:

    - Finally remove the "convert to trylock" weirdness in the fs freezer
    code. It was necessary 10 years ago to deal with nested
    transactions in XFS, but we've long since removed that; and now
    this is causing subtle race conditions when lockdep goes offline
    and sb_start_* aren't prepared to retry a trylock failure.

    - Minor cleanups of the sb_start_* fs freeze helpers"

    * tag 'vfs-5.10-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
    vfs: move __sb_{start,end}_write* to fs.h
    vfs: separate __sb_start_write into blocking and non-blocking helpers
    vfs: remove lockdep bogosity in __sb_start_write

    Linus Torvalds
     

12 Nov, 2020

1 commit

    If an application specifies IORING_SETUP_CQSIZE to set the CQ ring size
    to a specific value, we ensure that the CQ size is at least that of the
    SQ ring size. But in doing so, we compare the SQ size, already rounded
    up to a power of two, against the as-yet-unrounded CQ size. This means
    that if an application passes in non-power-of-two sizes, we can return
    -EINVAL even when the final value would have been fine. As an example,
    an application passing in 100/100 for sq/cq size should end up with 128
    for both. But since we round the SQ size first, we compare the CQ size
    of 100 to 128, and return -EINVAL as that is too small.
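
    A sketch of the reordered comparison, with the 100/100 example worked
    through:

        if (p->flags & IORING_SETUP_CQSIZE) {
                /* round first, then compare: 100 -> 128, and 128 >= 128 */
                p->cq_entries = roundup_pow_of_two(p->cq_entries);
                if (p->cq_entries < p->sq_entries)
                        return -EINVAL;
        }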

    Cc: stable@vger.kernel.org
    Fixes: 33a107f0a1b8 ("io_uring: allow application controlled CQ ring size")
    Reported-by: Dan Melnic
    Signed-off-by: Jens Axboe

    Jens Axboe
     
