13 Dec, 2019

4 commits

  • There's an issue with deferred requests through drain, where if we do
    need to defer, we're not copying over the sqe_submit state correctly.
    This can result in using uninitialized data when we then later go and
    submit the deferred request, like this check in __io_submit_sqe():

    if (unlikely(s->index >= ctx->sq_entries))
            return -EINVAL;

    with 's' being uninitialized, we can randomly fail this check. Fix this
    by copying sqe_submit state when we defer a request.
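
    A minimal sketch of the fix (5.x-era field names; not the verbatim
    stable patch): when parking a request on the defer list, preserve
    the whole submit state alongside the copied sqe.

    /* in the defer path, with 's' the current sqe_submit */
    sqe_copy = kmemdup(s->sqe, sizeof(*sqe_copy), GFP_KERNEL);
    if (!sqe_copy)
            return -EAGAIN;

    memcpy(&req->submit, s, sizeof(*s));    /* keep ->index etc. valid */
    req->submit.sqe = sqe_copy;             /* point at the stable copy */
    list_add_tail(&req->list, &ctx->defer_list);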

    This was fixed in mainline as part of a cleanup series, before
    anyone realized we had this issue; that series removed the
    separate states of ->index vs ->submit.sqe. It's not something I
    was comfortable putting into stable, hence this much simpler fix.
    Here's the patch in the series that fixes the same issue:

    commit cf6fd4bd559ee61a4454b161863c8de6f30f8dca
    Author: Pavel Begunkov
    Date: Mon Nov 25 23:14:39 2019 +0300

    io_uring: inline struct sqe_submit

    Reported-by: Andres Freund
    Reported-by: Tomáš Chaloupka
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jens Axboe
     
  • commit aa4c3967756c6c576a38a23ac511be211462a6b7 upstream.

    Christophe reports that current master fails building on powerpc with
    this error:

    CC fs/io_uring.o
    fs/io_uring.c: In function ‘loop_rw_iter’:
    fs/io_uring.c:1628:21: error: implicit declaration of function ‘kmap’
    [-Werror=implicit-function-declaration]
    iovec.iov_base = kmap(iter->bvec->bv_page)
    ^
    fs/io_uring.c:1628:19: warning: assignment makes pointer from integer
    without a cast [-Wint-conversion]
    iovec.iov_base = kmap(iter->bvec->bv_page)
    ^
    fs/io_uring.c:1643:4: error: implicit declaration of function ‘kunmap’
    [-Werror=implicit-function-declaration]
    kunmap(iter->bvec->bv_page);
    ^

    which is caused by a missing highmem.h include. Fix it by including
    it.
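
    The fix is the one-liner the message describes:

    /* fs/io_uring.c */
    #include <linux/highmem.h>      /* kmap()/kunmap() */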

    Fixes: 311ae9e159d8 ("io_uring: fix dead-hung for non-iter fixed rw")
    Reported-by: Christophe Leroy
    Tested-by: Christophe Leroy
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jens Axboe
     
  • commit 441cdbd5449b4923cd413d3ba748124f91388be9 upstream.

    We should never return -ERESTARTSYS to userspace; transform it
    into -EINTR.
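
    The transform itself is a two-liner in the affected completion
    path, along these lines:

    if (ret == -ERESTARTSYS)
            ret = -EINTR;   /* never leak -ERESTARTSYS to userspace */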

    Cc: stable@vger.kernel.org # v5.3+
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jens Axboe
     
  • commit 311ae9e159d81a1ec1cf645daf40b39ae5a0bd84 upstream.

    Read/write requests to devices that don't implement read_iter and
    write_iter can, when used with fixed buffers, cause a general
    protection fault that totally hangs the machine.

    io_import_fixed() initialises the iov_iter with a bvec, but
    loop_rw_iter() accesses it as an iovec, dereferencing a random
    address.

    kmap() the pages one by one in this case.
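
    A sketch of the per-page mapping in loop_rw_iter(), close to but
    not necessarily the verbatim patch:

    struct iovec iovec;

    if (!iov_iter_is_bvec(iter)) {
            iovec = iov_iter_iovec(iter);
    } else {
            /* fixed buffers arrive as a bvec: map the page for a kva */
            iovec.iov_base = kmap(iter->bvec->bv_page)
                                    + iter->iov_offset;
            iovec.iov_len = min(iter->count,
                            iter->bvec->bv_len - iter->iov_offset);
    }

    /* ... call file->f_op->read()/->write() on iovec here ... */

    if (iov_iter_is_bvec(iter))
            kunmap(iter->bvec->bv_page);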

    Cc: stable@vger.kernel.org
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Pavel Begunkov
     

05 Dec, 2019

1 commit

  • [ Upstream commit 181e448d8709e517c9c7b523fcd209f24eb38ca7 ]

    If we don't inherit the original task creds, then we can confuse
    users like fuse that pass creds in the request header. See the
    link below for the identical aio issue.
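
    The shape of the fix, sketched with the standard cred APIs (grab
    the creds at ring setup, assume them in the async worker):

    ctx->creds = get_current_cred();        /* at io_uring setup */

    /* in the async submission context */
    old_cred = override_creds(ctx->creds);
    /* ... issue the request ... */
    revert_creds(old_cred);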

    Link: https://lore.kernel.org/linux-fsdevel/26f0d78e-99ca-2f1b-78b9-433088053a61@scylladb.com/T/#u
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Jens Axboe
     

14 Nov, 2019

2 commits

  • A test case was reported where two linked reads with registered
    buffers always failed the second link. This is because we store
    the expected result of a request in req->result, and if we don't
    get this result, we fail the dependent links. For some reason the
    registered buffer import returned -ERROR/0, while the normal
    import returns -ERROR/length. This broke linked commands with
    registered buffers.

    Fix this by making io_import_fixed() correctly return the mapped length.
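
    The length matters because the link machinery of that era failed
    dependent requests on a result mismatch; roughly:

    /* io_complete_rw(): a short result fails the dependent links */
    if ((req->flags & REQ_F_LINK) && res != req->result)
            req->flags |= REQ_F_FAIL_LINK;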

    Cc: stable@vger.kernel.org # v5.3
    Reported-by: 李通洲
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • For timeout requests, io_uring tries to grab a file with the
    specified fd, which is usually stdin/fd=0.

    Update io_op_needs_file() so that timeout requests don't take a
    file.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     

12 Nov, 2019

1 commit

  • Currently we make sequence == 0 be the same as sequence == 1, but that's
    not super useful if the intent is really to have a timeout that's just
    a pure timeout.

    If the user passes in sqe->off == 0, then don't apply any sequence logic
    to the request, let it purely be driven by the timeout specified.
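
    A sketch of the change, using the flag name from the upstream fix
    (REQ_F_TIMEOUT_NOSEQ):

    count = READ_ONCE(sqe->off);
    if (!count) {
            /* pure timer: exempt from all sequence accounting */
            req->flags |= REQ_F_TIMEOUT_NOSEQ;
    }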

    Reported-by: 李通洲
    Reviewed-by: 李通洲
    Signed-off-by: Jens Axboe

    Jens Axboe
     

31 Oct, 2019

1 commit

  • We use io_kiocb->result == -EAGAIN as a way to know if we need to
    re-submit a polled request, as -EAGAIN reporting happens out-of-line
    for IO submission failures. This field is cleared when we originally
    allocate the request, but it isn't reset when we retry the submission
    from async context. This can cause issues where we think something
    needs a re-issue, but we're really just reading stale data.

    Reset ->result whenever we re-prep a request for polled submission.
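
    The reset itself is a one-liner in the polled re-prep path,
    roughly:

    req->result = 0;        /* don't mistake stale -EAGAIN for a retry */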

    Cc: stable@vger.kernel.org
    Fixes: 9e645e1105ca ("io_uring: add support for sqe links")
    Reported-by: Bijan Mottahedeh
    Signed-off-by: Jens Axboe

    Jens Axboe
     

28 Oct, 2019

2 commits

  • syzkaller reported an issue where it looks like a malicious app
    can trigger a use-after-free in which the ctx->sq_array and
    ->rings values are read right after the ring fd has been installed
    in the process file table.

    Defer ring fd installation until after we're done reading those
    values.
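
    The general pattern, sketched (not the verbatim patch): reserve
    the fd up front, but install it only once we're done touching the
    ctx.

    fd = get_unused_fd_flags(O_RDWR | O_CLOEXEC);
    if (fd < 0)
            return fd;

    file = anon_inode_getfile("[io_uring]", &io_uring_fops, ctx,
                                    O_RDWR | O_CLOEXEC);
    if (IS_ERR(file)) {
            put_unused_fd(fd);
            return PTR_ERR(file);
    }

    /* ... read the ctx->sq_array/->rings derived values here ... */

    fd_install(fd, file);   /* only now can userspace reach the file */
    return fd;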

    Fixes: 75b28affdd6a ("io_uring: allocate the two rings together")
    Reported-by: syzbot+6f03d895a6cd0d06187f@syzkaller.appspotmail.com
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • io_queue_link_head() owns shadow_req after taking it as an
    argument. By not freeing it in case of an error, it can leak the
    request along with the ctx->refs it has taken.

    Reviewed-by: Jackie Liu
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     

26 Oct, 2019

2 commits

  • We currently assume that submissions from the sqthread are
    successful, and if IO polling is enabled, we use that value to
    know how many completions to look for. But if we overflowed the
    CQ ring, or if some requests simply errored and completed already,
    they won't be available for polling.

    For the case of IO polling with SQ thread usage, look at the
    pending poll list. If it ever goes empty, we know that we have no
    more pollable requests inflight. In that case, simply reset the
    inflight count to zero.
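
    Sketched from the description, with __io_iopoll_check() assumed as
    the poll helper's name:

    /* in the io_sq_thread() loop */
    if (!list_empty(&ctx->poll_list))
            __io_iopoll_check(ctx, &nr_events, 0);
    else
            inflight = 0;   /* no pollable requests left in flight */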

    Reported-by: Pavel Begunkov
    Reviewed-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We currently use the ring values directly, but that can lead to
    issues if a malicious application changes those values behind our
    back. Create in-kernel cached versions of them, and just overwrite
    the user-visible side when we update them. This is similar to how
    we treat the sq/cq ring tail/head updates.
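
    Following the tail/head pattern the message references, a cached
    counter looks roughly like this (field names from the upstream
    fix):

    /* bump the kernel-private copy, then publish it to the ring */
    ctx->cached_sq_dropped++;
    WRITE_ONCE(ctx->rings->sq_dropped, ctx->cached_sq_dropped);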

    Reported-by: Pavel Begunkov
    Reviewed-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Jens Axboe
     

25 Oct, 2019

3 commits

  • io_ring_submit() finalises with:
    1. io_commit_sqring(), which releases the sqes to userspace
    2. A call to io_queue_link_head(), which accesses the released
    head's sqe

    Reorder them.
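
    The reordered tail of io_ring_submit(), sketched:

    if (link)
            io_queue_link_head(ctx, link, &link->submit, shadow_req);
    io_commit_sqring(ctx);  /* release sqes to userspace only now */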

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
  • io_sq_thread() processes sqes eight at a time without considering
    links. As a result, links get randomly subdivided.

    The easiest way to fix this is to call io_get_sqring() inside
    io_submit_sqes(), as io_ring_submit() does.

    Downsides:
    1. This removes the optimisation of not grabbing mm_struct for
    fixed files.
    2. It submits all sqes in one go, without finer-grained scheduling
    interleaved with cq processing.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
  • There is a bug where failed linked requests are returned not with
    the specified @user_data, but with garbage from the kernel stack.

    The reason is that io_fail_links() uses req->user_data, which is
    uninitialised when it is called from the failure path of
    io_queue_sqe().

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     

24 Oct, 2019

3 commits

  • The sequence number of a timeout req (req->sequence) indicates the
    expected completion request. Since each timeout req consumes a
    sequence number, the sequence numbers of the timeout reqs on the
    timeout list should never be the same. But currently we may get
    the same (and incorrect) number if we insert a new entry before
    the last one, such as when submitting these two timeout reqs on a
    fresh ring instance:

    req->sequence
    req_1 (count = 2): 2
    req_2 (count = 1): 2

    Then, if we submit a nop req, req_2 will still time out even after
    the nop req has finished. Fix this problem by adjusting the
    sequence numbers of the reordered reqs when inserting a new entry.
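
    The insertion walk with the adjustment applied, sketched with a
    hypothetical expires_before() predicate:

    unsigned span = 0;

    list_for_each_prev(entry, &ctx->timeout_list) {
            struct io_kiocb *nxt = list_entry(entry, struct io_kiocb,
                                                    list);

            if (expires_before(nxt, req))
                    break;
            /* each timeout req consumes a slot, so shift these by one */
            nxt->sequence++;
            span++;
    }
    req->sequence -= span;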

    Signed-off-by: zhangyi (F)
    Signed-off-by: Jens Axboe

    zhangyi (F)
     
  • The sequence numbers of reqs ahead of a timeout req on the
    timeout_list should be adjusted in io_timeout_fn(), because the
    current timeout req consumes a slot in the cq_ring and the cq_tail
    pointer will be increased; otherwise other timeout reqs may return
    early, without waiting for enough wait_nr.

    Signed-off-by: zhangyi (F)
    Signed-off-by: Jens Axboe

    zhangyi (F)
     
  • There are cases where it isn't safe to block for submission, even
    if the caller asked to wait for events as well. Revert the
    previous optimization of doing that.

    This reverts two commits:

    bf7ec93c644cb
    c576666863b78

    Fixes: c576666863b78 ("io_uring: optimize submit_and_wait API")
    Signed-off-by: Jens Axboe

    Jens Axboe
     

19 Oct, 2019

1 commit

  • Pull block fixes from Jens Axboe:

    - NVMe pull request from Keith that addresses deadlocks, double
    resets, memory leaks, and other regressions.

    - Fixup elv_support_iosched() for bio based devices (Damien)

    - Fixup for the ahci PCS quirk (Dan)

    - Socket O_NONBLOCK handling fix for io_uring (me)

    - Timeout sequence io_uring fixes (yangerkun)

    - MD warning fix for parameter default_layout (Song)

    - blkcg activation fixes (Tejun)

    - blk-rq-qos node deletion fix (Tejun)

    * tag 'for-linus-2019-10-18' of git://git.kernel.dk/linux-block:
    nvme-pci: Set the prp2 correctly when using more than 4k page
    io_uring: fix logic error in io_timeout
    io_uring: fix up O_NONBLOCK handling for sockets
    md/raid0: fix warning message for parameter default_layout
    libata/ahci: Fix PCS quirk application
    blk-rq-qos: fix first node deletion of rq_qos_del()
    blkcg: Fix multiple bugs in blkcg_activate_policy()
    io_uring: consider the overflow of sequence for timeout req
    nvme-tcp: fix possible leakage during error flow
    nvmet-loop: fix possible leakage during error flow
    block: Fix elv_support_iosched()
    nvme-tcp: Initialize sk->sk_ll_usec only with NET_RX_BUSY_POLL
    nvme: Wait for reset state when required
    nvme: Prevent resets during paused controller state
    nvme: Restart request timers in resetting state
    nvme: Remove ADMIN_ONLY state
    nvme-pci: Free tagset if no IO queues
    nvme: retain split access workaround for capability reads
    nvme: fix possible deadlock when nvme_update_formats fails

    Linus Torvalds
     

18 Oct, 2019

2 commits

  • If ctx->cached_sq_head < nxt_sq_head, we should add UINT_MAX to tmp, not
    tmp_nxt.
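
    That is, the wrap-around adjustment belongs on the head-side
    value:

    if (ctx->cached_sq_head < nxt_sq_head)
            tmp += UINT_MAX;        /* was mistakenly applied to tmp_nxt */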

    Fixes: 5da0fb1ab34c ("io_uring: consider the overflow of sequence for timeout req")
    Signed-off-by: yangerkun
    Signed-off-by: Jens Axboe

    yangerkun
     
  • We've got two issues with the non-regular file handling for non-blocking
    IO:

    1) We don't want to re-do a short read in full for a non-regular file,
    as we can't just read the data again.
    2) For non-regular files that don't support non-blocking IO attempts,
    we need to punt to async context even if the file is opened as
    non-blocking. Otherwise the caller always gets -EAGAIN.

    Add two new request flags to handle these cases. One is just a cache
    of the inode S_ISREG() status, the other tells io_uring that we always
    need to punt this request to async context, even if REQ_F_NOWAIT is set.
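
    A sketch of the two flags at work, names taken from the upstream
    fix (REQ_F_ISREG, REQ_F_MUST_PUNT, io_file_supports_async()):

    if (S_ISREG(file_inode(req->file)->i_mode))
            req->flags |= REQ_F_ISREG;      /* cache S_ISREG() status */

    /* a file that can't do nonblock IO must go to async context */
    if (force_nonblock && !io_file_supports_async(req->file))
            req->flags |= REQ_F_MUST_PUNT;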

    Cc: stable@vger.kernel.org
    Reported-by: Hrvoje Zeba
    Tested-by: Hrvoje Zeba
    Signed-off-by: Jens Axboe

    Jens Axboe
     

15 Oct, 2019

1 commit

  • We currently calculate the sequence of a timeout as 'req->sequence
    = ctx->cached_sq_head + count - 1', and judge the right place to
    insert it into the timeout_list by comparing the number of
    requests we still expect to complete. But this does not account
    for overflow:

    1. ctx->cached_sq_head + count - 1 may overflow, so a bigger count
    for the new timeout req can yield a smaller req->sequence.

    2. The current cached_sq_head may have overflowed compared with
    that of an earlier req, which also leaves the timeout req with a
    small req->sequence.

    This overflow misorders the timeout_list, which can make its
    entries complete in the wrong order. Fix it by reusing
    req->submit.sequence to store the count, and changing the
    insertion-sort logic in io_timeout().

    Signed-off-by: yangerkun
    Signed-off-by: Jens Axboe

    yangerkun
     

11 Oct, 2019

2 commits

  • Pull block fixes from Jens Axboe:

    - Fix wbt performance regression introduced with the blk-rq-qos
    refactoring (Harshad)

    - Fix io_uring fileset removal inadvertently killing the workqueue (me)

    - Fix io_uring typo in linked command nonblock submission (Pavel)

    - Remove spurious io_uring wakeups on request free (Pavel)

    - Fix null_blk zoned command error return (Keith)

    - Don't use freezable workqueues for backing_dev, also means we can
    revert a previous libata hack (Mika)

    - Fix nbd sysfs mutex dropped too soon at removal time (Xiubo)

    * tag 'for-linus-20191010' of git://git.kernel.dk/linux-block:
    nbd: fix possible sysfs duplicate warning
    null_blk: Fix zoned command return code
    io_uring: only flush workqueues on fileset removal
    io_uring: remove wait loop spurious wakeups
    blk-wbt: fix performance regression in wbt scale_up/scale_down
    Revert "libata, freezer: avoid block device removal while system is frozen"
    bdi: Do not use freezable workqueue
    io_uring: fix reversed nonblock flag for link submission

    Linus Torvalds
     
  • We have two ways a request can be deferred:

    1) It's a regular request that depends on another one
    2) It's a timeout that tracks completions

    We have a shared helper to determine whether to defer, and that
    attempts to make the right decision based on the request. But we
    only have some of this information in the caller. Un-share the
    two timeout/defer helpers so the caller can use the right one.

    Fixes: 5262f567987d ("io_uring: IORING_OP_TIMEOUT support")
    Reported-by: yangerkun
    Reviewed-by: Jackie Liu
    Signed-off-by: Jens Axboe

    Jens Axboe
     

10 Oct, 2019

1 commit

  • We should not remove the workqueue; we just need to ensure that
    the workqueues are synced. The workqueues are torn down on ctx
    removal.
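
    Sketched (workqueue field name assumed):

    flush_workqueue(ctx->sqo_wq);   /* sync, don't destroy */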

    Cc: stable@vger.kernel.org
    Fixes: 6b06314c47e1 ("io_uring: add file set registration")
    Reported-by: Stefan Hajnoczi
    Signed-off-by: Jens Axboe

    Jens Axboe
     

08 Oct, 2019

1 commit

  • Any changes interesting to tasks waiting in io_cqring_wait() are
    committed with io_cqring_ev_posted(). However,
    io_ring_drop_ctx_refs() also tries to do that, for no reason; this
    means spurious wakeups on every io_free_req() and
    io_uring_enter().

    Just use percpu_ref_put() instead.
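
    After the change the helper does nothing but drop the refs; a
    sketch:

    static void io_ring_drop_ctx_refs(struct io_ring_ctx *ctx,
                                            unsigned refs)
    {
            percpu_ref_put_many(&ctx->refs, refs);
    }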

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     

05 Oct, 2019

1 commit

  • Pull block fixes from Jens Axboe:

    - Mandate timespec64 for the io_uring timeout ABI (Arnd)

    - Set of NVMe changes via Sagi:
    - controller removal race fix from Balbir
    - quirk additions from Gabriel and Jian-Hong
    - nvme-pci power state save fix from Mario
    - Add 64bit user commands (for 64bit registers) from Marta
    - nvme-rdma/nvme-tcp fixes from Max, Mark and Me
    - Minor cleanups and nits from James, Dan and John

    - Two s390 dasd fixes (Jan, Stefan)

    - Have loop change block size in DIO mode (Martijn)

    - paride pg header ifdef guard (Masahiro)

    - Two blk-mq queue scheduler tweaks, fixing an ordering issue on zoned
    devices and suboptimal performance on others (Ming)

    * tag 'for-linus-2019-10-03' of git://git.kernel.dk/linux-block: (22 commits)
    block: sed-opal: fix sparse warning: convert __be64 data
    block: sed-opal: fix sparse warning: obsolete array init.
    block: pg: add header include guard
    Revert "s390/dasd: Add discard support for ESE volumes"
    s390/dasd: Fix error handling during online processing
    io_uring: use __kernel_timespec in timeout ABI
    loop: change queue block size to match when using DIO
    blk-mq: apply normal plugging for HDD
    blk-mq: honor IO scheduler for multiqueue devices
    nvme-rdma: fix possible use-after-free in connect timeout
    nvme: Move ctrl sqsize to generic space
    nvme: Add ctrl attributes for queue_count and sqsize
    nvme: allow 64-bit results in passthru commands
    nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T
    nvmet-tcp: remove superflous check on request sgl
    Added QUIRKs for ADATA XPG SX8200 Pro 512GB
    nvme-rdma: Fix max_hw_sectors calculation
    nvme: fix an error code in nvme_init_subsystem()
    nvme-pci: Save PCI state before putting drive into deepest state
    nvme-tcp: fix wrong stop condition in io_work
    ...

    Linus Torvalds
     

01 Oct, 2019

1 commit

  • All system calls use struct __kernel_timespec instead of the old struct
    timespec, but this one was just added with the old-style ABI. Change it
    now to enforce the use of __kernel_timespec, avoiding ABI confusion and
    the need for compat handlers on 32-bit architectures.

    Any user space caller will have to use __kernel_timespec now, but this
    is unambiguous and works for any C library regardless of the time_t
    definition. A nicer way to specify the timeout would have been a less
    ambiguous 64-bit nanosecond value, but I suppose it's too late now to
    change that as this would impact both 32-bit and 64-bit users.
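
    For reference, __kernel_timespec carries a 64-bit time_t on every
    architecture (include/uapi/linux/time_types.h):

    struct __kernel_timespec {
            __kernel_time64_t       tv_sec;         /* seconds */
            long long               tv_nsec;        /* nanoseconds */
    };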

    Fixes: 5262f567987d ("io_uring: IORING_OP_TIMEOUT support")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     

28 Sep, 2019

1 commit

  • Pull more io_uring updates from Jens Axboe:
    "Just two things in here:

    - Improvement to the io_uring CQ ring wakeup for batched IO (me)

    - Fix wrong comparison in poll handling (yangerkun)

    I realize the first one is a little late in the game, but it felt
    pointless to hold it off until the next release. Went through various
    testing and reviews with Pavel and peterz"

    * tag 'for-5.4/io_uring-2019-09-27' of git://git.kernel.dk/linux-block:
    io_uring: make CQ ring wakeups be more efficient
    io_uring: compare cached_cq_tail with cq.head in_io_uring_poll

    Linus Torvalds
     

26 Sep, 2019

1 commit

  • For batched IO, it's not uncommon for waiters to ask for more than 1
    IO to complete before being woken up. This is a problem with
    wait_event() since tasks will get woken for every IO that completes,
    re-check condition, then go back to sleep. For batch counts on the
    order of what you do for high IOPS, that can result in 10s of extra
    wakeups for the waiting task.

    Add a private wake function that checks for the wake up count criteria
    being met before calling autoremove_wake_function(). Pavel reports that
    one test case he has runs 40% faster with proper batching of wakeups.
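
    The shape of the private wake function, sketched from the
    description (io_should_wake() encapsulates the completion-count
    check; the struct layout is assumed):

    struct io_wait_queue {
            struct wait_queue_entry wq;
            struct io_ring_ctx *ctx;
            unsigned to_wait;       /* completions the task wants */
            unsigned nr_timeouts;   /* timeouts seen at sleep time */
    };

    static int io_wake_function(struct wait_queue_entry *curr,
                                unsigned mode, int wake_flags, void *key)
    {
            struct io_wait_queue *iowq =
                    container_of(curr, struct io_wait_queue, wq);

            /* don't wake the task until enough CQEs have arrived */
            if (!io_should_wake(iowq))
                    return -1;

            return autoremove_wake_function(curr, mode, wake_flags, key);
    }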

    Reported-by: Pavel Begunkov
    Tested-by: Pavel Begunkov
    Reviewed-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Jens Axboe
     

25 Sep, 2019

2 commits

  • Pull more io_uring updates from Jens Axboe:
    "A collection of later fixes and additions, that weren't quite ready
    for pushing out with the initial pull request.

    This contains:

    - Fix potential use-after-free of shadow requests (Jackie)

    - Fix potential OOM crash in request allocation (Jackie)

    - kmalloc+memcpy -> kmemdup cleanup (Jackie)

    - Fix poll crash regression (me)

    - Fix SQ thread not being nice and giving up CPU for !PREEMPT (me)

    - Add support for timeouts, making it easier to do epoll_wait()
    conversions, for instance (me)

    - Ensure io_uring works without f_ops->read_iter() and
    f_ops->write_iter() (me)"

    * tag 'for-5.4/io_uring-2019-09-24' of git://git.kernel.dk/linux-block:
    io_uring: correctly handle non ->{read,write}_iter() file_operations
    io_uring: IORING_OP_TIMEOUT support
    io_uring: use cond_resched() in sqthread
    io_uring: fix potential crash issue due to io_get_req failure
    io_uring: ensure poll commands clear ->sqe
    io_uring: fix use-after-free of shadow_req
    io_uring: use kmemdup instead of kmalloc and memcpy

    Linus Torvalds
     
  • Patch series "Make working with compound pages easier", v2.

    These three patches add three helpers and convert the appropriate
    places to use them.

    This patch (of 3):

    It's unnecessarily hard to find out the size of a potentially huge page.
    Replace 'PAGE_SIZE << compound_order(page)' with page_size(page).
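
    The helper is exactly that replacement, wrapped up:

    /* include/linux/mm.h */
    static inline unsigned long page_size(struct page *page)
    {
            return PAGE_SIZE << compound_order(page);
    }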

    Link: http://lkml.kernel.org/r/20190721104612.19120-2-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle)
    Acked-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Reviewed-by: Ira Weiny
    Acked-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox (Oracle)
     

19 Sep, 2019

3 commits

  • There's been a few requests for functionality similar to io_getevents()
    and epoll_wait(), where the user can specify a timeout for waiting on
    events. I deliberately did not add support for this through the system
    call initially to avoid overloading the args, but I can see that the use
    cases for this are valid.

    This adds support for IORING_OP_TIMEOUT. If a user wants to get woken
    when waiting for events, simply submit one of these timeout commands
    with your wait call (or before). This ensures that the application
    sleeping on the CQ ring waiting for events will get woken. The timeout
    command is passed in as a pointer to a struct timespec. Timeouts are
    relative. The timeout command also includes a way to auto-cancel
    after N events have passed.
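
    A userspace sketch of arming one by hand (raw sqe fields, no
    liburing helpers; uses struct __kernel_timespec as mandated by the
    01 Oct ABI change above):

    struct __kernel_timespec ts = { .tv_sec = 1, .tv_nsec = 0 };

    memset(sqe, 0, sizeof(*sqe));
    sqe->opcode = IORING_OP_TIMEOUT;
    sqe->addr = (unsigned long) &ts;        /* pointer to the timespec */
    sqe->len = 1;                           /* exactly one timespec */
    sqe->off = 8;                           /* auto-fire after 8 CQEs */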

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • If preempt isn't enabled in the kernel, we can run into hang issues with
    sqthread submissions. Use cond_resched() to play nice instead of
    cpu_relax(), if we end up starting the loop and not having any events
    pending for submissions.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Sometimes io_get_req() will return NULL, and we then need to do
    the correct error handling; otherwise it will cause a kernel NULL
    pointer dereference.
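
    The missing check, sketched against the shadow-request allocation
    in the link/drain path:

    shadow_req = io_get_req(ctx, NULL);
    if (unlikely(!shadow_req))
            goto out;       /* don't dereference a failed allocation */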

    Fixes: 4fe2c963154c ("io_uring: add support for link with drain")
    Signed-off-by: Jackie Liu
    Signed-off-by: Jens Axboe

    Jackie Liu