22 Oct, 2020

1 commit

  • We correctly set io-wq NUMA node affinities when the io-wq context is
    set up, but if an entire node's CPU set is offlined and then brought
    back online, the per-node affinities are broken. Ensure that we set
    them again whenever a CPU comes online, so that we always track the
    right node affinity. The usual cpuhp notifiers are used to drive it.

    Reported-by: Zhang Qiang
    Signed-off-by: Jens Axboe

    Jens Axboe
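
    As an illustration of the mechanism, a minimal sketch of hooking the
    cpuhp multi-instance API for this purpose follows; the state name, the
    io_wq_cpu_online() helper and the wq fields it touches are assumptions
    for the sketch, not the actual patch.

    #include <linux/cpuhotplug.h>
    #include <linux/sched.h>
    #include <linux/topology.h>

    static enum cpuhp_state io_wq_online;

    /* Re-apply per-node worker affinity whenever a CPU comes online. */
    static int io_wq_cpu_online(unsigned int cpu, struct hlist_node *node)
    {
            /* io_wq, cpuhp_node, wqes[] and worker are placeholder fields */
            struct io_wq *wq = hlist_entry_safe(node, struct io_wq, cpuhp_node);
            int i;

            for (i = 0; i < wq->nr_wqes; i++)
                    set_cpus_allowed_ptr(wq->wqes[i]->worker,
                                         cpumask_of_node(wq->wqes[i]->node));
            return 0;
    }

    /*
     * Once at init:
     *     io_wq_online = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
     *                     "io-wq/online", io_wq_cpu_online, NULL);
     * Per io-wq instance:
     *     cpuhp_state_add_instance_nocalls(io_wq_online, &wq->cpuhp_node);
     */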
     

21 Oct, 2020

1 commit

  • This one was missed in the earlier conversion; it should be included
    like any of the other IO identity flags. Make sure we restore to
    RLIM_INFINITY when dropping the personality again.

    Fixes: 98447d65b4a7 ("io_uring: move io identity items into separate struct")
    Signed-off-by: Jens Axboe

    Jens Axboe
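
    For context, the other identity items follow a save/apply/restore
    pattern around work execution; a minimal sketch of doing the same for
    RLIMIT_FSIZE (the io_identity_stub struct and helper names are made up
    for the sketch):

    #include <linux/sched/signal.h>
    #include <linux/resource.h>

    struct io_identity_stub {
            unsigned long fsize;
    };

    /* Capture the submitter's file-size limit when grabbing the identity. */
    static void io_grab_fsize(struct io_identity_stub *id)
    {
            id->fsize = rlimit(RLIMIT_FSIZE);
    }

    /* Apply it in the worker before running the work item ... */
    static void io_apply_fsize(const struct io_identity_stub *id)
    {
            current->signal->rlim[RLIMIT_FSIZE].rlim_cur = id->fsize;
    }

    /* ... and restore the default when dropping the personality again. */
    static void io_drop_fsize(void)
    {
            current->signal->rlim[RLIMIT_FSIZE].rlim_cur = RLIM_INFINITY;
    }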
     

01 Oct, 2020

5 commits

  • This flag is no longer used, remove it.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • The smart syzbot has found a reproducer for the following issue:

    ==================================================================
    BUG: KASAN: use-after-free in instrument_atomic_write include/linux/instrumented.h:71 [inline]
    BUG: KASAN: use-after-free in atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
    BUG: KASAN: use-after-free in io_wqe_inc_running fs/io-wq.c:301 [inline]
    BUG: KASAN: use-after-free in io_wq_worker_running+0xde/0x110 fs/io-wq.c:613
    Write of size 4 at addr ffff8882183db08c by task io_wqe_worker-0/7771

    CPU: 0 PID: 7771 Comm: io_wqe_worker-0 Not tainted 5.9.0-rc4-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x198/0x1fd lib/dump_stack.c:118
    print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
    __kasan_report mm/kasan/report.c:513 [inline]
    kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
    check_memory_region_inline mm/kasan/generic.c:186 [inline]
    check_memory_region+0x13d/0x180 mm/kasan/generic.c:192
    instrument_atomic_write include/linux/instrumented.h:71 [inline]
    atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
    io_wqe_inc_running fs/io-wq.c:301 [inline]
    io_wq_worker_running+0xde/0x110 fs/io-wq.c:613
    schedule_timeout+0x148/0x250 kernel/time/timer.c:1879
    io_wqe_worker+0x517/0x10e0 fs/io-wq.c:580
    kthread+0x3b5/0x4a0 kernel/kthread.c:292
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

    Allocated by task 7768:
    kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
    kasan_set_track mm/kasan/common.c:56 [inline]
    __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
    kmem_cache_alloc_node_trace+0x17b/0x3f0 mm/slab.c:3594
    kmalloc_node include/linux/slab.h:572 [inline]
    kzalloc_node include/linux/slab.h:677 [inline]
    io_wq_create+0x57b/0xa10 fs/io-wq.c:1064
    io_init_wq_offload fs/io_uring.c:7432 [inline]
    io_sq_offload_start fs/io_uring.c:7504 [inline]
    io_uring_create fs/io_uring.c:8625 [inline]
    io_uring_setup+0x1836/0x28e0 fs/io_uring.c:8694
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Freed by task 21:
    kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
    kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
    kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
    __kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422
    __cache_free mm/slab.c:3418 [inline]
    kfree+0x10e/0x2b0 mm/slab.c:3756
    __io_wq_destroy fs/io-wq.c:1138 [inline]
    io_wq_destroy+0x2af/0x460 fs/io-wq.c:1146
    io_finish_async fs/io_uring.c:6836 [inline]
    io_ring_ctx_free fs/io_uring.c:7870 [inline]
    io_ring_exit_work+0x1e4/0x6d0 fs/io_uring.c:7954
    process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
    worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
    kthread+0x3b5/0x4a0 kernel/kthread.c:292
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

    The buggy address belongs to the object at ffff8882183db000
    which belongs to the cache kmalloc-1k of size 1024
    The buggy address is located 140 bytes inside of
    1024-byte region [ffff8882183db000, ffff8882183db400)
    The buggy address belongs to the page:
    page:000000009bada22b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2183db
    flags: 0x57ffe0000000200(slab)
    raw: 057ffe0000000200 ffffea0008604c48 ffffea00086a8648 ffff8880aa040700
    raw: 0000000000000000 ffff8882183db000 0000000100000002 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8882183daf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    ffff8882183db000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    >ffff8882183db080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ^
    ffff8882183db100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8882183db180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ==================================================================

    which is down to the code below,

        /* all workers gone, wq exit can proceed */
        if (!nr_workers && refcount_dec_and_test(&wqe->wq->refs))
                complete(&wqe->wq->done);

    because there may be multiple wqe instances in a wq, and we would wait
    for every worker in every wqe to go home before releasing the wq's
    resources on destroy.

    To that end, rework the wq's refcount by making it independent of
    worker tracking, because after all they are two different things, and
    keep it balanced as workers come and go. Note that the manager kthread,
    like the other workers, now holds a reference to the wq for its
    lifetime.

    Finally, to help destroy the wq, check IO_WQ_BIT_EXIT when creating a
    worker and do nothing if the wq is exiting.

    Cc: stable@vger.kernel.org # v5.5+
    Reported-by: syzbot+45fa0a195b941764e0f0@syzkaller.appspotmail.com
    Reported-by: syzbot+9af99580130003da82b1@syzkaller.appspotmail.com
    Cc: Pavel Begunkov
    Signed-off-by: Hillf Danton
    Signed-off-by: Jens Axboe

    Hillf Danton
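
    A rough sketch of the reworked lifetime scheme: workers (and the
    manager) each pin the wq with a reference, and the destroy path waits
    for the last reference to go away. The helper names are illustrative;
    the refs/done/state fields mirror what struct io_wq carries.

    #include <linux/refcount.h>
    #include <linux/completion.h>

    /* Every worker (and the manager kthread) pins the wq for its lifetime. */
    static bool io_worker_get_wq(struct io_wq *wq)
    {
            if (test_bit(IO_WQ_BIT_EXIT, &wq->state))
                    return false;   /* wq is exiting, don't create a worker */
            refcount_inc(&wq->refs);
            return true;
    }

    static void io_worker_put_wq(struct io_wq *wq)
    {
            if (refcount_dec_and_test(&wq->refs))
                    complete(&wq->done);    /* last user gone */
    }

    /* Destroy path: flag exit, drop the initial reference, wait it out. */
    static void io_wq_destroy_wait(struct io_wq *wq)
    {
            set_bit(IO_WQ_BIT_EXIT, &wq->state);
            io_worker_put_wq(wq);
            wait_for_completion(&wq->done);
    }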
     
  • There are a few operations that are offloaded to the worker threads. In
    this case, we lose process context and end up in kthread context. This
    results in IOs not being accounted to the issuing cgroup and
    consequently ending up attributed to root. Just like the others, adopt
    the personality of the blkcg too when issuing via the workqueues.

    The SQPOLL thread will live in and attach to the cgroup context it was
    initialized in.

    Signed-off-by: Dennis Zhou
    Signed-off-by: Jens Axboe

    Dennis Zhou
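
    The association itself boils down to kthread_associate_blkcg() around
    each work item; a minimal sketch, assuming the captured identity
    carries the submitter's blkcg css:

    #include <linux/blk-cgroup.h>

    /* Attach the worker kthread to the submitter's blkcg before issuing. */
    static void io_wq_adopt_blkcg(struct cgroup_subsys_state *blkcg_css)
    {
            if (blkcg_css)
                    kthread_associate_blkcg(blkcg_css);
    }

    /* Detach again once the work item is done. */
    static void io_wq_drop_blkcg(void)
    {
            kthread_associate_blkcg(NULL);
    }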
     
  • During a context switch the scheduler invokes wq_worker_sleeping() with
    disabled preemption. Disabling preemption is needed because it protects
    access to `worker->sleeping'. As an optimisation it avoids invoking
    schedule() within the schedule path as part of possible wake up (thus
    preempt_enable_no_resched() afterwards).

    The io-wq has been added to the mix in the same section with disabled
    preemption. This breaks on PREEMPT_RT because io_wq_worker_sleeping()
    acquires a spinlock_t. Also within the schedule() the spinlock_t must be
    acquired after tsk_is_pi_blocked() otherwise it will block on the
    sleeping lock again while scheduling out.

    While playing with `io_uring-bench' I didn't notice a significant
    latency spike after converting io_wqe::lock to a raw_spinlock_t. The
    latency was more or less the same.

    In order to keep the spinlock_t it would have to be moved after the
    tsk_is_pi_blocked() check which would introduce a branch instruction
    into the hot path.

    The lock is used to maintain the `work_list' and wakes up at most one
    task. Should io_wqe_cancel_pending_work() cause latency spikes while
    searching for a specific item, it would need to drop the lock during
    iteration.
    revert_creds() is also invoked under the lock. According to debugging,
    cred::non_rcu is 0. Otherwise it should be moved outside of the locked
    section because put_cred_rcu()->free_uid() acquires a sleeping lock.

    Convert io_wqe::lock to a raw_spinlock_t.

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Jens Axboe

    Sebastian Andrzej Siewior
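
    The conversion itself is mechanical: the lock type changes and every
    lock site switches to the raw_* variants. A simplified sketch (the
    struct is abridged and uses a plain list for brevity):

    #include <linux/spinlock.h>
    #include <linux/list.h>

    struct io_wqe_stub {
            raw_spinlock_t lock;            /* was: spinlock_t lock; */
            struct list_head work_list;
    };

    static void io_wqe_insert_work(struct io_wqe_stub *wqe,
                                   struct list_head *work)
    {
            raw_spin_lock_irq(&wqe->lock);
            list_add_tail(work, &wqe->work_list);
            raw_spin_unlock_irq(&wqe->lock);
    }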
     
  • If we don't get and assign the namespace for the async work, then certain
    paths just don't work properly (like /dev/stdin, /proc/mounts, etc).
    Anything that references the current namespace of the given task should
    be assigned for async work on behalf of that task.

    Cc: stable@vger.kernel.org # v5.5+
    Reported-by: Al Viro
    Signed-off-by: Jens Axboe

    Jens Axboe
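
    A sketch of what grabbing and adopting the namespaces looks like,
    assuming the captured identity holds an nsproxy pointer much like it
    does for ->files:

    #include <linux/nsproxy.h>
    #include <linux/sched/task.h>

    /* At submission time: pin the issuing task's namespaces. */
    static struct nsproxy *io_grab_nsproxy(void)
    {
            get_nsproxy(current->nsproxy);
            return current->nsproxy;
    }

    /* In the worker: adopt them (alongside ->files) under task_lock(). */
    static void io_adopt_nsproxy(struct nsproxy *nsproxy)
    {
            task_lock(current);
            current->nsproxy = nsproxy;
            task_unlock(current);
    }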
     

12 Jun, 2020

1 commit

  • Pull io_uring fixes from Jens Axboe:
    "A few late stragglers in here. In particular:

    - Validate full range for provided buffers (Bijan)

    - Fix bad use of kfree() in buffer registration failure (Denis)

    - Don't allow close of ring itself, it's not fully safe. Making it
    fully safe would require making the system call more expensive,
    which isn't worth it.

    - Buffer selection fix

    - Regression fix for O_NONBLOCK retry

    - Make IORING_OP_ACCEPT honor O_NONBLOCK (Jiufei)

    - Restrict opcode handling for SQ/IOPOLL (Pavel)

    - io-wq work handling cleanups and improvements (Pavel, Xiaoguang)

    - IOPOLL race fix (Xiaoguang)"

    * tag 'io_uring-5.8-2020-06-11' of git://git.kernel.dk/linux-block:
    io_uring: fix io_kiocb.flags modification race in IOPOLL mode
    io_uring: check file O_NONBLOCK state for accept
    io_uring: avoid unnecessary io_wq_work copy for fast poll feature
    io_uring: avoid whole io_wq_work copy for requests completed inline
    io_uring: allow O_NONBLOCK async retry
    io_wq: add per-wq work handler instead of per work
    io_uring: don't arm a timeout through work.func
    io_uring: remove custom ->func handlers
    io_uring: don't derive close state from ->func
    io_uring: use kvfree() in io_sqe_buffer_register()
    io_uring: validate the full range of provided buffers for access
    io_uring: re-set iov base/len for buffer select retry
    io_uring: move send/recv IOPOLL check into prep
    io_uring: deduplicate io_openat{,2}_prep()
    io_uring: do build_open_how() only once
    io_uring: fix {SQ,IO}POLL with unsupported opcodes
    io_uring: disallow close of ring itself

    Linus Torvalds
     

11 Jun, 2020

3 commits

  • Some architectures like arm64 and s390 require USER_DS to be set for
    kernel threads to access user address space, which is the whole purpose
    of kthread_use_mm, but others like x86 don't. That has led to a huge
    mess where some callers are fixed up once they are tested on said
    architectures, while others linger around and yet others like io_uring
    try to do "clever" optimizations for what usually is just a trivial
    assignment to a member in the thread_struct for most architectures.

    Make kthread_use_mm set USER_DS, and kthread_unuse_mm restore to the
    previous value instead.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Tested-by: Jens Axboe
    Reviewed-by: Jens Axboe
    Acked-by: Michael S. Tsirkin
    Cc: Alex Deucher
    Cc: Al Viro
    Cc: Felipe Balbi
    Cc: Felix Kuehling
    Cc: Jason Wang
    Cc: Zhenyu Wang
    Cc: Zhi Wang
    Cc: Greg Kroah-Hartman
    Link: http://lkml.kernel.org/r/20200404094101.672954-7-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
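
    Conceptually the change folds the addressing-limit switch into the
    helpers themselves; a sketch of the idea using the pre-5.10
    set_fs()/USER_DS interface (simplified, the real code stashes the old
    value in struct kthread rather than a global):

    #include <linux/uaccess.h>

    static mm_segment_t sketch_oldfs;

    static void sketch_kthread_use_mm(struct mm_struct *mm)
    {
            /* ... switch current->mm/active_mm over to @mm as before ... */
            sketch_oldfs = get_fs();
            set_fs(USER_DS);        /* now done for every caller */
    }

    static void sketch_kthread_unuse_mm(struct mm_struct *mm)
    {
            set_fs(sketch_oldfs);   /* restore the previous limit */
            /* ... drop @mm from current as before ... */
    }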
     
  • Switch the function documentation to kerneldoc comments, and add
    WARN_ON_ONCE asserts that the calling thread is a kernel thread and does
    not have ->mm set (or has ->mm set in the case of unuse_mm).

    Also give the functions a kthread_ prefix to better document the use case.

    [hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio]
    Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de
    [sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename]
    Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Tested-by: Jens Axboe
    Reviewed-by: Jens Axboe
    Acked-by: Felix Kuehling
    Acked-by: Greg Kroah-Hartman [usb]
    Acked-by: Haren Myneni
    Cc: Alex Deucher
    Cc: Al Viro
    Cc: Felipe Balbi
    Cc: Jason Wang
    Cc: "Michael S. Tsirkin"
    Cc: Zhenyu Wang
    Cc: Zhi Wang
    Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
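
    The shape of the added documentation and asserts, sketched for
    kthread_use_mm() (abridged; the real function of course also performs
    the mm switch):

    /**
     * kthread_use_mm - make the calling kthread operate on an address space
     * @mm: address space to operate on
     */
    void kthread_use_mm(struct mm_struct *mm)
    {
            /* only kernel threads without an mm of their own may call this */
            WARN_ON_ONCE(!(current->flags & PF_KTHREAD));
            WARN_ON_ONCE(current->mm);

            /* ... take over @mm ... */
    }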
     
  • Patch series "improve use_mm / unuse_mm", v2.

    This series improves the use_mm / unuse_mm interface by better
    documenting the assumptions, and by taking the set_fs manipulations
    spread over the callers into the core API.

    This patch (of 3):

    Use the proper API instead.

    Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de

    These helpers are only for use with kernel threads, and I will tie them
    more into the kthread infrastructure going forward. Also move the
    prototypes to kthread.h - mmu_context.h was a little weird to start with
    as it otherwise contains very low-level MM bits.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Tested-by: Jens Axboe
    Reviewed-by: Jens Axboe
    Acked-by: Felix Kuehling
    Cc: Alex Deucher
    Cc: Al Viro
    Cc: Felipe Balbi
    Cc: Jason Wang
    Cc: "Michael S. Tsirkin"
    Cc: Zhenyu Wang
    Cc: Zhi Wang
    Cc: Greg Kroah-Hartman
    Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de
    Link: http://lkml.kernel.org/r/20200416053158.586887-1-hch@lst.de
    Link: http://lkml.kernel.org/r/20200404094101.672954-5-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

09 Jun, 2020

1 commit

  • io_uring is the only user of io-wq, and now it uses only one io-wq
    callback for all its requests, namely io_wq_submit_work(). Instead of
    storing the work->runner callback in each instance of io_wq_work, keep
    it in io-wq itself.

    pros:
    - reduces io_wq_work size
    - more robust -- ->func won't be invalidated with mem{cpy,set}(req)
    - helps other work

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
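
    With this change the handler travels in the wq setup data instead of in
    every work item; roughly as below (treat the typedef and field layout
    as a sketch of that era's fs/io-wq.h, not a verbatim copy):

    typedef void (io_wq_work_fn)(struct io_wq_work **);

    struct io_wq_data {
            struct user_struct *user;
            io_wq_work_fn *do_work;         /* one handler for all requests */
    };

    /*
     * io_wq_work then no longer needs a per-instance ->func pointer; the
     * worker simply calls wq->do_work(&work) for each item it picks up.
     */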
     

24 Mar, 2020

1 commit

  • We always punt async buffered writes to an io-wq helper, as the core
    kernel does not have IOCB_NOWAIT support for that. Most buffered async
    writes complete very quickly, as it's just a copy operation. This means
    that doing multiple locking roundtrips on the shared wqe lock for each
    buffered write is wasteful. Additionally, buffered writes are hashed
    work items, which means that any buffered write to a given file is
    serialized.

    Keep identically hashed work items contiguous in @wqe->work_list, and
    track a tail for each hash bucket. On dequeue of a hashed item, splice
    all items with the same hash in one go using the tracked tail. Until
    the batch is done, the caller doesn't have to synchronize with the wqe
    or worker locks again.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
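
    A simplified sketch of the per-bucket tail bookkeeping, using a plain
    singly-linked list in place of io_wq_work_list (the bucket count and
    helper are assumptions for the sketch):

    #define NR_HASH_BUCKETS         (1U << 6)

    struct work_node {
            struct work_node *next;
            unsigned int hash;              /* bucket this item hashes to */
    };

    struct wqe_stub {
            struct work_node *work_head;
            struct work_node *hash_tail[NR_HASH_BUCKETS];
    };

    /*
     * Keep same-hash items contiguous: insert right after the recorded
     * tail for that bucket and advance the tail.
     */
    static void enqueue_hashed(struct wqe_stub *wqe, struct work_node *work)
    {
            struct work_node *tail = wqe->hash_tail[work->hash];

            wqe->hash_tail[work->hash] = work;
            if (!tail) {
                    /* no batch yet: simplified, just push at the head */
                    work->next = wqe->work_head;
                    wqe->work_head = work;
                    return;
            }
            work->next = tail->next;        /* splice after the old tail */
            tail->next = work;
    }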
     

23 Mar, 2020

1 commit

  • After io_assign_current_work() of a linked work, it may be decided to
    offload it to another thread via io_wqe_enqueue(). However, until the
    next io_assign_current_work() it can be cancelled, and that isn't
    handled.

    Don't assign it if it's not going to be executed.

    Fixes: 60cf46ae6054 ("io-wq: hash dependent work")
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     

15 Mar, 2020

3 commits

  • Enable io-wq hashing stuff for dependent works simply by re-enqueueing
    such requests.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
  • It's a preparation patch removing io_wq_enqueue_hashed(), whose job
    should now be done by io_wq_hash_work() + io_wq_enqueue().

    Also, set the hash value for dependent works, and do it as late as
    possible, because req->file can be unavailable before that point. This
    hash will be ignored by io-wq.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
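
    The replacement pair is small: io_wq_hash_work() just folds a hash of a
    key (typically the file's inode) into the work flags, along the lines
    of the sketch below (the shift and order values are illustrative):

    #include <linux/hash.h>

    #define IO_WQ_HASH_ORDER        5
    #define IO_WQ_HASH_SHIFT        24      /* upper bits of work->flags */

    void io_wq_hash_work(struct io_wq_work *work, void *val)
    {
            unsigned int bit = hash_ptr(val, IO_WQ_HASH_ORDER);

            work->flags |= IO_WQ_WORK_HASHED | (bit << IO_WQ_HASH_SHIFT);
    }

    /*
     * Callers then do:
     *      io_wq_hash_work(&req->work, file_inode(req->file));
     *      io_wq_enqueue(wq, &req->work);
     */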
     
  • This little tweak restores the behaviour that existed before the recent
    io_worker_handle_work() optimisation patches. It makes the function do
    cond_resched() and flush_signals() only if there is actual work to
    execute.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     

05 Mar, 2020

4 commits

  • First, it changes the io-wq interfaces: it replaces {get,put}_work()
    with free_work(), which is guaranteed to be called exactly once. It
    also enforces that the free_work() callback is non-NULL.

    io_uring follows the changes, and instead of putting a submission
    reference in io_put_req_async_completion(), it will be done in
    io_free_work(). As this removes io_get_work() along with the
    corresponding refcount_inc(), the ref balance is maintained.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
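
    The new contract in a nutshell: the wq invokes a single, mandatory
    free_work() hook exactly once per work item, so io_uring can drop the
    submission reference there. A sketch, with the io_uring side abridged:

    /* io-wq side: free_work must be non-NULL and is called exactly once. */
    typedef void (free_work_fn)(struct io_wq_work *);

    struct io_wq_data {
            free_work_fn *free_work;
    };

    /* io_uring side: put the submission reference in the hook. */
    static void io_free_work(struct io_wq_work *work)
    {
            struct io_kiocb *req = container_of(work, struct io_kiocb, work);

            io_put_req(req);        /* drop submission ref exactly once */
    }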
     
  • When executing non-linked hashed work, io_worker_handle_work()
    will lock-unlock wqe->lock to update hash, and then immediately
    lock-unlock to get next work. Optimise this case and do
    lock/unlock only once.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
  • There are 2 optimisations:
    - Now, io_worker_handle_work() does io_assign_current_work() twice per
    request, and each one adds a lock/unlock(worker->lock) pair. The first
    is to reset worker->cur_work to NULL, and the second is to set a real
    work shortly after. If there is a dependent work, set it immediately,
    which effectively removes the extra NULL'ing.

    - And there is no use in taking wqe->lock for linked works, as they are
    not hashed now. Optimise it out.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
  • This is a preparation patch, it adds some helpers and makes
    the next patches cleaner.

    - extract io_impersonate_work() and io_assign_current_work()
    - replace @next label with nested do-while
    - move put_work() right after NULL'ing cur_work.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     

03 Mar, 2020

4 commits

  • @hash_map is unsigned long, but BIT_ULL() is used for manipulations.
    BIT() is a better match, as it returns exactly an unsigned long value.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
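
    For reference, the two macros differ only in the width of the constant,
    which is why BIT() matches an unsigned long bitmap exactly:

    #define BIT(nr)         (1UL << (nr))   /* unsigned long */
    #define BIT_ULL(nr)     (1ULL << (nr))  /* unsigned long long */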
     
  • IO_WQ_WORK_CB is used only for linked timeouts, which will be armed
    before the work setup (i.e. mm, override creds, etc). The setup
    shouldn't take long, so it's ok to arm it a bit later and get rid
    of IO_WQ_WORK_CB.

    Make io-wq call work->func() only once; callbacks will handle the rest,
    i.e. the linked timeout handler will do the actual issue. And as a
    bonus, it removes an extra indirect call.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
  • IO_WQ_WORK_HAS_MM is set but never used, remove it.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
  • io_wq_flush() is buggy: during cancellation of a flush, the associated
    work may be passed to the caller's (i.e. io_uring) @match callback.
    That callback is expecting it to be embedded in struct io_kiocb.
    Cancellation of internal work probably doesn't make a lot of sense to
    begin with.

    As the flush helper is no longer used, just delete it and the associated
    work flag.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov