Commit 4fcc712f5c48b1e32cdbf9b9cfba42a27b2e3160

Authored by Kent Overstreet
Committed by Linus Torvalds
1 parent bba00e5910

aio: fix io_destroy() regression by using call_rcu()

There was a regression introduced by 36f5588905c1 ("aio: refcounting
cleanup"), reported by Jens Axboe - the refcounting cleanup switched to
using RCU in the shutdown path, but the synchronize_rcu() was done in
the context of the io_destroy() syscall, greatly increasing the time
the syscall could block.
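
Condensed from the removed lines in the diff below, the shutdown path
in question looked roughly like this (comments added for illustration):

static void kill_ioctx(struct kioctx *ctx)
{
	if (!atomic_xchg(&ctx->dead, 1)) {
		hlist_del_rcu(&ctx->list);
		/* Between hlist_del_rcu() and dropping the initial ref */
		synchronize_rcu();	/* sleeps in the io_destroy() syscall
					 * until a full RCU grace period has
					 * elapsed */
		kill_ioctx_work(&ctx->rcu_work);
	}
}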

This patch switches it to call_rcu() and makes shutdown asynchronous
(more asynchronous than it was originally; before the refcount changes
io_destroy() would still wait on pending kiocbs).
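
The call_rcu() pattern the fix adopts never sleeps in the caller: an
rcu_head is embedded in the object and a callback is queued to run once
the grace period has elapsed. A minimal, hypothetical sketch of that
pattern (struct foo and its helpers are illustrative, not part of this
patch):

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
	int data;
	struct rcu_head rcu;
};

static void foo_free_rcu(struct rcu_head *head)
{
	/* Runs from RCU callback context after a grace period has elapsed */
	kfree(container_of(head, struct foo, rcu));
}

static void foo_release(struct foo *f)
{
	/* Returns immediately; the free is deferred past the grace period */
	call_rcu(&f->rcu, foo_free_rcu);
}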

Note that there's a global quota on the max outstanding kiocbs, and that
quota must be manipulated synchronously; otherwise io_setup() could
return -EAGAIN when there isn't quota available, and userspace won't
have any way of waiting until shutdown of the old kioctxs has finished
(besides busy looping).

So we release our quota before kioctx shutdown has finished, which
should be fine since the quota never corresponded to anything real
anyway.
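
To make the userspace-visible guarantee concrete, here is a
hypothetical illustration using the raw syscalls (error handling
trimmed): because the quota is now released synchronously in
kill_ioctx(), an io_setup() issued right after io_destroy() returns
should not fail with -EAGAIN merely because the old context's deferred
teardown hasn't run yet.

/* Hypothetical userspace illustration, not part of this patch. */
#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
	aio_context_t old_ctx = 0, new_ctx = 0;

	if (syscall(SYS_io_setup, 128, &old_ctx) < 0)
		return 1;

	/* Returns before the context's deferred (RCU) teardown completes */
	syscall(SYS_io_destroy, old_ctx);

	/* The quota was released synchronously, so this should not see a
	 * spurious -EAGAIN */
	if (syscall(SYS_io_setup, 128, &new_ctx) < 0)
		perror("io_setup");
	else
		syscall(SYS_io_destroy, new_ctx);
	return 0;
}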

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Tested-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Showing 1 changed file with 16 additions and 20 deletions

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -141,9 +141,6 @@
 	for (i = 0; i < ctx->nr_pages; i++)
 		put_page(ctx->ring_pages[i]);
 
-	if (ctx->mmap_size)
-		vm_munmap(ctx->mmap_base, ctx->mmap_size);
-
 	if (ctx->ring_pages && ctx->ring_pages != ctx->internal_pages)
 		kfree(ctx->ring_pages);
 }
@@ -322,11 +319,6 @@
 
 	aio_free_ring(ctx);
 
-	spin_lock(&aio_nr_lock);
-	BUG_ON(aio_nr - ctx->max_reqs > aio_nr);
-	aio_nr -= ctx->max_reqs;
-	spin_unlock(&aio_nr_lock);
-
 	pr_debug("freeing %p\n", ctx);
 
 	/*
@@ -435,17 +427,24 @@
 {
 	if (!atomic_xchg(&ctx->dead, 1)) {
 		hlist_del_rcu(&ctx->list);
-		/* Between hlist_del_rcu() and dropping the initial ref */
-		synchronize_rcu();
 
 		/*
-		 * We can't punt to workqueue here because put_ioctx() ->
-		 * free_ioctx() will unmap the ringbuffer, and that has to be
-		 * done in the original process's context. kill_ioctx_rcu/work()
-		 * exist for exit_aio(), as in that path free_ioctx() won't do
-		 * the unmap.
+		 * It'd be more correct to do this in free_ioctx(), after all
+		 * the outstanding kiocbs have finished - but by then io_destroy
+		 * has already returned, so io_setup() could potentially return
+		 * -EAGAIN with no ioctxs actually in use (as far as userspace
+		 * could tell).
 		 */
-		kill_ioctx_work(&ctx->rcu_work);
+		spin_lock(&aio_nr_lock);
+		BUG_ON(aio_nr - ctx->max_reqs > aio_nr);
+		aio_nr -= ctx->max_reqs;
+		spin_unlock(&aio_nr_lock);
+
+		if (ctx->mmap_size)
+			vm_munmap(ctx->mmap_base, ctx->mmap_size);
+
+		/* Between hlist_del_rcu() and dropping the initial ref */
+		call_rcu(&ctx->rcu_head, kill_ioctx_rcu);
 	}
 }
 
@@ -495,10 +494,7 @@
 		 */
 		ctx->mmap_size = 0;
 
-		if (!atomic_xchg(&ctx->dead, 1)) {
-			hlist_del_rcu(&ctx->list);
-			call_rcu(&ctx->rcu_head, kill_ioctx_rcu);
-		}
+		kill_ioctx(ctx);
 	}
 }