09 May, 2013

1 commit

  • Pull block core updates from Jens Axboe:

    - Major bit is Kent's prep work for immutable bio vecs.

    - Stable candidate fix for a scheduling-while-atomic in the queue
    bypass operation.

    - Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging
    discard bios.

    - Tejun's changes to convert the writeback thread pool to the generic
    workqueue mechanism.

    - Runtime PM framework; SCSI patches exist on top of these in James'
    tree.

    - A few random fixes.

    * 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits)
    relay: move remove_buf_file inside relay_close_buf
    partitions/efi.c: replace useless kzalloc's by kmalloc's
    fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()
    block: fix max discard sectors limit
    blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
    Documentation: cfq-iosched: update documentation help for cfq tunables
    writeback: expose the bdi_wq workqueue
    writeback: replace custom worker pool implementation with unbound workqueue
    writeback: remove unused bdi_pending_list
    aoe: Fix unitialized var usage
    bio-integrity: Add explicit field for owner of bip_buf
    block: Add an explicit bio flag for bios that own their bvec
    block: Add bio_alloc_pages()
    block: Convert some code to bio_for_each_segment_all()
    block: Add bio_for_each_segment_all()
    bounce: Refactor __blk_queue_bounce to not use bi_io_vec
    raid1: use bio_copy_data()
    pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
    pktcdvd: use bio_copy_data()
    block: Add bio_copy_data()
    ...

    Linus Torvalds
     

08 May, 2013

39 commits

  • Merge more incoming from Andrew Morton:

    - Various fixes which were stalled or which I picked up recently

    - A large rotorooting of the AIO code. Allegedly to improve
    performance but I don't really have good performance numbers (I might
    have lost the email) and I can't raise Kent today. I held this out
    of 3.9 and we could give it another cycle if it's all too late/scary.

    I ended up taking only the first two thirds of the AIO rotorooting. I
    left the percpu parts and the batch completion for later. - Linus

    * emailed patches from Andrew Morton : (33 commits)
    aio: don't include aio.h in sched.h
    aio: kill ki_retry
    aio: kill ki_key
    aio: give shared kioctx fields their own cachelines
    aio: kill struct aio_ring_info
    aio: kill batch allocation
    aio: change reqs_active to include unreaped completions
    aio: use cancellation list lazily
    aio: use flush_dcache_page()
    aio: make aio_read_evt() more efficient, convert to hrtimers
    wait: add wait_event_hrtimeout()
    aio: refcounting cleanup
    aio: make aio_put_req() lockless
    aio: do fget() after aio_get_req()
    aio: dprintk() -> pr_debug()
    aio: move private stuff out of aio.h
    aio: add kiocb_cancel()
    aio: kill return value of aio_complete()
    char: add aio_{read,write} to /dev/{null,zero}
    aio: remove retry-based AIO
    ...

    Linus Torvalds
     
  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • Thanks to Zach Brown's work to rip out the retry infrastructure, we don't
    need this anymore - ki_retry was only called right after the kiocb was
    initialized.

    This also refactors and trims some duplicated code, as well as cleaning up
    the refcounting/error handling a bit.

    [akpm@linux-foundation.org: use fmode_t in aio_run_iocb()]
    [akpm@linux-foundation.org: fix file_start_write/file_end_write tests]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • ki_key wasn't actually used for anything previously - it was always 0.
    Drop it to trim struct kiocb a bit.

    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • [akpm@linux-foundation.org: make reqs_active __cacheline_aligned_in_smp]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • struct aio_ring_info was kind of odd: the only place it's used is where
    it's embedded in struct kioctx - there's no real need for it.

    The next patch rearranges struct kioctx and puts various things on their
    own cachelines - getting rid of struct aio_ring_info now makes that
    reordering a bit clearer.

    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • Previously, allocating a kiocb required touching quite a few global
    (well, per kioctx) cachelines... so batching up allocation to amortize
    those was worthwhile. But we've gotten rid of some of those, and in
    another couple of patches kiocb allocation won't require writing to any
    shared cachelines, so that means we can just rip this code out.

    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • The aio code tries really hard to avoid having to deal with the
    completion ringbuffer overflowing. To do that, it has to keep track of
    the number of outstanding kiocbs, and the number of completions
    currently in the ringbuffer - and it's got to check that every time we
    allocate a kiocb. Ouch.

    But - we can improve this quite a bit if we just change reqs_active to
    mean "number of outstanding requests and unreaped completions" - that
    means kiocb allocation doesn't have to look at the ringbuffer, which is
    a fairly significant win.

    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
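The reworked counter can be sketched in plain C. This is a userspace stand-in, not the kernel code; all names here are hypothetical:

```c
/* reqs_active counts both in-flight requests and completions the
 * user hasn't reaped yet, so allocation compares one counter
 * against the limit and never has to inspect the ring buffer. */
#define MAX_REQS 4

struct ioctx_sketch {
    int reqs_active;   /* outstanding requests + unreaped completions */
};

static int get_req(struct ioctx_sketch *c)
{
    if (c->reqs_active >= MAX_REQS)
        return -1;              /* the kernel would return -EAGAIN */
    c->reqs_active++;
    return 0;
}

/* completion does NOT drop the count - only reaping events does */
static void reap_events(struct ioctx_sketch *c, int n)
{
    c->reqs_active -= n;
}
```

Note how a completed-but-unreaped request still holds its slot, which is exactly what keeps the ring buffer from overflowing.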
     
  • Cancelling kiocbs requires adding them to a per kioctx linked list,
    which is one of the few things we need to take the kioctx lock for in
    the fast path. But most kiocbs can't be cancelled - so if we just do
    this lazily, we can avoid quite a bit of locking overhead.

    While we're at it, instead of using a flag bit, switch to using ki_cancel
    itself to indicate that a kiocb has been cancelled/completed. This lets
    us get rid of ki_flags entirely.

    [akpm@linux-foundation.org: remove buggy BUG()]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
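The pointer-as-state trick can be sketched in C11. The sentinel value and all names below are illustrative, not the kernel's actual definitions:

```c
#include <stdatomic.h>

typedef int (*kiocb_cancel_fn)(void *);

/* sentinel meaning "already cancelled or completed" */
#define KIOCB_CANCELLED ((kiocb_cancel_fn)(~0UL))

struct kiocb_sketch {
    _Atomic(kiocb_cancel_fn) ki_cancel;
};

/* example cancel callback for the usage below */
static int dummy_cancel(void *p) { (void)p; return 0; }

/* Atomically claim the cancel callback: whoever swaps in the
 * sentinel first gets to run it; everyone else sees the kiocb as
 * already cancelled/completed.  No separate ki_flags word needed. */
static kiocb_cancel_fn kiocb_claim_cancel(struct kiocb_sketch *k)
{
    kiocb_cancel_fn fn = atomic_exchange(&k->ki_cancel, KIOCB_CANCELLED);
    return fn == KIOCB_CANCELLED ? (kiocb_cancel_fn)0 : fn;
}
```

The first claim hands back the registered callback; any later claim sees the sentinel and gets NULL, so cancel runs at most once.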
     
  • This wasn't causing problems before because it's not needed on x86, but
    it is needed on other architectures.

    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • Previously, aio_read_evt() pulled a single completion off the
    ringbuffer at a time, locking and unlocking each time. Change it to
    pull off as many events as it can at a time, and copy them directly to
    userspace.

    This also fixes a bug where if copying the event to userspace failed,
    we'd lose the event.

    Also convert it to wait_event_interruptible_hrtimeout(), which
    simplifies it quite a bit.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
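A minimal userspace sketch of the batching; the ring layout and names are invented for illustration (in the kernel the copy goes straight to userspace):

```c
#define RING_SIZE 8

struct evt_ring {
    unsigned head, tail;          /* head = next unreaped event */
    int events[RING_SIZE];
};

/* Copy out as many events as are available in one pass, instead
 * of paying a lock round-trip per event. */
static int read_events(struct evt_ring *r, int *out, int max)
{
    int n = 0;
    while (n < max && r->head != r->tail) {
        out[n++] = r->events[r->head];
        r->head = (r->head + 1) % RING_SIZE;
    }
    return n;
}
```

Because the consumer only advances head after the event is safely copied out, a failed copy no longer loses the event.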
     
  • Analogous to wait_event_timeout() and friends, this adds
    wait_event_hrtimeout() and wait_event_interruptible_hrtimeout().

    Note that unlike the versions that use regular timers, these don't
    return the amount of time remaining when they return - instead, they
    return 0 or -ETIME if they timed out, because I was uncomfortable with
    the semantics of doing it the other way (not that I could get it right,
    anyways).

    If the timer expires, there's no real guarantee that expire_time -
    current_time would be
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
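The return convention can be sketched with a deterministic userspace stand-in; a poll budget stands in for the hrtimer deadline, and all names are invented:

```c
#define SK_ETIME 62   /* mirrors the kernel's ETIME value */

/* 0 if the condition came true, -SK_ETIME on timeout.  Unlike the
 * jiffies-based wait_event_timeout(), no "time remaining" value is
 * reported back to the caller. */
static int wait_sketch(int (*cond)(void *), void *arg, int budget)
{
    while (budget-- > 0)
        if (cond(arg))
            return 0;
    return -SK_ETIME;
}

/* condition that becomes true on the third poll */
static int third_poll(void *arg)
{
    int *calls = arg;
    return ++(*calls) >= 3;
}
```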
     
  • The usage of ctx->dead was fubar - it makes no sense to explicitly check
    it all over the place, especially when we're already using RCU.

    Now, ctx->dead only indicates whether we've dropped the initial
    refcount. The new teardown sequence is:

    set ctx->dead
    hlist_del_rcu();
    synchronize_rcu();

    Now we know no system calls can take a new ref, and it's safe to drop
    the initial ref:

    put_ioctx();

    We also need to ensure there are no more outstanding kiocbs. This was
    done incorrectly - it was being done in kill_ctx(), and before dropping
    the initial refcount. At this point, other syscalls may still be
    submitting kiocbs!

    Now, we cancel and wait for outstanding kiocbs in free_ioctx(), after
    kioctx->users has dropped to 0 and we know no more iocbs could be
    submitted.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • Freeing a kiocb needed to touch the kioctx for three things:

    * Pull it off the reqs_active list
    * Decrementing reqs_active
    * Issuing a wakeup, if the kioctx was in the process of being freed.

    This patch moves these to aio_complete(), for a couple reasons:

    * aio_complete() already has to issue the wakeup, so if we drop the
    kioctx refcount before aio_complete does its wakeup we don't have to
    do it twice.
    * aio_complete currently has to take the kioctx lock, so it makes sense
    for it to pull the kiocb off the reqs_active list too.
    * A later patch is going to change reqs_active to include unreaped
    completions - this will mean allocating a kiocb doesn't have to look
    at the ringbuffer. So taking the decrement of reqs_active out of
    kiocb_free() is useful prep work for that patch.

    This doesn't really affect cancellation, since existing (usb) code that
    implements a cancel function still calls aio_complete() - we just have
    to make sure that aio_complete does the necessary teardown for cancelled
    kiocbs.

    It does affect code paths where we free kiocbs that were never
    submitted; they need to decrement reqs_active and pull the kiocb off the
    reqs_active list. This occurs in two places: kiocb_batch_free(), which
    is going away in a later patch, and the error path in io_submit_one.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Acked-by: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • aio_get_req() will fail if we have the maximum number of requests
    outstanding, which depending on the application may not be uncommon. So
    avoid doing an unnecessary fget().

    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Acked-by: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Acked-by: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Acked-by: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • Minor refactoring to get rid of some duplicated code.

    [akpm@linux-foundation.org: fix warning]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Acked-by: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • Nothing used the return value, and it probably wasn't possible to use it
    safely for the locked versions (aio_complete(), aio_put_req()). Just
    kill it.

    Signed-off-by: Kent Overstreet
    Acked-by: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Acked-by: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • These are handy for measuring the cost of the aio infrastructure with
    operations that do very little and complete immediately.

    Signed-off-by: Zach Brown
    Signed-off-by: Kent Overstreet
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Acked-by: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • This removes the retry-based AIO infrastructure now that nothing in tree
    is using it.

    We want to remove retry-based AIO because it is fundamentally unsafe.
    It retries IO submission from a kernel thread that has only assumed the
    mm of the submitting task. All other task_struct references in the IO
    submission path will see the kernel thread, not the submitting task.
    This design flaw means that nothing of any meaningful complexity can use
    retry-based AIO.

    This removes all the code and data associated with the retry machinery.
    The most significant benefit of this is the removal of the locking
    around the unused run list in the submission path.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Kent Overstreet
    Signed-off-by: Zach Brown
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Acked-by: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • This removes the only in-tree user of aio retry. This will let us
    remove the retry code from the aio core.

    Removing retry is relatively easy as the USB gadget wasn't using it to
    retry IOs at all. It always fully submitted the IO in the context of
    the initial io_submit() call. It only used the AIO retry facility to
    get the submitter's mm context for copying the result of a read back to
    user space. This is easy to implement with use_mm() and a work struct,
    much like kvm does with async_pf_execute() for get_user_pages().

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Zach Brown
    Signed-off-by: Kent Overstreet
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Acked-by: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • Signed-off-by: Zach Brown
    Signed-off-by: Kent Overstreet
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Acked-by: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • Bunch of performance improvements and cleanups Zach Brown and I have
    been working on. The code should be pretty solid at this point, though
    it could of course use more review and testing.

    The results in my testing are pretty impressive, particularly when an
    ioctx is being shared between multiple threads. In my crappy synthetic
    benchmark, with 4 threads submitting and one thread reaping completions,
    I saw overhead in the aio code go from ~50% (mostly ioctx lock
    contention) to low single digits. Performance with ioctx per thread
    improved too, but I'd have to rerun those benchmarks.

    The reason I've been focused on performance when the ioctx is shared is
    that for a fair number of real world completions, userspace needs the
    completions aggregated somehow - in practice people just end up
    implementing this aggregation in userspace today, but if it's done right
    we can do it much more efficiently in the kernel.

    Performance wise, the end result of this patch series is that submitting
    a kiocb writes to _no_ shared cachelines - the penalty for sharing an
    ioctx is gone there. There's still going to be some cacheline
    contention when we deliver the completions to the aio ringbuffer (at
    least if you have interrupts being delivered on multiple cores, which
    for high end stuff you do) but I have a couple more patches not in this
    series that implement coalescing for that (by taking advantage of
    interrupt coalescing). With that, there's basically no bottlenecks or
    performance issues to speak of in the aio code.

    This patch:

    use_mm() is used in more places than just aio. There's no need to mention
    callers when describing the function.

    Signed-off-by: Zach Brown
    Signed-off-by: Kent Overstreet
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Acked-by: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • After finishing a naming transition, remove unused backward
    compatibility wrapper macros.

    Signed-off-by: Akinobu Mita
    Cc: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Use preferable function name which implies using a pseudo-random number
    generator.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Use preferable function name which implies using a pseudo-random number
    generator.

    [akpm@linux-foundation.org: convert team_mode_random.c]
    Signed-off-by: Akinobu Mita
    Acked-by: Thomas Sailer
    Acked-by: Bing Zhao [mwifiex]
    Cc: "David S. Miller"
    Cc: Michael Chan
    Cc: Thomas Sailer
    Cc: Jean-Paul Roubelat
    Cc: Bing Zhao
    Cc: Brett Rudley
    Cc: Arend van Spriel
    Cc: "Franky (Zhenhui) Lin"
    Cc: Hante Meuleman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • The current kernel returns -EINVAL unless a given mmap length is
    "almost" hugepage aligned. This is because in sys_mmap_pgoff() the
    given length is passed to vm_mmap_pgoff() as it is without being aligned
    with hugepage boundary.

    This is a regression introduced in commit 40716e29243d ("hugetlbfs: fix
    alignment of huge page requests"), where alignment code is pushed into
    hugetlb_file_setup() and the variable len in caller side is not changed.

    To fix this, this patch partially reverts that commit and adds
    alignment code on the caller side. It also introduces hstate_sizelog()
    to get the proper hstate for the specified hugepage size.

    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=56881

    [akpm@linux-foundation.org: fix warning when CONFIG_HUGETLB_PAGE=n]
    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Johannes Weiner
    Reported-by:
    Cc: Steven Truelove
    Cc: Jianguo Wu
    Cc: Hugh Dickins
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
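The caller-side fix amounts to rounding the requested mmap length up to the huge page boundary before passing it down. A sketch, assuming 2 MiB huge pages for illustration:

```c
/* assume 2 MiB huge pages for this sketch */
#define HPAGE_SIZE (2UL << 20)

/* Round len up to a hugepage boundary, as the fixed
 * sys_mmap_pgoff() path must do before calling vm_mmap_pgoff(). */
static unsigned long hugepage_align(unsigned long len)
{
    return (len + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1);
}
```

With this, a length that is only "almost" hugepage aligned no longer trips the -EINVAL check further down.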
     
  • kmap_atomic() requires only one argument now.

    Signed-off-by: Zhao Hongjiang
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Rolf Eike Beer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhao Hongjiang
     
  • Register layout is the same, so just add the variant to the appropriate
    places.

    Signed-off-by: Lucas Stach
    Signed-off-by: Jan Luebbe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas Stach
     
  • That nameless-function-arguments thing drives me batty. Fix.

    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This exports the amount of anonymous transparent hugepages for each
    memcg via the new "rss_huge" stat in memory.stat. The units are in
    bytes.

    This is helpful to determine the hugepage utilization for individual
    jobs on the system in comparison to rss and opportunities where
    MADV_HUGEPAGE may be helpful.

    The amount of anonymous transparent hugepages is also included in "rss"
    for backwards compatibility.

    Signed-off-by: David Rientjes
    Acked-by: Michal Hocko
    Acked-by: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Use common helper functions to free reserved pages.

    Signed-off-by: Jiang Liu
    Acked-by: David S. Miller
    Acked-by: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
     
  • I badly screwed up the merge in commit 6fa52ed33bea ("Merge tag
    'drivers-for-linus' of git://git.kernel.org/pub/.../arm-soc") by
    incorrectly taking the arch/arm/mach-omap2/* data fully from the merge
    target because the 'drivers-for-linus' branch seemed to be a proper
    superset of the duplicate ARM commits.

    That was bogus: commit ff931c821bab ("ARM: OMAP: clocks: Delay clk inits
    atleast until slab is initialized") only existed in head, and the
    changes to arch/arm/mach-omap2/timer.c from that commit got lost.

    Re-doing the merge more carefully, I do think this part was the only
    thing I screwed up. Knock wood.

    Reported-by: Tony Lindgren
    Cc: Arnd Bergmann
    Cc: Olof Johansson
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • This patch tries to reduce the number of cmpxchg calls in the writer
    failed path by checking the counter value first before issuing the
    instruction. If ->count is not set to RWSEM_WAITING_BIAS then there is
    no point wasting a cmpxchg call.

    Furthermore, Michel states "I suppose it helps due to the case where
    someone else steals the lock while we're trying to acquire
    sem->wait_lock."

    Two very different workloads and machines were used to see how this
    patch improves throughput: pgbench on a quad-core laptop and aim7 on a
    large 8 socket box with 80 cores.

    Some results comparing Michel's fast-path write lock stealing
    (tps-rwsem) on a quad-core laptop running pgbench:

    | db_size | clients | tps-rwsem | tps-patch |
    +---------+---------+-----------+-----------+
    | 160 MB  |       1 |      6906 |      9153 | + 32.5%
    | 160 MB  |       2 |     15931 |     22487 | + 41.1%
    | 160 MB  |       4 |     33021 |     32503 |
    | 160 MB  |       8 |     34626 |     34695 |
    | 160 MB  |      16 |     33098 |     34003 |
    | 160 MB  |      20 |     31343 |     31440 |
    | 160 MB  |      30 |     28961 |     28987 |
    | 160 MB  |      40 |     26902 |     26970 |
    | 160 MB  |      50 |     25760 |     25810 |
    +---------+---------+-----------+-----------+
    | 1.6 GB  |       1 |      7729 |      7537 |
    | 1.6 GB  |       2 |     19009 |     23508 | + 23.7%
    | 1.6 GB  |       4 |     33185 |     32666 |
    | 1.6 GB  |       8 |     34550 |     34318 |
    | 1.6 GB  |      16 |     33079 |     32689 |
    | 1.6 GB  |      20 |     31494 |     31702 |
    | 1.6 GB  |      30 |     28535 |     28755 |
    | 1.6 GB  |      40 |     27054 |     27017 |
    | 1.6 GB  |      50 |     25591 |     25560 |
    +---------+---------+-----------+-----------+
    | 7.6 GB  |       1 |      6224 |      7469 | + 20.0%
    | 7.6 GB  |       2 |     13611 |     12778 |
    | 7.6 GB  |       4 |     33108 |     32927 |
    | 7.6 GB  |       8 |     34712 |     34878 |
    | 7.6 GB  |      16 |     32895 |     33003 |
    | 7.6 GB  |      20 |     31689 |     31974 |
    | 7.6 GB  |      30 |     29003 |     28806 |
    | 7.6 GB  |      40 |     26683 |     26976 |
    | 7.6 GB  |      50 |     25925 |     25652 |
    +---------+---------+-----------+-----------+

    For the aim7 workloads, throughput overall improved on top of Michel's
    patchset. For full graphs on how the rwsem series plus this patch
    behaves on a large 8 socket machine against a vanilla kernel:

    http://stgolabs.net/rwsem-aim7-results.tar.gz

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
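The optimization is just an ordinary load in front of the atomic operation. A C11 userspace sketch, with the constant and names standing in for the kernel's:

```c
#include <stdatomic.h>

#define WAITING_BIAS (-1L)   /* stand-in for RWSEM_WAITING_BIAS */

static atomic_long cmpxchg_calls;   /* counts issued cmpxchg ops */

/* Only pay for the cmpxchg when the counter could actually be
 * claimed; a plain load filters out the hopeless cases, e.g. when
 * someone else stole the lock while we waited on sem->wait_lock. */
static int try_write_steal(atomic_long *count)
{
    long old = atomic_load(count);

    if (old != WAITING_BIAS)
        return 0;                     /* skip the cmpxchg entirely */

    atomic_fetch_add(&cmpxchg_calls, 1);
    return atomic_compare_exchange_strong(count, &old, 0L);
}
```

On a contended lock most attempts fail the plain load, which is far cheaper than a failed locked compare-and-exchange bouncing the cacheline.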
     
  • - make warning smp-safe
    - result of atomic _unless_zero functions should be checked by caller
    to avoid use-after-free error
    - trivial whitespace fix.

    Link: https://lkml.org/lkml/2013/4/12/391

    Tested: compile x86, boot machine and run xfstests
    Signed-off-by: Anatol Pomozov
    [ Removed line-break, changed to use WARN_ON_ONCE() - Linus ]
    Signed-off-by: Linus Torvalds

    Anatol Pomozov
     
  • Pull more vfs updates from Al Viro:
    "A couple of fixes + getting rid of __blkdev_put() return value"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    proc: Use PDE attribute setting accessor functions
    make blkdev_put() return void
    block_device_operations->release() should return void
    mtd_blktrans_ops->release() should return void
    hfs: SMP race on directory close()

    Linus Torvalds
     
  • Pull parisc updates from Helge Deller:
    "Main fixes and updates in this patch series are:
    - we faced kernel stack corruptions because of multiple delivery of
    interrupts
    - added kernel stack overflow checks
    - added possibility to use dedicated stacks for irq processing
    - initial support for page sizes > 4k
    - more information in /proc/interrupts (e.g. TLB flushes and number
    of IPI calls)
    - documented how the parisc gateway page works
    - and of course quite some other smaller cleanups and fixes."

    * 'parisc-for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: tlb flush counting fix for SMP and UP
    parisc: more irq statistics in /proc/interrupts
    parisc: implement irq stacks
    parisc: add kernel stack overflow check
    parisc: only re-enable interrupts if we need to schedule or deliver signals when returning to userspace
    parisc: implement atomic64_dec_if_positive()
    parisc: use long branch in fork_like macro
    parisc: fix NATIVE set up in build
    parisc: document the parisc gateway page
    parisc: fix partly 16/64k PAGE_SIZE boot
    parisc: Provide default implementation for dma_{alloc, free}_attrs
    parisc: fix whitespace errors in arch/parisc/kernel/traps.c
    parisc: remove the second argument of kmap_atomic

    Linus Torvalds