03 Nov, 2011

1 commit

  • In testing aio on a fast storage device, I found that the context lock
    takes up a fair amount of CPU time in the I/O submission path. The reason
    is that we take it for every I/O submitted (see __aio_get_req). Since we
    know how many I/Os are passed to io_submit, we can preallocate the kiocbs
    in batches, reducing the number of times we take and release the lock.

    In my testing, I was able to reduce the amount of time spent in
    _raw_spin_lock_irq by 0.56% (average of 3 runs). The command I used to
    test this was:

    aio-stress -O -o 2 -o 3 -r 8 -d 128 -b 32 -i 32 -s 16384

    I also tested the patch with various numbers of events passed to
    io_submit, and I ran the xfstests aio group of tests to ensure I didn't
    break anything.

    Signed-off-by: Jeff Moyer
    Cc: Daniel Ehrenberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Moyer
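A minimal userspace sketch of the batching idea above. It assumes a pthread mutex as a stand-in for the context lock and a plain counter of free slots as a stand-in for the request ring; the names (struct req_pool, alloc_one_at_a_time(), alloc_batched()) are hypothetical and do not reflect the actual __aio_get_req() code. The point is only that batching turns one lock round trip per I/O into one per batch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-context request pool. */
struct req_pool {
    pthread_mutex_t lock;      /* plays the role of the context lock */
    size_t available;          /* free request slots */
    size_t lock_acquisitions;  /* instrumentation for the comparison */
};

/* Per-request path: one lock round trip for every I/O. */
static size_t alloc_one_at_a_time(struct req_pool *p, size_t nr)
{
    size_t got = 0;

    for (size_t i = 0; i < nr; i++) {
        pthread_mutex_lock(&p->lock);
        p->lock_acquisitions++;
        if (p->available > 0) {
            p->available--;
            got++;
        }
        pthread_mutex_unlock(&p->lock);
    }
    return got;
}

/* Batched path: reserve up to 'batch' requests per lock round trip. */
static size_t alloc_batched(struct req_pool *p, size_t nr, size_t batch)
{
    size_t got = 0;

    while (got < nr) {
        size_t want = nr - got < batch ? nr - got : batch;

        pthread_mutex_lock(&p->lock);
        p->lock_acquisitions++;
        size_t take = want < p->available ? want : p->available;
        p->available -= take;
        pthread_mutex_unlock(&p->lock);

        got += take;
        if (take < want)   /* pool exhausted: stop early */
            break;
    }
    return got;
}
```

With 32 requests and a batch size of 16, the batched path takes the lock twice instead of 32 times.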
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
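For reference, atomic_inc_not_zero() increments a counter only if it is currently non-zero and reports whether it did so; this is the usual way to take a reference on an object that may already be on its way to being freed. The sketch below reproduces those semantics with C11 <stdatomic.h> and a compare-and-swap loop. inc_not_zero() is a hypothetical name; the kernel's version returns an int and is built on architecture-specific primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *v unless it is zero; return true if the increment happened. */
static bool inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}
```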
     

28 May, 2010

1 commit

  • The aio compat code was not converting the struct iovecs from 32-bit to
    64-bit pointers, causing either EINVAL to be returned from io_getevents, or
    EFAULT as the result of the I/O. This patch passes a compat flag to
    io_submit to signal that pointer conversion is necessary for a given iocb
    array.

    A variant of this was tested by Michael Tokarev. I have also updated the
    libaio test harness to exercise this code path with good success.
    Further, I grabbed a copy of ltp and ran the
    testcases/kernel/syscall/readv and writev tests there (compiled with -m32
    on my 64-bit system). All seems happy, but extra eyes on this would be
    welcome.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix CONFIG_COMPAT=n build]
    Signed-off-by: Jeff Moyer
    Reported-by: Michael Tokarev
    Cc: Zach Brown
    Cc: [2.6.35.1]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Moyer
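A sketch of the conversion the compat path needs, assuming a 32-bit caller whose iovec entries consist of two 32-bit fields (8 bytes per entry). The kernel's own code uses struct compat_iovec and compat_ptr(); struct iovec32 and widen_iovecs() below are hypothetical userspace stand-ins. Reading such an array as if it held native 64-bit struct iovec entries would misread the fields, so each entry has to be widened explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical layout of one iovec as a 32-bit process writes it:
 * both the user pointer and the length are 32 bits wide. */
struct iovec32 {
    uint32_t iov_base;
    uint32_t iov_len;
};

/* Widen n 32-bit entries into native struct iovec entries and return the
 * total number of bytes they describe. */
static size_t widen_iovecs(const struct iovec32 *in, struct iovec *out, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        /* Zero-extend the 32-bit user address into a native pointer. */
        out[i].iov_base = (void *)(uintptr_t)in[i].iov_base;
        out[i].iov_len = in[i].iov_len;
        total += in[i].iov_len;
    }
    return total;
}
```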
     

16 Dec, 2009

1 commit

  • I don't know the reason, but it appears the ki_wait field of the iocb never gets used.

    Signed-off-by: Shaohua Li
    Cc: Jeff Moyer
    Cc: Benjamin LaHaise
    Cc: Zach Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     

20 Sep, 2009

1 commit


01 Jul, 2009

1 commit

  • Change the eventfd interface to decouple the eventfd memory context from
    the file pointer instance.

    Without this change, there is no clean, race-free way to handle the
    POLLHUP event sent when the last instance of the file* goes away. Also,
    the internal eventfd APIs now use the eventfd context instead of the
    file*.

    This patch is required by KVM's IRQfd code, which is still under
    development.

    Signed-off-by: Davide Libenzi
    Cc: Gregory Haskins
    Cc: Rusty Russell
    Cc: Benjamin LaHaise
    Cc: Avi Kivity
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     

29 Dec, 2008

1 commit

  • The mm->ioctx_list is currently protected by a reader-writer lock,
    so we always grab that lock on the read side for doing ioctx
    lookups. As the workload is extremely reader-biased, turn this into
    an RCU hlist so we can make lookup_ioctx() lockless. Get rid of
    the rwlock and use a spinlock for providing update-side exclusion.

    There's usually only 1 entry on this list, so it doesn't make sense
    to look into fancier data structures.

    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jens Axboe
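The lockless lookup relies on RCU's publish/subscribe ordering: the writer fully initializes an entry before publishing it, and readers follow pointers without taking a lock. The sketch below shows only that half, using C11 release/acquire atomics and a mutex for update-side exclusion; struct ctx_node, ctx_add() and ctx_lookup() are hypothetical. Real RCU (rcu_read_lock(), hlist_add_head_rcu(), call_rcu()) also defers freeing removed entries until all pre-existing readers have finished, which this sketch sidesteps by never removing or freeing anything.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kioctx on mm->ioctx_list. */
struct ctx_node {
    unsigned long id;
    struct ctx_node *next;   /* written once, before publication */
};

struct ctx_list {
    _Atomic(struct ctx_node *) head;
    pthread_mutex_t update_lock;   /* update-side exclusion only */
};

/* Writer: initialize the node, then publish it with a release store, so a
 * reader that sees the pointer also sees the initialized fields.
 * Nodes are never removed or freed in this sketch. */
static int ctx_add(struct ctx_list *l, unsigned long id)
{
    struct ctx_node *n = malloc(sizeof(*n));

    if (!n)
        return -1;
    n->id = id;
    pthread_mutex_lock(&l->update_lock);
    n->next = atomic_load_explicit(&l->head, memory_order_relaxed);
    atomic_store_explicit(&l->head, n, memory_order_release);
    pthread_mutex_unlock(&l->update_lock);
    return 0;
}

/* Reader: no lock taken; the acquire load pairs with the release store. */
static struct ctx_node *ctx_lookup(struct ctx_list *l, unsigned long id)
{
    struct ctx_node *n = atomic_load_explicit(&l->head, memory_order_acquire);

    for (; n != NULL; n = n->next)
        if (n->id == id)
            return n;
    return NULL;
}
```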
     

17 Oct, 2008

1 commit

  • This patch adds the CONFIG_AIO option, which allows removing support
    for asynchronous I/O operations that are not necessarily used by
    applications, particularly on embedded devices. As this is a
    size-reduction option, it depends on CONFIG_EMBEDDED. It saves
    ~7 kilobytes of kernel code/data:

        text    data     bss      dec     hex  filename
     1115067  119180  217088  1451335  162547  vmlinux
     1108025  119048  217088  1444161  160941  vmlinux.new
       -7042    -132       0    -7174   -1C06  +/-

    This patch was originally written by Matt Mackall, and is part of the
    Linux Tiny project.

    [randy.dunlap@oracle.com: build fix]
    Signed-off-by: Thomas Petazzoni
    Cc: Benjamin LaHaise
    Cc: Zach Brown
    Signed-off-by: Matt Mackall
    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Petazzoni
     

27 Jul, 2008

1 commit


29 Apr, 2008

1 commit

  • Make the following needlessly global functions static:

    - __put_ioctx()
    - lookup_ioctx()
    - io_submit_one()

    Signed-off-by: Adrian Bunk
    Cc: Zach Brown
    Cc: Benjamin LaHaise
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

19 Feb, 2008

1 commit

  • Commit b2e895dbd80c420bfc0937c3729b4afe073b3848 #if 0'ed this code stating:

    [PATCH] revert blockdev direct io back to 2.6.19 version

    Andrew Vasquez is reporting as-iosched oopses and a 65% throughput
    slowdown due to the recent special-casing of direct-io against
    blockdevs. We don't know why either of these things are occurring.

    The patch minimally reverts us back to the 2.6.19 code for a 2.6.20
    release.

    It has since been dead code, and unless someone wants to revive it now,
    it's time to remove it.

    This patch also makes bio_release_pages() static again and removes the
    ki_bio_count member from struct kiocb, reverting changes that had been
    done for this dead code.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Jens Axboe

    Adrian Bunk
     

14 Feb, 2008

1 commit


19 Oct, 2007

1 commit

  • Hell knows what happened in commit 63b05203af57e7de4f3bb63b8b81d43bc196d32b
    during 2.6.9 development. That commit introduced the io_wait field, which was
    write-only then and still remains write-only.

    Also garbage-collect the macros which "use" io_wait.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

20 Jul, 2007

1 commit

  • Fix type issue reported by latest 'sparse': kiocb.ki_flags should be
    "unsigned long" (not "long"), to match bitop type signature.

    Signed-off-by: David Brownell
    Signed-off-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Brownell
     

11 May, 2007

1 commit

  • This is an example of how to add eventfd support to the current KAIO code,
    in order to enable KAIO to post readiness events to a pollable fd (hence
    compatible with POSIX select/poll). The KAIO code simply signals the eventfd
    fd when events are ready, and this triggers a POLLIN in the fd. This patch
    uses a reserved-for-future-use member of struct iocb to pass an eventfd
    file descriptor that KAIO will use to post events every time a request
    completes. At that point, io_getevents() will return the completed result
    in a struct io_event. I made a quick test program to verify the patch, and it
    runs fine here:

    http://www.xmailserver.org/eventfd-aio-test.c

    The test program uses poll(2), but it would, of course, work with select and
    epoll too.

    This allows an application to schedule both block I/O and requests on other
    pollable devices, and to wait for results using select/poll/epoll. In a
    typical scenario, an application would submit KAIO requests using
    io_submit(), use epoll_ctl() on the whole other class of devices (which,
    with the addition of signals, timers and user events, is now pretty much
    complete), and then would:

    epoll_wait(...);
    for_each_event {
        if (curr_event_is_kaiofd) {
            io_getevents();
            dispatch_aio_events();
        } else {
            dispatch_epoll_event();
        }
    }

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
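On the application side, the eventfd behaves as a counter: each completion the kernel signals adds to it, a non-zero counter makes the fd readable (POLLIN), and read(2) returns and resets the count. The sketch below simulates the kernel side with write(2), so it runs without any AIO setup; in a real program the iocb would request this through IOCB_FLAG_RESFD and aio_resfd from <linux/aio_abi.h>. signal_completions() and wait_for_completions() are hypothetical helper names.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Stand-in for the kernel side: adding n to the eventfd counter is what
 * makes the fd readable. */
static int signal_completions(int efd, uint64_t n)
{
    return write(efd, &n, sizeof(n)) == (ssize_t)sizeof(n) ? 0 : -1;
}

/* Application side: wait for readiness, then read (and reset) the counter
 * to learn how many completions are waiting to be reaped. */
static int64_t wait_for_completions(int efd, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    uint64_t count;

    if (poll(&pfd, 1, timeout_ms) <= 0 || !(pfd.revents & POLLIN))
        return 0;   /* nothing ready (or poll error) */
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

After wait_for_completions() reports a non-zero count, the application would call io_getevents() to collect that many results.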
     

10 May, 2007

1 commit

  • Stick an unlikely() around is_aio(): I assert that most IO is synchronous.

    Cc: Suparna Bhattacharya
    Cc: Ingo Molnar
    Cc: Benjamin LaHaise
    Cc: Zach Brown
    Cc: Ulrich Drepper
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
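The unlikely() hint is a thin wrapper around the GCC/Clang builtin __builtin_expect(): it tells the compiler which way a branch usually goes so the common path can be laid out as straight-line code, without changing the value of the condition. The macro definitions below follow the common non-instrumented form used by the kernel; struct request, is_async() and submit() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Branch hints: they affect code layout only, never the truth value. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical request: 'aio_ctx' is non-NULL only for asynchronous I/O. */
struct request {
    void *aio_ctx;
};

static bool is_async(const struct request *r)
{
    return r->aio_ctx != NULL;
}

/* Returns 1 for the rare asynchronous path, 0 for the common sync path. */
static int submit(const struct request *r)
{
    if (unlikely(is_async(r)))
        return 1;
    return 0;
}
```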
     

14 Dec, 2006

1 commit

  • Implement a block-device-specific .direct_IO method instead of going through
    the generic direct_io_worker for block devices.

    direct_io_worker() is fairly complex because it needs to handle O_DIRECT on
    file systems, where it must perform block allocation, hole detection,
    extending the file on write, and tons of other corner cases. The end result
    is that it takes tons of CPU time to submit an I/O.

    For a block device, the block allocation is much simpler, and a tight triple
    loop can be written to iterate over each iovec and each page within the
    iovec in order to construct/prepare bio structures and then submit them to
    the block layer. This significantly speeds up O_DIRECT on block devices.

    [akpm@osdl.org: small speedup]
    Signed-off-by: Ken Chen
    Cc: Christoph Hellwig
    Cc: Zach Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
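The inner levels of the loop described above walk each iovec and then each page that iovec touches. The sketch below only counts those page-sized pieces instead of building bios; count_pages() and PAGE_SZ (assumed here to be 4 KiB) are hypothetical, and the point in the inner loop where a page is counted is roughly where the kernel would pin the page and add it to a bio.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

#define PAGE_SZ 4096UL  /* assumption: 4 KiB pages */

/* Outer loop: each iovec. Inner loop: each page the iovec touches. */
static size_t count_pages(const struct iovec *iov, size_t nr_segs)
{
    size_t total = 0;

    for (size_t seg = 0; seg < nr_segs; seg++) {
        uintptr_t addr = (uintptr_t)iov[seg].iov_base;
        size_t left = iov[seg].iov_len;

        while (left > 0) {
            /* Bytes remaining in the page that contains addr. */
            size_t in_page = PAGE_SZ - (addr % PAGE_SZ);
            size_t chunk = left < in_page ? left : in_page;

            total++;   /* one (possibly partial) page */
            addr += chunk;
            left -= chunk;
        }
    }
    return total;
}
```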
     

08 Dec, 2006

1 commit

  • Remove the ki_retried member from struct kiocb. I think the idea was
    bounced around a while back, but Arnaldo gave another reason to dig it
    up when he pointed out that the last cacheline of struct kiocb only
    contains 4 bytes. By removing the debugging member, we save more than
    8 bytes on 64-bit machines.

    Signed-off-by: Benjamin LaHaise
    Acked-by: Ken Chen
    Acked-by: Zach Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin LaHaise
     

22 Nov, 2006

1 commit

  • Separate delayable work items from non-delayable work items by splitting them
    into a separate structure (delayed_work), which incorporates a work_struct and
    the timer_list removed from work_struct.

    The work_struct struct is huge, and this limits its usefulness. On a 64-bit
    architecture it's nearly 100 bytes in size. This reduces that by half for the
    non-delayable type of event.

    Signed-Off-By: David Howells

    David Howells
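After the split, code that needs a timer embeds struct delayed_work, and a handler that receives only the inner work_struct pointer recovers the outer structure with container_of(), as the kernel's to_delayed_work() helper does. The layouts below are simplified, hypothetical versions with only a few fields, so the sizes they produce are illustrative and do not match the real structures.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, hypothetical layouts; the real structures differ. */
struct work_struct {
    void (*func)(struct work_struct *work);
    void *entry_next, *entry_prev;   /* list linkage */
};

struct timer_list {
    unsigned long expires;
    void (*function)(unsigned long data);
    unsigned long data;
    void *entry_next, *entry_prev;
};

/* Only deferred work pays for the timer. */
struct delayed_work {
    struct work_struct work;
    struct timer_list timer;
};

/* A handler gets a work_struct * and recovers the wrapper. */
static struct delayed_work *to_delayed(struct work_struct *w)
{
    return container_of(w, struct delayed_work, work);
}
```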
     

01 Oct, 2006

4 commits


09 Jan, 2006

1 commit

  • Reorder members of the kiocb structure to make sync kiocb setup faster. By
    setting the elements sequentially, the write combining buffers on the CPU
    are able to combine the writes into a single burst, which results in fewer
    cache cycles being consumed, freeing them up for other code. This results
    in a 10-20KB/s[*] increase on the bw_unix part of LMbench on my test
    system.

    * The improvement varies based on what other patches are in the system,
    as there are a number of bottlenecks, so this number is not absolutely
    accurate.

    Signed-off-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin LaHaise
     

14 Nov, 2005

2 commits

  • put_ioctx's refcount debugging was doing an atomic_read after dropping its
    reference when it wasn't the last ref, leaving a tiny race for another freeing
    thread to sneak into. This shifts the debugging before the ops, uses BUG_ON,
    and reformats the defines a little. Sadly, moving to inlines increased the
    code size but this change decreases the code size by a whole 9 bytes :)

    Signed-off-by: Zach Brown
    Cc: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
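The ordering fix can be sketched with C11 atomics: validate the count while we still hold our reference, then drop it, and free only if that decrement removed the last reference. struct ioctx_like, get_ctx() and put_ctx() are hypothetical; assert() stands in for BUG_ON() and, unlike BUG_ON(), is compiled out under NDEBUG.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct kioctx. */
struct ioctx_like {
    atomic_int users;
};

static void get_ctx(struct ioctx_like *ctx)
{
    assert(atomic_load(&ctx->users) > 0);
    atomic_fetch_add(&ctx->users, 1);
}

/* Check the count BEFORE dropping our reference: reading it afterwards
 * could touch memory another thread has already freed.
 * Returns 1 if this call dropped the last reference and freed ctx. */
static int put_ctx(struct ioctx_like *ctx)
{
    assert(atomic_load(&ctx->users) > 0);   /* like BUG_ON(users <= 0) */
    if (atomic_fetch_sub(&ctx->users, 1) == 1) {
        free(ctx);
        return 1;
    }
    return 0;
}
```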
     
  • Sync iocbs have a life cycle that doesn't need a kioctx. Their retrying, if
    any, is done in the context of their owner, who has allocated them on the
    stack.

    The sole user of a sync iocb's ctx reference was aio_complete() checking for
    an elevated iocb ref count that could never happen. No path which grabs an
    iocb ref has access to sync iocbs.

    If we were to implement sync iocb cancelation it would be done by the owner of
    the iocb using its on-stack reference.

    Removing this chunk from aio_complete allows us to remove the entire kioctx
    instance from mm_struct, reducing its size by a third. On an i386 test box
    the slab size went from 768 to 504 bytes and from 5 to 8 per page.

    Signed-off-by: Zach Brown
    Acked-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     

07 Nov, 2005

1 commit

  • AIO was adding a new context's max requests to the global total before
    testing if that resulting total was over the global limit. This let
    innocent tasks get their new limit tested along with a racing guilty task
    that was crossing the limit. This serializes the _nr accounting with a
    spinlock. It also switches to using unsigned long for the global totals.
    Individual contexts are still limited to an unsigned int's worth of
    requests by the syscall interface.

    The problem and fix were verified with a simple program that spun creating
    and destroying a context while holding on to another long lived context.
    Before the patch a task creating a tiny context could get a spurious EAGAIN
    if it raced with a task creating a very large context that overran the
    limit.

    Signed-off-by: Zach Brown
    Cc: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
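The fix amounts to making check-then-add a single critical section. In the sketch below, a mutex stands in for the spinlock and the two globals play the roles of the kernel's aio_nr and aio_max_nr, with an arbitrary limit of 65536; reserve_nr() and release_nr() are hypothetical. Because an oversized request is rejected before it ever touches the shared total, a concurrent small request can no longer observe a temporarily inflated value and fail spuriously.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static unsigned long max_nr = 65536;   /* arbitrary illustrative limit */
static unsigned long total_nr;
static pthread_mutex_t nr_lock = PTHREAD_MUTEX_INITIALIZER;

/* Test and add under one lock; the comparison is written so that
 * total_nr + nr cannot overflow. */
static bool reserve_nr(unsigned long nr)
{
    bool ok;

    pthread_mutex_lock(&nr_lock);
    ok = nr <= max_nr && total_nr <= max_nr - nr;
    if (ok)
        total_nr += nr;
    pthread_mutex_unlock(&nr_lock);
    return ok;
}

static void release_nr(unsigned long nr)
{
    pthread_mutex_lock(&nr_lock);
    assert(total_nr >= nr);
    total_nr -= nr;
    pthread_mutex_unlock(&nr_lock);
}
```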
     

18 Oct, 2005

1 commit

  • lock_kiocb() was introduced to serialize retrying and cancellation. In the
    process of doing so it tried to sleep waiting for KIF_LOCKED while holding
    the ctx_lock spinlock. Recent fixes have ensured that multiple concurrent
    retries won't be attempted for a given iocb. Cancel has other problems and
    has no significant in-tree users that have been complaining about it. So
    for the immediate future we'll revert sleeping with the lock held and will
    address proper cancellation and retry serialization in the future.

    Signed-off-by: Zach Brown
    Acked-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     

01 Oct, 2005

1 commit

  • Only one of the run or kick path is supposed to put an iocb on the run
    list. If both of them do it, then one of them can end up referencing a
    freed iocb. The kick path could delete the task_list item from the wait
    queue before getting the ctx_lock and putting the iocb on the run list.
    The run path was testing the task_list item outside the lock so that it
    could catch ki_retry methods that return -EIOCBRETRY *without* putting the
    iocb on a wait queue and promising to call kick_iocb. This unlocked check
    could then race with the kick path to cause both to try and put the iocb on
    the run list.

    The patch stops the run path from testing task_list by requiring that any
    ki_retry that returns -EIOCBRETRY *must* guarantee that kick_iocb() will be
    called in the future. aio_p{read,write}, the only in-tree -EIOCBRETRY
    users, are updated.

    Signed-off-by: Zach Brown
    Signed-off-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds