11 May, 2007

1 commit

  • This is an example of how to add eventfd support to the current KAIO code,
    in order to enable KAIO to post readiness events to a pollable fd (hence
    compatible with POSIX select/poll). The KAIO code simply signals the eventfd
    fd when events are ready, and this triggers a POLLIN on the fd. This patch
    uses a reserved-for-future-use member of struct iocb to pass an eventfd
    file descriptor, which KAIO will use to post events every time a request
    completes. At that point, an aio_getevents() will return the completed
    result in a struct io_event. I made a quick test program to verify the
    patch, and it runs fine here:

    http://www.xmailserver.org/eventfd-aio-test.c

    The test program uses poll(2), but it'd, of course, work with select and epoll
    too.

    This makes it possible to schedule both block I/O requests and requests for
    other pollable devices, and to wait for results using select/poll/epoll. In
    a typical scenario, an application would submit KAIO requests using
    aio_submit(), would also use epoll_ctl() on the whole other class of
    devices (which, with the addition of signals, timers and user events, is
    now pretty much complete), and then would:

    epoll_wait(...);
    for_each_event {
        if (curr_event_is_kaiofd) {
            aio_getevents();
            dispatch_aio_events();
        } else {
            dispatch_epoll_event();
        }
    }
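
    A minimal, self-contained userspace sketch of this flow follows
    (illustrative only, not part of the patch). It assumes libaio's
    io_prep_pread() and io_set_eventfd() helpers to fill in the iocb's eventfd
    field, uses libaio's io_getevents() for the aio_getevents() step above, and
    trims all error handling (compile with -laio):

    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <libaio.h>
    #include <sys/eventfd.h>
    #include <sys/epoll.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        static char buf[4096] __attribute__((aligned(4096)));
        io_context_t ctx = 0;
        struct iocb cb, *cbs[1] = { &cb };
        struct io_event events[8];
        struct epoll_event ev = { .events = EPOLLIN }, out;
        uint64_t n;

        io_setup(8, &ctx);
        int efd = eventfd(0, 0);
        int fd  = open("testfile", O_RDONLY | O_DIRECT);

        io_prep_pread(&cb, fd, buf, sizeof(buf), 0);
        io_set_eventfd(&cb, efd);          /* completions get posted to efd */
        io_submit(ctx, 1, cbs);

        int ep = epoll_create(8);
        ev.data.fd = efd;
        epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev);

        epoll_wait(ep, &out, 1, -1);       /* wakes when the KAIO request completes */
        read(efd, &n, sizeof(n));          /* eventfd counts the posted completions */

        int got = io_getevents(ctx, 1, 8, events, NULL);
        printf("%d completion(s), first res=%ld\n", got, (long)events[0].res);

        io_destroy(ctx);
        return 0;
    }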

    Signed-off-by: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davide Libenzi
     

10 May, 2007

1 commit

  • Stick an unlikely() around is_aio(): I assert that most IO is synchronous.
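
    For illustration (the caller below is made up, not the actual patch):
    unlikely() is the kernel's branch-prediction hint built on gcc's
    __builtin_expect(), so the change just tells the compiler to lay out the
    asynchronous branch as the cold path.

    /* kernel definition (simplified) */
    #define unlikely(x)  __builtin_expect(!!(x), 0)

    /* hypothetical caller: keep the synchronous path as the hot, fall-through case */
    if (unlikely(is_aio(iocb)))
        return queue_async_completion(iocb);   /* rare: real AIO */
    return complete_synchronously(iocb);       /* common case */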

    Cc: Suparna Bhattacharya
    Cc: Ingo Molnar
    Cc: Benjamin LaHaise
    Cc: Zach Brown
    Cc: Ulrich Drepper
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

14 Dec, 2006

1 commit

  • Implement a block-device-specific .direct_IO method instead of going
    through the generic direct_io_worker() for block devices.

    direct_io_worker() is fairly complex because it needs to handle O_DIRECT on
    file systems, where it needs to perform block allocation, hole detection,
    extending the file on write, and tons of other corner cases. The end result
    is that it takes tons of CPU time to submit an I/O.

    For a block device, block allocation is much simpler, and a tight triple
    loop can be written to iterate over each iovec and each page within the
    iovec in order to construct/prepare the bio structures and then submit them
    to the block layer. This significantly speeds up O_DIRECT on block devices.
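
    A rough schematic of that loop structure (illustrative pseudo-C only, not
    the actual patch; map_user_page() and pages_remaining are made-up
    placeholders, and variable declarations, bio field setup and error handling
    are trimmed):

    struct bio *bio = NULL;

    for (seg = 0; seg < nr_segs; seg++) {            /* each iovec */
        unsigned long addr = (unsigned long)iov[seg].iov_base;
        size_t left = iov[seg].iov_len;

        while (left) {                               /* each page in the iovec */
            struct page *page = map_user_page(addr); /* pin the user page */
            unsigned len = min(left, PAGE_SIZE - offset_in_page(addr));

            if (!bio || !bio_add_page(bio, page, len, offset_in_page(addr))) {
                if (bio)
                    submit_bio(rw, bio);             /* bio full: hand it to the block layer */
                bio = bio_alloc(GFP_KERNEL, pages_remaining);
                /* set bio->bi_bdev, bi_sector, bi_end_io here */
                bio_add_page(bio, page, len, offset_in_page(addr));
            }
            addr += len;
            left -= len;
        }
    }
    if (bio)
        submit_bio(rw, bio);                         /* flush the final bio */

    No hole detection or block allocation is needed, which is where the CPU
    savings come from.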

    [akpm@osdl.org: small speedup]
    Signed-off-by: Ken Chen
    Cc: Christoph Hellwig
    Cc: Zach Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
     

08 Dec, 2006

1 commit

  • Remove the ki_retried member from struct kiocb. I think the idea was
    bounced around a while back, but Arnaldo gave us another reason to dig it
    up when he pointed out that the last cacheline of struct kiocb only
    contains 4 bytes. By removing the debugging member, we save more than
    8 bytes on 64-bit machines.

    Signed-off-by: Benjamin LaHaise
    Acked-by: Ken Chen
    Acked-by: Zach Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin LaHaise
     

22 Nov, 2006

1 commit

  • Separate delayable work items from non-delayable work items by splitting
    them into a separate structure (delayed_work), which incorporates a
    work_struct and the timer_list removed from work_struct.

    The work_struct struct is huge, and this limits its usefulness. On a 64-bit
    architecture it's nearly 100 bytes in size. This reduces that by half for
    the non-delayable type of event.
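
    Schematically, the split looks like this (member details elided; the exact
    work_struct contents vary by kernel version):

    struct delayed_work {
        struct work_struct work;   /* the plain, non-delayable work item */
        struct timer_list timer;   /* the timer formerly embedded in work_struct */
    };

    Plain work items keep using struct work_struct directly and no longer carry
    the timer; only users of delayed work pay for it.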

    Signed-Off-By: David Howells

    David Howells
     

01 Oct, 2006

4 commits


09 Jan, 2006

1 commit

  • Reorder members of the kiocb structure to make sync kiocb setup faster. By
    setting the elements sequentially, the write combining buffers on the CPU
    are able to combine the writes into a single burst, which results in fewer
    cache cycles being consumed, freeing them up for other code. This results
    in a 10-20KB/s[*] increase on the bw_unix part of LMbench on my test
    system.

    * The improvement varies based on what other patches are in the system,
    as there are a number of bottlenecks, so this number is not absolutely
    accurate.

    Signed-off-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin LaHaise
     

14 Nov, 2005

2 commits

  • put_ioctx's refcount debugging was doing an atomic_read after dropping its
    reference when it wasn't the last ref, leaving a tiny race for another freeing
    thread to sneak into. This shifts the debugging before the ops, uses BUG_ON,
    and reformats the defines a little. Sadly, moving to inlines increased the
    code size but this change decreases the code size by a whole 9 bytes :)
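
    Roughly, the fix moves the sanity check in front of the atomic operation,
    so the kioctx can no longer be freed out from under the debugging read (a
    sketch of the resulting shape, not the literal patch):

    #define put_ioctx(kioctx) do {                                  \
            BUG_ON(atomic_read(&(kioctx)->users) <= 0);             \
            if (unlikely(atomic_dec_and_test(&(kioctx)->users)))    \
                    __put_ioctx(kioctx);                            \
    } while (0)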

    Signed-off-by: Zach Brown
    Cc: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     
  • Sync iocbs have a life cycle that doesn't need a kioctx. Their retrying,
    if any, is done in the context of their owner, who has allocated them on
    the stack.

    The sole user of a sync iocb's ctx reference was aio_complete() checking for
    an elevated iocb ref count that could never happen. No path which grabs an
    iocb ref has access to sync iocbs.

    If we were to implement sync iocb cancelation it would be done by the owner of
    the iocb using its on-stack reference.

    Removing this chunk from aio_complete allows us to remove the entire kioctx
    instance from mm_struct, reducing its size by a third. On an i386 testing
    box the slab size went from 768 to 504 bytes and from 5 to 8 objects per
    page.
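
    The owner-driven life cycle being described is the familiar do_sync_read()
    style of pattern, roughly as below (a sketch with signatures approximating
    the 2005-era code; error handling omitted):

    ssize_t sync_read(struct file *filp, char __user *buf, size_t len,
                      loff_t *ppos)
    {
            struct kiocb kiocb;
            ssize_t ret;

            init_sync_kiocb(&kiocb, filp);            /* no kioctx behind this iocb */
            kiocb.ki_pos = *ppos;
            ret = filp->f_op->aio_read(&kiocb, buf, len, kiocb.ki_pos);
            if (ret == -EIOCBQUEUED)
                    ret = wait_on_sync_kiocb(&kiocb); /* retries run in the owner's context */
            *ppos = kiocb.ki_pos;
            return ret;
    }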

    Signed-off-by: Zach Brown
    Acked-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     

07 Nov, 2005

1 commit

  • AIO was adding a new context's max requests to the global total before
    testing whether the resulting total was over the global limit. This let
    innocent tasks get their new limit tested along with a racing guilty task
    that was crossing the limit. This serializes the _nr accounting with a
    spinlock. It also switches to using unsigned long for the global totals.
    Individual contexts are still limited to an unsigned int's worth of
    requests by the syscall interface.

    The problem and fix were verified with a simple program that spun creating
    and destroying a context while holding on to another long lived context.
    Before the patch a task creating a tiny context could get a spurious EAGAIN
    if it raced with a task creating a very large context that overran the
    limit.
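
    The check-then-account step ends up looking roughly like this (a sketch
    based on the description above; it assumes fs/aio.c's aio_nr/aio_max_nr
    globals and a new aio_nr_lock spinlock):

    spin_lock(&aio_nr_lock);
    if (aio_nr + nr_events > aio_max_nr ||
        aio_nr + nr_events < aio_nr)        /* also guard against overflow */
            ctx->max_reqs = 0;              /* reject: would cross the global limit */
    else
            aio_nr += ctx->max_reqs;        /* account only after the test passes */
    spin_unlock(&aio_nr_lock);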

    Signed-off-by: Zach Brown
    Cc: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     

18 Oct, 2005

1 commit

  • lock_kiocb() was introduced to serialize retrying and cancellation. In the
    process of doing so it tried to sleep waiting for KIF_LOCKED while holding
    the ctx_lock spinlock. Recent fixes have ensured that multiple concurrent
    retries won't be attempted for a given iocb. Cancel has other problems and
    has no significant in-tree users that have been complaining about it. So
    for the immediate future we'll revert sleeping with the lock held and will
    address proper cancellation and retry serialization in the future.

    Signed-off-by: Zach Brown
    Acked-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     

01 Oct, 2005

1 commit

  • Only one of the run and kick paths is supposed to put an iocb on the run
    list. If both of them do it, then one of them can end up referencing a
    freed iocb. The kick path could delete the task_list item from the wait
    queue before getting the ctx_lock and putting the iocb on the run list.
    The run path was testing the task_list item outside the lock so that it
    could catch ki_retry methods that return -EIOCBRETRY *without* putting the
    iocb on a wait queue and promising to call kick_iocb. This unlocked check
    could then race with the kick path and cause both to try to put the iocb
    on the run list.

    The patch stops the run path from testing task_list by requiring that any
    ki_retry that returns -EIOCBRETRY *must* guarantee that kick_iocb() will be
    called in the future. aio_p{read,write}, the only in-tree -EIOCBRETRY
    users, are updated.
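
    The contract can be sketched like this (illustrative only; data_ready(),
    arrange_wakeup_that_kicks() and do_transfer() are made-up placeholders):

    static ssize_t my_ki_retry(struct kiocb *iocb)
    {
            if (!data_ready(iocb)) {
                    /* arrange for the eventual wakeup to call kick_iocb(iocb);
                     * only then is it legal to promise a retry */
                    arrange_wakeup_that_kicks(iocb);
                    return -EIOCBRETRY;
            }
            return do_transfer(iocb);
    }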

    Signed-off-by: Zach Brown
    Signed-off-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zach Brown
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds