04 Jul, 2006

1 commit

  • Teach special (rwsem-in-irq) locking code to the lock validator. Has no
    effect on non-lockdep kernels.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

23 Jun, 2006

1 commit

  • A process flag to indicate whether we are doing sync io is incredibly
    ugly. It also causes performance problems when one does a lot of async
    io and then proceeds to sync it. Part of the io will go out as async,
    and the other part as sync. This causes a disconnect between the
    previously submitted io and the synced io. For io schedulers such as CFQ,
    this costs us merges and leads to suboptimal scheduling behaviour.

    Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
    the O_DIRECT path just directly indicate that the writes are sync
    by using WRITE_SYNC instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

01 Apr, 2006

1 commit


29 Mar, 2006

1 commit


27 Mar, 2006

1 commit

  • Now that get_block() can handle mapping multiple disk blocks, there is no
    need to have ->get_blocks(). This patch removes the fs-specific
    ->get_blocks() added for DIO and makes its users use get_block() instead.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     

26 Mar, 2006

1 commit

  • There is a bug in direct-io in propagating write errors up to the higher
    I/O layer. When performing an async O_DIRECT write to a block device, if a
    device error occurs (such as a media error or the disk being pulled), the
    error code is only propagated from the device driver to the DIO layer and
    stops at finished_one_bio(). The async write, however, is supposed to have
    a corresponding AIO event with the appropriate return code (in this case
    -EIO). An application that waits on the async write event will hang
    forever because that AIO event is lost (unless the app uses the timeout
    option in its io_getevents call; either way, the AIO event is lost).

    The discovery of the above bug led to the discovery of a potential race
    window with dio->result. The fundamental problem is that dio->result is
    overloaded with dual use: as an indicator of the fall back path for a
    partial dio write, and as an error indicator used in the I/O completion
    path. In the event of a device error, setting dio->result to -EIO clashes
    with the value used to track a partial write that activates the fall back
    path.

    It was also pointed out that it is impossible to use dio->result to track
    a partial write and at the same time track an error returned from the
    device driver, because direct_io_worker can only determine whether a write
    is partial at the end of io submission, while a return code could come
    back from the driver in the middle of that submission, messing up all
    the subsequent logic.

    The proposed fix is to separate the error code returned by the IO
    completion path from partial IO submit tracking. A new variable is added
    to the dio structure specifically to track the io error returned in the
    completion path.

    Signed-off-by: Ken Chen
    Acked-by: Zach Brown
    Acked-by: Suparna Bhattacharya
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
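
The separation described in the commit above can be illustrated with a minimal, single-threaded sketch. The struct and function names here (dio_sketch, dio_sketch_complete_bio, dio_sketch_final_result) are hypothetical and are not taken from the patch; the real code also has to deal with concurrency between submission and completion, which this sketch ignores.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the dio structure: one field per purpose. */
struct dio_sketch {
    long result;   /* bytes submitted so far; drives the partial-write fall back */
    int io_error;  /* first error seen in the completion path, 0 if none */
};

/* Completion path: record a device error without touching result. */
void dio_sketch_complete_bio(struct dio_sketch *dio, int error)
{
    assert(dio != NULL);
    if (error != 0 && dio->io_error == 0)
        dio->io_error = error;
}

/* Final return code: a recorded device error wins over the byte count. */
long dio_sketch_final_result(const struct dio_sketch *dio)
{
    assert(dio != NULL);
    return dio->io_error != 0 ? (long)dio->io_error : dio->result;
}
```

With two fields, a completion that reports -EIO no longer overwrites the value used to decide whether the write was partial, and the error can still be delivered to the waiter.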
     

15 Mar, 2006

1 commit

  • Affects only XFS (i.e. DIO_OWN_LOCKING case) - currently it is
    not possible to get i_mutex locking correct when using DIO_OWN
    direct I/O locking in a filesystem due to indeterminism in the
    possible return code/lock/unlock combinations. This can cause
    a direct read to attempt a double i_mutex unlock inside XFS.

    We're now ensuring __blockdev_direct_IO always exits with the
    inode i_mutex (still) held for a direct reader.

    Tested with the three different locking modes (via direct block
    device access, ext3 and XFS) - both reading and writing; cannot
    find any regressions resulting from this change, and it clearly
    fixes the mutex_unlock warning originally reported here:
    http://marc.theaimsgroup.com/?l=linux-kernel&m=114189068126253&w=2

    Signed-off-by: Nathan Scott
    Acked-by: Christoph Hellwig

    Nathan Scott
     

04 Feb, 2006

1 commit

  • Currently, if you open a file O_DIRECT, truncate it to a size that is not a
    multiple of the disk block size, and then try to read the last block in the
    file, the read will return 0. The problem is in do_direct_IO, here:

    /* Handle holes */
    if (!buffer_mapped(map_bh)) {
            char *kaddr;

            ...

            if (dio->block_in_file >=
                            i_size_read(dio->inode)>>blkbits) {
                    /* We hit eof */
                    page_cache_release(page);
                    goto out;
            }

    We shift off any remaining bytes in the final block of the I/O, resulting
    in a 0-sized read. I've attached a patch that fixes this. I'm not happy
    about how ugly the math is getting, so suggestions are more than welcome.

    I've tested this with a simple program that performs the steps outlined for
    reproducing the problem above. Without the patch, we get a 0-sized result
    from read. With the patch, we get the correct return value from the short
    read.

    Signed-off-by: Jeff Moyer
    Cc: Badari Pulavarty
    Cc: Suparna Bhattacharya
    Cc: Mingming Cao
    Cc: Joel Becker
    Cc: "Chen, Kenneth W"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Moyer
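
The arithmetic behind the bug in the commit above can be shown in isolation. This is only a sketch of the shift problem, assuming a rounded-up block count as one possible remedy; the math in the actual patch is not shown in the commit message, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The check quoted in the commit: the shift drops a partial final block. */
int hits_eof_truncating(uint64_t block_in_file, uint64_t i_size,
                        unsigned blkbits)
{
    assert(blkbits < 64);
    return block_in_file >= (i_size >> blkbits);
}

/* Assumed alternative: count a partial final block as a whole block. */
int hits_eof_rounded_up(uint64_t block_in_file, uint64_t i_size,
                        unsigned blkbits)
{
    uint64_t block_size, blocks;

    assert(blkbits < 64);
    block_size = (uint64_t)1 << blkbits;
    /* Full blocks, plus one more if any bytes remain past the last one. */
    blocks = (i_size >> blkbits) + ((i_size & (block_size - 1)) != 0);
    return block_in_file >= blocks;
}
```

For a 100-byte file with 512-byte blocks (blkbits of 9), the truncating check reports EOF for block 0, which matches the 0-sized read described above, while the rounded-up check does not.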
     

10 Jan, 2006

1 commit


30 Oct, 2005

1 commit

  • Remove PageReserved() calls from core code by tightening VM_RESERVED
    handling in mm/ to cover PageReserved functionality.

    PageReserved special casing is removed from get_page and put_page.

    All setting and clearing of PageReserved is retained, and it is now flagged
    in the page_alloc checks to help ensure we don't introduce any refcount
    based freeing of Reserved pages.

    MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being
    deprecated. We never completely handled it correctly anyway, and it can be
    reintroduced in future if required (Hugh has a proof of concept).

    Once PageReserved() calls are removed from kernel/power/swsusp.c, and all
    arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can
    be trivially removed.

    Last real user of PageReserved is swsusp, which uses PageReserved to
    determine whether a struct page points to valid memory or not. This still
    needs to be addressed (a generic page_is_ram() should work).

    A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and
    thus mapcounted and counted towards shared rss). These writes to the struct
    page could cause excessive cacheline bouncing on big systems. There are a
    number of ways this could be addressed if it is an issue.

    Signed-off-by: Nick Piggin

    Refcount bug fix for filemap_xip.c

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

24 Jun, 2005

1 commit


17 Apr, 2005

2 commits

  • The direct I/O code is mapping the read request to the file system block. If
    the file size was not on a block boundary, the result would show the read
    reading past EOF. This was only happening for the AIO case. The non-AIO case
    truncates the result to match file size (in direct_io_worker). This patch
    does the same thing for the AIO case: it truncates the result to match the
    file size if the read reads past EOF.

    When I/O completes the result can be truncated to match the file size
    without using i_size_read(), thus the aio result now matches the number of
    bytes read to the end of file.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel McNeil
     
  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds
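
The result truncation described in the Daniel McNeil commit of 17 Apr, 2005 amounts to clamping the transferred byte count at the file size. A minimal sketch follows, assuming the file size is already available as a plain value; the function name clamp_read_to_eof and its parameters are hypothetical and are not the kernel's.

```c
#include <assert.h>

/*
 * Clamp the byte count of a read that started at offset so that it never
 * extends past i_size. All three values are in bytes.
 */
long long clamp_read_to_eof(long long offset, long long transferred,
                            long long i_size)
{
    assert(offset >= 0 && transferred >= 0 && i_size >= 0);
    if (offset >= i_size)
        return 0;                  /* read started at or beyond EOF */
    if (transferred > i_size - offset)
        return i_size - offset;    /* read ran past EOF: keep bytes up to EOF */
    return transferred;            /* read ended before EOF: unchanged */
}
```

For a 5000-byte file read in 4096-byte blocks, the second block reports 904 bytes instead of 4096.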