13 Jan, 2012

2 commits

  • Some investigation of a transaction processing workload showed that a
    major consumer of cycles in __blockdev_direct_IO is the cache miss while
    accessing the block size. This is because it has to walk the chain from
    block_dev to gendisk to queue.

    The block size is needed early on to check alignment and sizes, but
    it is only consulted when the check against the inode block size
    fails. The costly block device state, however, was fetched
    unconditionally.

    - Reorganize the code to only fetch block dev state when actually
    needed.

    Then prefetch the block dev state early in the direct IO path. This
    is worth it because substantial code now runs before we actually
    touch the block dev.

    - I also added some unlikelies to make it clear to the compiler that
    the block device fetch code is not normally executed.

    This gave a small but measurable improvement (about 0.3%) on a large
    database benchmark.
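
    A rough sketch of the reorganization, not the literal patch (the
    helper is hypothetical; the field and function names follow the
    kernel of this era):

        #include <linux/prefetch.h>

        /*
         * Check against the inode block size first; touch the block
         * device only on the unlikely misaligned path, and prefetch
         * its state early so that path misses less.
         */
        static int dio_check_alignment(struct inode *inode,
                                       struct block_device *bdev,
                                       loff_t offset, size_t len)
        {
                unsigned blkbits = inode->i_blkbits;
                unsigned mask = (1 << blkbits) - 1;

                prefetch(bdev->bd_disk);        /* warm the bdev->gendisk chain */

                if (likely(((offset | len) & mask) == 0))
                        return 0;               /* common case: no bdev access */

                /* unlikely path: walk bdev -> gendisk -> queue */
                blkbits = blksize_bits(bdev_logical_block_size(bdev));
                mask = (1 << blkbits) - 1;
                if (unlikely((offset | len) & mask))
                        return -EINVAL;
                return 0;
        }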

    [akpm@linux-foundation.org: coding-style fixes]
    [sfr@canb.auug.org.au: using prefetch requires including prefetch.h]
    Signed-off-by: Andi Kleen
    Cc: Jeff Moyer
    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • In get_more_blocks(), we use dio_count to calculate fs_count and do
    some tricky things to increase fs_count if dio_count isn't aligned.
    But it still has corner cases that aren't covered. See the
    following example:
    following example:

    dio_write foo -s 1024 -w 4096

    (a direct write of 4096 bytes at offset 1024). The same problem
    occurs whenever the offset isn't aligned to fs_blocksize.

    In this case, the old calculation sets fs_count to 1, but we will
    actually write into 2 different blocks (if fs_blocksize=4096). The
    old code still works, since it simply calls get_block twice (and may
    have to allocate and create extents twice for filesystems like
    ext4). But we'd do better to call get_block just once, with the
    proper fs_count.
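
    For reference, a sketch of the corrected computation (field names
    as in fs/direct-io.c of this era):

        fs_startblk = sdio->block_in_file >> sdio->blkfactor;
        fs_endblk = (sdio->final_block_in_request - 1) >> sdio->blkfactor;
        fs_count = fs_endblk - fs_startblk + 1;

    With offset=1024, len=4096 and fs_blocksize=4096 (blkfactor=3, as
    the dio then runs in 512-byte blocks), fs_startblk=0 and
    fs_endblk=1, so fs_count is 2; the old round-up of dio_count
    yielded 1.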

    Signed-off-by: Tao Ma
    Cc: "Theodore Ts'o"
    Cc: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tao Ma
     

28 Oct, 2011

7 commits

  • This doesn't change anything for the compiler, but hch thought it would
    make the code clearer.

    I moved the reference counting into its own little inline.

    Signed-off-by: Andi Kleen
    Acked-by: Jeff Moyer
    Signed-off-by: Christoph Hellwig

    Andi Kleen
     
  • Add inlines to all the submission path functions. While this increases
    code size it also gives gcc a lot of optimization opportunities
    in this critical hotpath.

    In particular -- together with some other changes -- this
    allows gcc to get rid of the unnecessary clearing of
    sdio at the beginning and optimize the messy parameter passing.
    Keeping any function which takes an sdio parameter out of line
    would defeat this optimization, because these transformations
    cannot be done once the address of the structure is taken.

    Note that benefits are only seen with CONFIG_OPTIMIZE_INLINING
    and CONFIG_CC_OPTIMIZE_FOR_SIZE both set to off.

    This gives about 2.2% improvement on a large database benchmark
    with a high IOPS rate.
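
    A userspace illustration (not kernel code) of the effect described
    above:

        #include <string.h>

        struct sdio_like { unsigned long a, b, c; };

        static inline void step(struct sdio_like *s, unsigned long v)
        {
                s->a += v;
        }

        unsigned long run(unsigned long v)
        {
                struct sdio_like s;

                memset(&s, 0, sizeof(s)); /* dead stores once step() is inlined */
                step(&s, v);
                step(&s, v);  /* an out-of-line call here would make &s escape
                               * and force s to live in memory */
                return s.a;
        }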

    Signed-off-by: Andi Kleen
    Signed-off-by: Christoph Hellwig

    Andi Kleen
     
  • Only a single b_private field in the map_bh buffer head is needed
    after the submission path. Move map_bh out separately to avoid
    storing this information in the long-term slab object.

    This avoids the weird 104-byte hole in struct dio_submit, which also
    had to be cleared with memset early on.

    Signed-off-by: Andi Kleen
    Signed-off-by: Christoph Hellwig

    Andi Kleen
     
  • A direct slab call is slightly faster than kmalloc and can be cached
    better per CPU. It also avoids rounding up to the next kmalloc size
    class.

    In addition this enforces cache line alignment for struct dio to avoid
    any false sharing.
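
    A rough sketch of the dedicated cache (standard slab API; the
    dio_cache name is from this series):

        static struct kmem_cache *dio_cache __read_mostly;

        static int __init dio_init(void)
        {
                /* struct dio is marked ____cacheline_aligned_in_smp;
                 * KMEM_CACHE() picks up that alignment */
                dio_cache = KMEM_CACHE(dio, SLAB_PANIC);
                return 0;
        }
        module_init(dio_init);

        /* at the allocation site, replacing kmalloc()/kfree(): */
        dio = kmem_cache_alloc(dio_cache, GFP_KERNEL);
        /* ... I/O runs ... */
        kmem_cache_free(dio_cache, dio);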

    Signed-off-by: Andi Kleen
    Acked-by: Jeff Moyer
    Signed-off-by: Christoph Hellwig

    Andi Kleen
     
  • Fix most problems reported by pahole.

    There is still a weird 104 byte hole after map_bh. I'm not sure what
    causes this.

    Signed-off-by: Andi Kleen
    Acked-by: Jeff Moyer
    Signed-off-by: Christoph Hellwig

    Andi Kleen
     
  • There's nothing on the stack, even before my changes.

    Signed-off-by: Andi Kleen
    Acked-by: Jeff Moyer
    Signed-off-by: Christoph Hellwig

    Andi Kleen
     
  • This large, but largely mechanical, patch moves all fields in struct dio
    that are only used in the submission path into a separate on stack
    data structure. This has the advantage that the memory is very likely
    cache hot, which is not guaranteed for memory fresh out of kmalloc.

    This also gives gcc more optimization potential, because it can more
    easily determine that there are no external aliases for these
    variables.

    sdio is now set up with a structure initializer instead of a memset.
    This allows gcc to break sdio into individual fields and optimize
    away unnecessary zeroing (once all the functions are inlined).
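
    A rough sketch of the split, with a few representative fields:

        /* state that only lives for the duration of submission */
        struct dio_submit {
                struct bio *bio;          /* bio under assembly */
                unsigned blkbits;         /* block size bits */
                unsigned blkfactor;       /* fs blocksize vs. device blocksize */
                sector_t block_in_file;   /* current offset, in dio blocks */
                int reap_counter;         /* rate-limits page reaping */
        };

        /* in do_blockdev_direct_IO(): an initializer, not a memset */
        struct dio_submit sdio = { 0, };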

    Signed-off-by: Andi Kleen
    Acked-by: Jeff Moyer
    Signed-off-by: Christoph Hellwig

    Andi Kleen
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>.
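
    After the consolidation, callers simply do (minimal sketch; obj is
    hypothetical):

        #include <linux/atomic.h>

        /* take a reference only if the object is still live */
        if (!atomic_inc_not_zero(&obj->refcount))
                return NULL;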

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

21 Jul, 2011

4 commits

  • For filesystems that delay their end_io processing we should keep
    our i_dio_count until the processing is done. Enable this by moving
    the inode_dio_done call to the end_io handler if one exists. Note that
    the actual move to the workqueue for ext4 and XFS is not done in
    this patch yet, but left to the filesystem maintainers. At least
    for XFS it's not needed yet either as XFS has an internal equivalent
    to i_dio_count.
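
    A rough sketch of dio_complete() after this change (shapes follow
    fs/direct-io.c of this era):

        if (dio->end_io && dio->result) {
                /* the handler is now responsible for inode_dio_done() */
                dio->end_io(dio->iocb, offset, transferred,
                            dio->map_bh.b_private, ret, is_async);
        } else {
                if (is_async)
                        aio_complete(dio->iocb, ret, 0);
                inode_dio_done(dio->inode);     /* no handler: drop it here */
        }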

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Maintain i_dio_count for all filesystems, not just those using
    DIO_LOCKING. This allows these filesystems to also protect truncate
    against direct I/O requests by using common code. Right now the only
    non-DIO_LOCKING filesystem that appears to do so is XFS, which uses
    an open-coded variant of the i_dio_count scheme.

    Behaviour doesn't change for filesystems that never call
    inode_dio_wait. For ext4, behaviour changes when using the
    dioread_nonlock option, which previously was missing any protection
    between truncate and direct I/O reads. For ocfs2, the handcrafted
    i_dio_count manipulations are replaced with the common code now
    enabled.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • i_alloc_sem is a rather special rw_semaphore. It's the last one that
    may be released by a non-owner, and its write side is always
    mirrored by real exclusion. Its intended use is to wait for all
    pending direct I/O requests to finish before starting a truncate.

    Replace it with a hand-grown construct:

    - exclusion for truncates is already guaranteed by i_mutex, so it
    can simply fall away
    - the reader side is replaced by an i_dio_count member in struct
    inode that counts the number of pending direct I/O requests.
    Truncate can't proceed as long as it's non-zero
    - when i_dio_count reaches zero we wake up a pending truncate using
    wake_up_bit on a new bit in i_flags
    - new references to i_dio_count can't appear while we are waiting
    for it to drop to zero, because the direct I/O count always needs
    i_mutex (or an equivalent like XFS's i_iolock) for starting a new
    operation.

    This scheme is much simpler, and saves the space of a spinlock_t and a
    struct list_head in struct inode (typically 160 bits on a non-debug 64-bit
    system).
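
    A rough sketch of the construct (modeled on fs/inode.c of this era,
    where the wakeup bit ended up named __I_DIO_WAKEUP and living in
    i_state):

        void inode_dio_done(struct inode *inode)
        {
                if (atomic_dec_and_test(&inode->i_dio_count))
                        wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
        }

        void inode_dio_wait(struct inode *inode)
        {
                wait_queue_head_t *wq = bit_waitqueue(&inode->i_state,
                                                      __I_DIO_WAKEUP);
                DEFINE_WAIT_BIT(q, &inode->i_state, __I_DIO_WAKEUP);

                while (atomic_read(&inode->i_dio_count)) {
                        prepare_to_wait(wq, &q.wait, TASK_UNINTERRUPTIBLE);
                        if (atomic_read(&inode->i_dio_count))
                                schedule();
                }
                finish_wait(wq, &q.wait);
        }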

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Reject zero-sized reads as soon as we know our I/O length, and don't
    bother with locks or allocations that might otherwise have to be
    cleaned up.
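
    A minimal sketch of the early check (end being offset plus the
    total I/O length):

        if (rw == READ && end == offset)
                return 0;       /* nothing to do, nothing to clean up */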

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

10 Mar, 2011

2 commits

  • With the plugging now being explicitly controlled by the
    submitter, callers need not pass down unplugging hints
    to the block layer. If they want to unplug, it's because they
    manually plugged on their own - in which case, they should just
    unplug at will.
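
    The explicit on-stack plugging API this refers to (merged for
    2.6.39), as a usage sketch:

        struct blk_plug plug;

        blk_start_plug(&plug);
        /* ... submit a batch of bios ... */
        blk_finish_plug(&plug);         /* the submitter unplugs at will */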

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So let's kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
     

21 Jan, 2011

1 commit

  • When using devices that support max_segments > BIO_MAX_PAGES (256), direct
    IO tries to allocate a bio with more pages than allowed, which leads to an
    oops in dio_bio_alloc(). Clamp the request to the supported maximum, and
    change dio_bio_alloc() to reflect that bio_alloc() will always return a
    bio when called with __GFP_WAIT and a valid number of vectors.
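
    A rough sketch of the clamp (BIO_MAX_PAGES and the bio_alloc()
    guarantee are real; the surrounding code is abbreviated):

        nr_pages = min(dio->pages_in_io, BIO_MAX_PAGES);
        BUG_ON(nr_pages <= 0);
        /* with __GFP_WAIT (part of GFP_KERNEL) and a valid vector
         * count, bio_alloc() cannot return NULL */
        bio = bio_alloc(GFP_KERNEL, nr_pages);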

    [akpm@linux-foundation.org: remove redundant BUG_ON()]
    Signed-off-by: David Dillow
    Reviewed-by: Jeff Moyer
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Dillow
     

10 Sep, 2010

1 commit

  • commit c2c6ca4 (direct-io: do not merge logically non-contiguous requests)
    introduced a bug whereby all O_DIRECT I/Os were submitted a page at a time
    to the block layer. The problem is that the code expected
    dio->block_in_file to correspond to the current page in the dio. In fact,
    it corresponds to the previous page submitted via submit_page_section.
    This was purely an oversight, as the dio->cur_page_fs_offset field was
    introduced for just this purpose. This patch simply uses the correct
    variable when calculating whether there is a mismatch between contiguous
    logical blocks and contiguous physical blocks (as described in the
    comments).

    I also switched the if conditional following this check to an else if, to
    ensure that we never call dio_bio_submit twice for the same dio (in
    theory, this should not happen, anyway).
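
    A rough sketch of the corrected check in dio_send_cur_page() (field
    names follow fs/direct-io.c of this era):

        loff_t cur_offset = dio->cur_page_fs_offset;    /* not block_in_file */
        loff_t bio_next_offset = dio->logical_offset_in_bio +
                                 dio->bio->bi_size;

        if (dio->final_block_in_bio != dio->cur_page_block ||
            cur_offset != bio_next_offset)
                dio_bio_submit(dio);
        else if (dio->boundary)         /* else if: never submit twice */
                dio_bio_submit(dio);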

    I've tested this by running blktrace and verifying that a 64KB I/O was
    submitted as a single I/O. I also ran the patched kernel through
    xfstests' aio tests using xfs, ext4 (with 1k and 4k block sizes) and btrfs
    and verified that there were no regressions as compared to an unpatched
    kernel.

    Signed-off-by: Jeff Moyer
    Acked-by: Josef Bacik
    Cc: Christoph Hellwig
    Cc: Chris Mason
    Cc: [2.6.35.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Moyer
     

10 Aug, 2010

1 commit

  • Move the call to vmtruncate that gets rid of excessive blocks into
    the callers, in preparation for the new truncate calling sequence.
    This was only done for DIO_LOCKING filesystems, so the
    __blockdev_direct_IO_newtrunc variant was not needed anyway. Get rid
    of blockdev_direct_IO_no_locking and its _newtrunc variant while at
    it, as just open-coding the two additional parameters is shorter
    than the name suffix.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

27 Jul, 2010

1 commit

  • Filesystems with unwritten extent support must not complete an AIO
    request until the transaction to convert the extent has been
    committed. That means the aio_complete call needs to be moved into
    the ->end_io callback so that the filesystem can control exactly
    when to call it.

    This makes a bit of a mess of dio_complete and leaves the ->end_io
    callback prototype even more complicated.
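
    The resulting callback prototype (a rough reconstruction; is_async
    tells the handler whether it must eventually call aio_complete()
    itself):

        typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
                                    ssize_t size, void *private, int ret,
                                    bool is_async);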

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jan Kara
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

28 May, 2010

1 commit

  • Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
    setattr > vmtruncate > truncate, have filesystems call their truncate sequence
    from ->setattr if filesystem specific operations are required. vmtruncate is
    deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
    previously should be used.

    simple_setattr is introduced for simple in-ram filesystems to implement
    the new truncate sequence. Eventually all filesystems should be converted
    to implement a setattr, and the default code in notify_change should go
    away.

    simple_setsize is also introduced to perform just the ATTR_SIZE portion
    of simple_setattr (ie. changing i_size and trimming pagecache).

    To implement the new truncate sequence:
    - filesystem specific manipulations (eg freeing blocks) must be done in
    the setattr method rather than ->truncate.
    - vmtruncate cannot be used by core code to trim blocks past i_size
    in the event of write failure after allocation, so this must be
    performed in the fs code.
    - convert usage of helpers block_write_begin, nobh_write_begin,
    cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
    variants. These avoid calling vmtruncate to trim blocks (see previous).
    - inode_setattr should not be used. generic_setattr is a new function
    to be used to copy simple attributes into the generic inode.
    - make use of the better opportunity to handle errors with the new sequence.

    Big problem with the previous calling sequence: the filesystem is not called
    until i_size has already changed. This means it is not allowed to fail the
    call, and also it does not know what the previous i_size was. Also, generic
    code calling vmtruncate to truncate allocated blocks in case of error had
    no good way to return a meaningful error (or, for example, atomically handle
    block deallocation).
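
    A rough sketch of the new sequence for a simple filesystem ("myfs"
    is hypothetical; the helpers are the ones named above):

        static int myfs_setattr(struct dentry *dentry, struct iattr *attr)
        {
                struct inode *inode = dentry->d_inode;
                int error;

                error = inode_change_ok(inode, attr);
                if (error)
                        return error;

                if (attr->ia_valid & ATTR_SIZE) {
                        /* fs-specific work (e.g. freeing blocks) goes
                         * here, before i_size changes and while the
                         * operation can still fail cleanly */
                        error = simple_setsize(inode, attr->ia_size);
                        if (error)
                                return error;
                }
                generic_setattr(inode, attr);   /* copy the rest */
                mark_inode_dirty(inode);
                return 0;
        }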

    Cc: Christoph Hellwig
    Acked-by: Jan Kara
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
     

25 May, 2010

2 commits

  • Btrfs cannot handle having logically non-contiguous requests submitted. For
    example if you have

    Logical: [0-4095][HOLE][8192-12287]
    Physical: [0-4095] [4096-8191]

    Normally the DIO code would put these into the same BIO. The problem
    is that we need to know exactly which offset is associated with
    which BIO so we can do our checksumming and unlocking properly, so
    putting them in the same BIO doesn't work. So add another check
    where we submit the current BIO if the physical blocks are not
    contiguous OR the logical blocks are not contiguous.
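
    A rough sketch of the added condition:

        if (dio->final_block_in_bio != dio->cur_page_block ||  /* physical gap */
            cur_offset != bio_next_offset)                     /* logical gap */
                dio_bio_submit(dio);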

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • Because BTRFS can do RAID and such, we need our own submit hook so
    we can set up the bios in the correct fashion and handle checksum
    errors properly. So there are a few changes here:

    1) The submit_io hook. This is straightforward: just call this
    instead of submit_bio (see the sketch after this list).

    2) Allow the fs to return -ENOTBLK for reads. Usually this has only
    worked for writes, since writes can fall back onto buffered IO. But
    BTRFS needs the option of falling back on buffered IO if it
    encounters a compressed extent, since we need to read the entire
    extent in and decompress it. So if we get -ENOTBLK back from
    get_block we'll return and fall back to buffered IO just as in the
    write case.
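
    A rough sketch of the hook at submission time (matching the shape
    of dio_bio_submit() in this era):

        if (dio->submit_io)
                dio->submit_io(dio->rw, bio, dio->inode,
                               dio->logical_offset_in_bio);    /* btrfs */
        else
                submit_bio(dio->rw, bio);                      /* default */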

    I've tested these changes with fsx and everything seems to work. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

16 Dec, 2009

2 commits

  • Currently the locking in blockdev_direct_IO is a mess, we have three
    different locking types and very confusing checks for some of them. The
    most complicated one is DIO_OWN_LOCKING for reads, which happens to not
    actually be used.

    This patch gets rid of DIO_OWN_LOCKING - as mentioned above, the
    read case is unused anyway, and the write side is almost identical
    to DIO_NO_LOCKING. The difference is that DIO_NO_LOCKING always sets
    the create argument for the get_blocks callback to zero, but we can
    easily move that into the actual get_blocks callbacks. There are
    four users of the DIO_NO_LOCKING mode: gfs already ignores the
    create argument and thus is fine with the new version; ocfs2 only
    errors out if create is ever set, and we can remove this dead code
    now; the block device code only ever uses create for an error
    message if we are fully beyond the device, which can never happen;
    and last but not least XFS will need the new behaviour for writes.

    Now we can replace the lock_type variable with a flags one, where no
    flag means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as
    the first flag. The check that disallows filling holes is split out
    into a separate flag, although for now both flags always get set at
    the same time.
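
    A rough sketch of the resulting flags (no flag set means the old
    DIO_NO_LOCKING behaviour):

        enum {
                /* need locking between buffered and direct access */
                DIO_LOCKING     = 0x01,

                /* filesystem does not support filling holes */
                DIO_SKIP_HOLES  = 0x02,
        };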

    Also revamp the documentation of the locking scheme to actually make
    sense.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Badari Pulavarty
    Cc: Jeff Moyer
    Cc: Jens Axboe
    Cc: Zach Brown
    Cc: Al Viro
    Cc: Alex Elder
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Intel reported a performance regression caused by the following commit:

    commit 848c4dd5153c7a0de55470ce99a8e13a63b4703f
    Author: Zach Brown
    Date: Mon Aug 20 17:12:01 2007 -0700

    dio: zero struct dio with kzalloc instead of manually

    This patch uses kzalloc to zero all of struct dio rather than
    manually trying to track which fields we rely on being zero. It
    passed aio+dio stress testing and some bug regression testing on
    ext3.

    This patch was introduced by Linus in the conversation that led up
    to Badari's minimal fix to manually zero .map_bh.b_state in commit:

    6a648fa72161d1f6468dabd96c5d3c0db04f598a

    It makes the code a bit smaller. Maybe a couple fewer cachelines to
    load, if we're lucky:

        text    data     bss     dec    hex  filename
     3285925  568506 1304616 5159047 4eb887  vmlinux
     3285797  568506 1304616 5158919 4eb807  vmlinux.patched

    I was unable to measure a stable difference in the number of cpu
    cycles spent in blockdev_direct_IO() when pushing aio+dio 256K reads
    at ~340MB/s.

    So the resulting intent of the patch isn't a performance gain but to
    avoid exposing ourselves to the risk of finding another field like
    .map_bh.b_state where we rely on zeroing but don't enforce it in the
    code.

    Zach surmised that zeroing out the page array was what caused most of
    the problem, and suggested the approach taken in the attached patch for
    resolving the issue. Intel re-tested with this patch and saw a 0.6%
    performance gain (the original regression was 0.5%).
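
    A rough sketch of the approach: with the pages array moved to the
    end of struct dio, only the fields in front of it are zeroed.

        dio = kmalloc(sizeof(*dio), GFP_KERNEL);
        if (!dio)
                return -ENOMEM;
        /* pages[] is left uninitialized; everything before it
         * must start out zero */
        memset(dio, 0, offsetof(struct dio, pages));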

    [akpm@linux-foundation.org: add comment]
    Signed-off-by: Jeff Moyer
    Acked-by: Zach Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Moyer
     

26 Nov, 2009

1 commit

  • There seems to be a regression in the direct write path due to the
    following commit in the for-2.6.33 branch of the block tree.

    commit 1af60fbd759d31f565552fea315c2033947cfbe6
    Author: Jeff Moyer
    Date: Fri Oct 2 18:56:53 2009 -0400

    block: get rid of the WRITE_ODIRECT flag

    Marking direct writes as WRITE_SYNC_PLUG instead of WRITE_ODIRECT
    sets the NOIDLE flag in the bio and hence in the request. This tells
    CFQ not to expect more requests from the queue and not to idle on it
    (despite the fact that the queue's think time is low and it is not
    seeky).

    So direct writers lose big time when competing with sequential readers.

    Using fio, I have run one direct writer and two sequential readers and
    following are the results with 2.6.32-rc7 kernel and with for-2.6.33
    branch.

    Test
    ====
    1 direct writer and 2 sequential reader running simultaneously.

    [global]
    directory=/mnt/sdc/fio/
    runtime=10

    [seqwrite]
    rw=write
    size=4G
    direct=1

    [seqread]
    rw=read
    size=2G
    numjobs=2

    2.6.32-rc7
    ==========
    direct writes: aggrb=2,968KB/s
    readers : aggrb=101MB/s

    for-2.6.33 branch
    =================
    direct write: aggrb=19KB/s
    readers aggrb=137MB/s

    This patch brings back the WRITE_ODIRECT flag, with the difference
    that we don't set the BIO_RW_UNPLUG flag, so the device is not
    unplugged after submission of a request and an explicit unplug from
    the submitter is required.

    That way we fix Jeff's issue of not enough merging taking place in
    the aio path, as well as make sure direct writes get their fair
    share.
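
    A rough sketch of the reintroduced flag (BIO_RW_* encoding of this
    era): sync so CFQ may idle, but no BIO_RW_UNPLUG, so the submitter
    unplugs explicitly.

        #define WRITE_ODIRECT_PLUG      (WRITE | (1 << BIO_RW_SYNCIO))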

    After the fix
    =============
    for-2.6.33 + fix
    ----------------
    direct writes: aggrb=2,728KB/s
    reads: aggrb=103MB/s

    Thanks
    Vivek

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     

28 Oct, 2009

2 commits

  • Hi,

    Some workloads issue batches of small I/O, and the performance is poor
    due to the call to blk_run_address_space for every single iocb. Nathan
    Roberts pointed this out, and suggested that by deferring this call
    until all I/Os in the iocb array are submitted to the block layer, we
    can realize some impressive performance gains (up to 30% for sequential
    4k reads in batches of 16).
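
    A conceptual sketch only (the per-iocb helper is made up): the
    unplug moves out of the per-iocb loop, so a batch of 16 iocbs
    triggers one queue run instead of 16.

        for (i = 0; i < nr; i++)
                submit_one_iocb(ctx, iocbs[i]); /* hypothetical helper */
        blk_run_address_space(mapping);         /* single unplug per batch */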

    Signed-off-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jeff Moyer
     
  • Hi,

    The WRITE_ODIRECT flag is only used in one place, and that code path
    happens to also call blk_run_address_space. The introduction of this
    flag, then, could result in the device being unplugged twice for every
    I/O.

    Further, with the batching changes in the next patch, we don't want an
    O_DIRECT write to imply a queue unplug.

    Signed-off-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jeff Moyer
     

23 May, 2009

1 commit

  • Until now we have had a 1:1 mapping between the storage device's
    physical block size and the logical block size used when addressing
    the device. With SATA 4KB drives coming out, that will no longer be
    the case. The sector size will be 4KB but the logical block size
    will remain 512 bytes. Hence we need to distinguish between the
    physical block size and the logical one.

    This patch renames hardsect_size to logical_block_size.
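
    The renamed accessors (queue_ and bdev_ variants both exist):

        unsigned lbs = bdev_logical_block_size(bdev); /* was bdev_hardsect_size() */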

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

06 Apr, 2009

1 commit

  • By default, CFQ will anticipate more IO from a given io context if the
    previously completed IO was sync. This used to be fine, since the only
    sync IO was reads and O_DIRECT writes. But with more "normal" sync writes
    being used now, we don't want to anticipate for those.

    Add a bio/request flag that informs the IO scheduler that this is a sync
    request that we should not idle for. Introduce WRITE_ODIRECT specifically
    for O_DIRECT writes, and make sure that the other sync writes set this
    flag.
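
    A rough sketch of the resulting write types (BIO_RW_* encoding of
    this era): O_DIRECT writes stay idle-friendly, while other sync
    writes carry BIO_RW_NOIDLE.

        #define WRITE_SYNC_PLUG (WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
        #define WRITE_SYNC      (WRITE_SYNC_PLUG | (1 << BIO_RW_UNPLUG))
        #define WRITE_ODIRECT   (WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG))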

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
     

07 Jan, 2009

1 commit

  • In case of error, an extending write may have instantiated a few
    blocks outside i_size. We need to trim these blocks, *regardless* of
    blocksize. At least ext2, ext3 and reiserfs interpret an
    (i_size < biggest block) condition as an error: fsck will complain
    about a wrong i_size and will then "fix" it by growing i_size to
    match the biggest block. This is bad, because those blocks contain
    garbage from the previous write attempt, and the result is data
    corruption.
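
    A rough sketch of the cleanup in __blockdev_direct_IO():

        if (unlikely(retval < 0 && (rw & WRITE))) {
                loff_t isize = i_size_read(inode);

                if (end > isize)        /* extending write failed mid-way */
                        vmtruncate(inode, isize);
        }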

    ####TESTCASE_BEGIN
    $touch /mnt/test/BIG_FILE
    ## at this moment /mnt/test/BIG_FILE size and blocks equal to zero
    open("/mnt/test/BIG_FILE", O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3
    write(3, "aaaaaaaaaaaa"..., 104857600) = -1 ENOSPC (No space left on device)
    ## size and blocks shouldn't have changed because the write op failed.
    $stat /mnt/test/BIG_FILE
    File: `/mnt/test/BIG_FILE'
    Size: 0 Blocks: 110896 IO Block: 1024 regular empty file
    <<<<<<<? yes
    Pass 2: Checking directory structure
    ....
    #####TESTCASE_END

    [akpm@linux-foundation.org: use i_size_read()]
    Signed-off-by: Dmitri Monakhov
    Cc: Zach Brown
    Cc: Nick Piggin
    Cc: Badari Pulavarty
    Cc: Chris Mason
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitri Monakhov
     

27 Jul, 2008

1 commit

  • Use get_user_pages_fast in the common/generic block and fs direct IO paths.
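
    A rough sketch of the call in dio_refill_pages() after the
    conversion (this is the real get_user_pages_fast() signature):

        ret = get_user_pages_fast(
                dio->curr_user_address,         /* start address */
                nr_pages,                       /* how many pages */
                dio->rw == READ,                /* write into the pages? */
                &dio->pages[0]);                /* results go here */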

    Signed-off-by: Nick Piggin
    Cc: Dave Kleikamp
    Cc: Andy Whitcroft
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Andi Kleen
    Cc: Dave Kleikamp
    Cc: Badari Pulavarty
    Cc: Zach Brown
    Cc: Jens Axboe
    Reviewed-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

06 Feb, 2008

1 commit

  • Simplify page cache zeroing of segments of pages through three
    functions:

    zero_user_segments(page, start1, end1, start2, end2)

    Zeros two segments of the page. It takes the positions where the
    zeroing starts and ends, which avoids length calculations and
    makes the code clearer.

    zero_user_segment(page, start, end)

    Same for a single segment.

    zero_user(page, start, length)

    Length variant for the case where we know the length.
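
    The single-segment variants are thin wrappers around
    zero_user_segments() (a rough sketch):

        static inline void zero_user_segment(struct page *page,
                                             unsigned start, unsigned end)
        {
                zero_user_segments(page, start, end, 0, 0);
        }

        static inline void zero_user(struct page *page,
                                     unsigned start, unsigned size)
        {
                zero_user_segments(page, start, start + size, 0, 0);
        }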

    We remove the zero_user_page macro. Issues:

    1. It's a macro. Inline functions are preferable.

    2. The KM_USER0 macro is only defined for HIGHMEM.

    Having to treat this special case everywhere makes the code
    needlessly complex. The parameter for zeroing is always KM_USER0
    except in one single case that we open-code.

    Avoiding KM_USER0 means a lot of code no longer has to deal with
    the special casing for HIGHMEM. Dealing with kmap is only necessary
    for HIGHMEM configurations. In those configurations we use KM_USER0
    like we do for a series of other functions defined in highmem.h.

    Since KM_USER0 depends on HIGHMEM, the existing zero_user_page
    function could not be an inline function. The zero_user_* functions
    introduced here can be inline because that constant is not used
    when these functions are called.

    Also extract the flushing of the caches to be outside of the kmap.

    [akpm@linux-foundation.org: fix nfs and ntfs build]
    [akpm@linux-foundation.org: fix ntfs build some more]
    Signed-off-by: Christoph Lameter
    Cc: Steven French
    Cc: Michael Halcrow
    Cc:
    Cc: Steven Whitehouse
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: David Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter