30 Aug, 2012

1 commit

  • While xfs_buftarg_shrink() is freeing buffers from the dispose list (filled with
    buffers from lru list), there is a possibility to have xfs_buf_stale() racing
    with it, and removing buffers from dispose list before xfs_buftarg_shrink() does
    it.

    This happens because xfs_buftarg_shrink() handles the dispose list without
    locking and the test condition in xfs_buf_stale() checks for the buffer being in
    *any* list:

    if (!list_empty(&bp->b_lru))

    If the buffer happens to be on the dispose list, the buffer counter of the
    lru list (btp->bt_lru_nr) is decremented twice (once in xfs_buftarg_shrink()
    and once in xfs_buf_stale()), corrupting the accounting of the lru list.

    This may cause xfs_buftarg_shrink() to return a wrong value to the memory
    shrinker shrink_slab(). The accounting error may also cause an underflowed
    value to be returned: since the counter is lower than the actual number of
    items in the lru list, a decrement may happen when the counter is already 0,
    causing the counter to underflow.

    The fix uses a new flag field (and a new buffer flag) to serialize buffer
    handling during the shrink process. The new flag field has been designed to use
    btp->bt_lru_lock/unlock instead of xfs_buf_lock/unlock mechanism.

    dchinner, sandeen, aquini and aris also deserve credit for this.

    Signed-off-by: Carlos Maiolino
    Reviewed-by: Ben Myers
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Carlos Maiolino
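
To make the double decrement described above concrete, here is a minimal user-space sketch of the idea behind the fix: a per-buffer flag marks a buffer as claimed by the shrinker, so the stale path can tell "on the lru list" apart from "on the dispose list". All names (`model_buf`, `BUF_ON_LRU`, `BUF_DISPOSING`, and so on) are hypothetical and chosen for illustration; in the kernel both paths would run under btp->bt_lru_lock, which this single-threaded model only notes in comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags, not the kernel's actual names. */
#define BUF_ON_LRU    (1u << 0)  /* buffer is counted in lru_nr */
#define BUF_DISPOSING (1u << 1)  /* shrinker has moved it to the dispose list */

struct model_buf {
    unsigned int lru_flags;      /* would be protected by the LRU lock */
};

struct model_target {
    unsigned int lru_nr;         /* number of buffers accounted on the LRU */
};

/* Shrinker side: claim one buffer for disposal and account for it once. */
static void model_shrink_one(struct model_target *t, struct model_buf *b)
{
    /* The LRU lock would be held here. */
    if (!(b->lru_flags & BUF_ON_LRU))
        return;
    b->lru_flags &= ~BUF_ON_LRU;
    b->lru_flags |= BUF_DISPOSING;
    t->lru_nr--;
}

/*
 * Stale side: only touch the counter if the buffer is still on the LRU
 * and the shrinker has not claimed it. Returns true if it decremented.
 */
static bool model_stale(struct model_target *t, struct model_buf *b)
{
    /* The LRU lock would be held here as well. */
    if ((b->lru_flags & BUF_DISPOSING) || !(b->lru_flags & BUF_ON_LRU))
        return false;
    b->lru_flags &= ~BUF_ON_LRU;
    assert(t->lru_nr > 0);       /* a decrement at 0 would underflow */
    t->lru_nr--;
    return true;
}
```

With a plain "is it on any list?" test, the second call would decrement the counter again; with the flag, a buffer that the shrinker already claimed is left alone.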
     

14 Jul, 2012

1 commit

  • xfs_bdstrat_cb only adds a check for a shutdown filesystem over
    xfs_buf_iorequest, but xfs_buf_iodone_callbacks just checked for a shut down
    filesystem a little earlier. In addition the shutdown handling in
    xfs_bdstrat_cb is not very suitable for this caller.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

02 Jul, 2012

3 commits

  • With the internal interfaces supporting discontiguous buffer maps,
    add external lookup, read and get interfaces so they can start to be
    used.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • While the external interface currently uses separate blockno/length
    variables, we need to move internal interfaces to passing and
    parsing vector maps. This will then allow us to add external
    interfaces to support discontiguous buffer maps as the internal code
    will already support them.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • To support discontiguous buffers in the buffer cache, we need to
    separate the cache index variables from the I/O map. While this is
    currently a 1:1 mapping, discontiguous buffer support will break
    this relationship.

    However, for caching purposes, we can still treat them the same as a
    contiguous buffer - the block number of the first block and the
    length of the buffer - as that is still a unique representation.
    Also, the only way we will ever access the discontiguous regions of
    buffers is via building the complete buffer in the first place, so
    using the initial block number and entire buffer length is a sane
    way to index the buffers.

    Add a block mapping vector construct to the xfs_buf and use it in
    the places where we are doing IO instead of the current
    b_bn/b_length variables.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
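
The split between cache index and I/O map can be sketched as follows. This is a simplified user-space model, not the kernel structure: only b_bn and b_length come from the commit message, while the map type and the `b_maps`/`b_map_count` fields are assumed names used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One contiguous extent of the buffer on disk (assumed layout). */
struct model_buf_map {
    int64_t bm_bn;    /* first block of this extent */
    int     bm_len;   /* extent length, in blocks */
};

struct model_buf {
    /* Cache index: first block number and total length. */
    int64_t b_bn;
    int     b_length;
    /* I/O map: where the data actually lives on disk. */
    const struct model_buf_map *b_maps;
    int     b_map_count;
};

/*
 * Index the buffer by the first extent's block number and the sum of
 * all extent lengths, while keeping the full vector for doing IO.
 */
static void model_buf_init(struct model_buf *bp,
                           const struct model_buf_map *maps, int nmaps)
{
    int i;

    assert(nmaps > 0);
    bp->b_maps = maps;
    bp->b_map_count = nmaps;
    bp->b_bn = maps[0].bm_bn;
    bp->b_length = 0;
    for (i = 0; i < nmaps; i++)
        bp->b_length += maps[i].bm_len;
}
```

For a contiguous buffer the vector has one entry and the index equals the map, which is the 1:1 case the message mentions.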
     

15 May, 2012

10 commits

  • Rather than specifying XBF_MAPPED for almost all buffers, introduce
    XBF_UNMAPPED for the couple of users that use unmapped buffers.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Just about all callers of xfs_buf_read() and xfs_buf_get() use XBF_DONTBLOCK.
    This is used to make memory allocation use GFP_NOFS rather than GFP_KERNEL to
    avoid recursion through memory reclaim back into the filesystem.

    All the blocking get calls in growfs occur inside a transaction, even though
    they are not part of the transaction, so all allocation will be GFP_NOFS due to
    the task flag PF_TRANS being set. The blocking read calls occur during log
    recovery, so they will probably be unaffected by converting to GFP_NOFS
    allocations.

    Hence make XBF_DONTBLOCK behaviour always occur for buffers and kill the flag.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Buffers are always returned locked from the lookup routines. Hence
    we don't need to tell the lookup routines to return locked buffers,
    or to try and lock them. Remove XBF_LOCK from all the callers and
    from internal buffer cache usage.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • xfs_buf_btoc and friends are simple macros that do basic block
    to page index conversion and vice versa. These aren't widely used,
    and we use open coded masking and shifting everywhere else. Hence
    remove the macros and open code the work they do.

    Also, use of PAGE_CACHE_{SIZE|SHIFT|MASK} for these macros is now
    incorrect - we are using pages directly and not the page cache, so
    use PAGE_{SIZE|MASK|SHIFT} instead.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
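
As an illustration of what "open coded masking and shifting" looks like, here is a small sketch of converting a block range into a page count and an in-page offset. The `MODEL_` constants are stand-ins defined locally (512-byte basic blocks and 4 KiB pages are assumptions); kernel code would use the real PAGE_SIZE, PAGE_SHIFT and PAGE_MASK.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BBSHIFT    9                         /* assumed 512-byte blocks */
#define MODEL_PAGE_SHIFT 12                        /* assumed 4 KiB pages */
#define MODEL_PAGE_SIZE  ((size_t)1 << MODEL_PAGE_SHIFT)
#define MODEL_PAGE_MASK  (~(MODEL_PAGE_SIZE - 1))  /* same sense as PAGE_MASK */

/* Number of pages touched by len_bb blocks starting at block start_bb. */
static size_t model_page_count(unsigned long long start_bb, size_t len_bb)
{
    unsigned long long start = start_bb << MODEL_BBSHIFT;
    unsigned long long end = start +
        ((unsigned long long)len_bb << MODEL_BBSHIFT);
    unsigned long long first = start >> MODEL_PAGE_SHIFT;
    unsigned long long last =
        (end + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;

    return (size_t)(last - first);
}

/* Byte offset of block start_bb within its page. */
static size_t model_page_offset(unsigned long long start_bb)
{
    return (size_t)((start_bb << MODEL_BBSHIFT) & ~MODEL_PAGE_MASK);
}
```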
     
  • Now that we pass block counts everywhere, and index buffers by block
    number and length in units of blocks, convert the desired IO size
    into block counts rather than bytes. Convert the code to use block
    counts, and those that need byte counts get converted at the time of
    use.

    Rename the b_desired_count variable to something closer to its
    purpose - b_io_length - as it is only used to specify the length of
    an IO for a subset of the buffer. The only time this is used is for
    log IO - both writing iclogs and during log recovery. In all other
    cases, the b_io_length matches b_length, and hence a lot of code
    confuses the two. e.g. the buf item code uses the io count
    exclusively when it should be using the buffer length. Fix these
    appropriately as they are found.

    Also, remove the XFS_BUF_{SET_}COUNT() macros that are just wrappers
    around the desired IO length. They only serve to make the code
    shouty loud, don't actually add any real value, and are often used
    incorrectly.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
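
The distinction between the two lengths can be sketched like this. It is a simplified model under stated assumptions: 512-byte basic blocks, and helper names (`model_bbtob`, `model_set_io_length`) that are invented here; only b_length and b_io_length are taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BBSHIFT 9          /* assumed 512-byte basic blocks */

struct model_buf {
    int b_length;                /* size of the buffer, in blocks */
    int b_io_length;             /* size of the next IO, in blocks */
};

/* Convert a block count to bytes only at the point of use. */
static size_t model_bbtob(int bb)
{
    return (size_t)bb << MODEL_BBSHIFT;
}

/* An IO may cover a subset of the buffer, never more than all of it. */
static void model_set_io_length(struct model_buf *bp, int io_bb)
{
    assert(io_bb > 0 && io_bb <= bp->b_length);
    bp->b_io_length = io_bb;
}

static size_t model_io_bytes(const struct model_buf *bp)
{
    return model_bbtob(bp->b_io_length);
}

static size_t model_buf_bytes(const struct model_buf *bp)
{
    return model_bbtob(bp->b_length);
}
```

Code that needs the size of the buffer itself should read b_length; only the IO submission path should care about b_io_length.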
     
  • Now that we pass block counts everywhere, and index buffers by block
    number, track the length of the buffer in units of blocks rather
    than bytes. Convert the code to use block counts, and those that
    need byte counts get converted at the time of use.

    Also, remove the XFS_BUF_{SET_}SIZE() macros that are just wrappers
    around the buffer length. They only serve to make the code shouty
    loud and don't actually add any real value.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Seeing as we pass block numbers around everywhere in the buffer
    cache now, it makes no sense to index everything by byte offset.
    Replace all the byte offset indexing with block number based
    indexing, and replace all uses of the byte offset with direct
    conversion from the block index.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • The xfs_buf_get/read API is not consistent in the units it uses, and
    does not use appropriate or consistent units/types for the
    variables.

    Convert the API to use disk addresses and block counts for all
    buffer get and read calls. Use consistent naming for all the
    functions and their declarations, and convert the internal functions
    to use disk addresses and block counts to avoid need to convert them
    from one type to another and back again.

    Fix all the callers to use disk addresses and block counts. In many
    cases, this removes an additional conversion from the function call
    as the callers already have a block count.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • If we call xfs_buf_iowait() on a buffer that failed dispatch due to
    an IO error, it will wait forever for an IO that does not exist.
    This is handled in xfs_buf_read, but there is other code that calls
    xfs_buf_iowait directly that doesn't.

    Rather than make the call sites have to handle checking for dispatch
    errors and then checking for completion errors, make
    xfs_buf_iowait() check for dispatch errors on the buffer before
    waiting. This means we handle both dispatch and completion errors
    with one set of error handling at the caller sites.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
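
The control flow described above can be modelled in a few lines. This is a hedged user-space sketch: the struct, the simulated completion, and the `waits` counter are all hypothetical and exist only to show that the wait is skipped when an error was already recorded at dispatch.

```c
#include <assert.h>
#include <errno.h>

struct model_buf {
    int b_error;           /* error recorded at dispatch or completion */
    int completion_error;  /* what the simulated IO reports when it ends */
    int waits;             /* how many times we actually blocked */
};

/* Stand-in for blocking on IO completion. */
static void model_wait_for_completion(struct model_buf *bp)
{
    bp->waits++;
    bp->b_error = bp->completion_error;
}

/*
 * Wait for IO only if dispatch succeeded, then report whichever error
 * (dispatch or completion) ended up on the buffer.
 */
static int model_buf_iowait(struct model_buf *bp)
{
    if (!bp->b_error)
        model_wait_for_completion(bp);
    return bp->b_error;
}
```

A caller then needs a single check of the return value, regardless of where the failure happened.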
     
  • Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
    and write back the buffers per-process instead of by waking up xfsbufd.

    This is now easily doable given that we have very few places left that write
    delwri buffers:

    - log recovery:
    Only done at mount time, and already forcing out the buffers
    synchronously using xfs_flush_buftarg

    - quotacheck:
    Same story.

    - dquot reclaim:
    Writes out dirty dquots on the LRU under memory pressure. We might
    want to look into doing more of this via xfsaild, but it's already
    more optimal than the synchronous inode reclaim that writes each
    buffer synchronously.

    - xfsaild:
    This is the main beneficiary of the change. By keeping a local list
    of buffers to write we reduce latency of writing out buffers, and
    more importantly we can remove all the delwri list promotions which
    were hitting the buffer cache hard under sustained metadata loads.

    The implementation is very straightforward - xfs_buf_delwri_queue now gets
    a new list_head pointer that it adds the delwri buffers to, and all callers
    need to eventually submit the list using xfs_buf_delwri_submit or
    xfs_buf_delwri_submit_nowait. Buffers that already are on a delwri list are
    skipped in xfs_buf_delwri_queue, assuming they already are on another delwri
    list. The biggest change to pass down the buffer list was done to the AIL
    pushing. Now that we operate on buffers the trylock, push and pushbuf log
    item methods are merged into a single push routine, which tries to lock the
    item, and if possible add the buffer that needs writeback to the buffer list.
    This leads to much simpler code than the previous split but requires the
    individual IOP_PUSH instances to unlock and reacquire the AIL around calls
    to blocking routines.

    Given that xfsailds now also handle writing out buffers, the conditions for
    log forcing and the sleep times needed some small changes. The most
    important one is that we consider an AIL busy as long as we still have buffers
    to push, and the other one is that we do increment the pushed LSN for
    buffers that are under flushing at this moment, but still count them towards
    the stuck items for restart purposes. Without this we could hammer on stuck
    items without ever forcing the log and not make progress under heavy random
    delete workloads on fast flash storage devices.

    [ Dave Chinner:
    - rebase on previous patches.
    - improved comments for XBF_DELWRI_Q handling
    - fix XBF_ASYNC handling in queue submission (test 106 failure)
    - rename delwri submit function buffer list parameters for clarity
    - xfs_efd_item_push() should return XFS_ITEM_PINNED ]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
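
The queue-then-submit pattern described above can be sketched with a caller-owned list. This is a simplified user-space model and not the kernel implementation: it uses a hand-rolled singly linked list instead of list_head, a hypothetical `MODEL_DELWRI_Q` flag, and a `written` counter in place of real IO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_DELWRI_Q (1u << 0)   /* buffer is on some delwri list */

struct model_buf {
    unsigned int flags;
    struct model_buf *next;
    int written;                   /* counts simulated writes */
};

/* Caller-owned list, typically declared on the caller's stack. */
struct model_list {
    struct model_buf *head;
    struct model_buf **tail;
};

static void model_list_init(struct model_list *list)
{
    list->head = NULL;
    list->tail = &list->head;
}

/* Queue a buffer unless it is already on a delwri list. */
static bool model_delwri_queue(struct model_buf *bp, struct model_list *list)
{
    if (bp->flags & MODEL_DELWRI_Q)
        return false;
    bp->flags |= MODEL_DELWRI_Q;
    bp->next = NULL;
    *list->tail = bp;
    list->tail = &bp->next;
    return true;
}

/* Write out everything on the list; returns the number of buffers. */
static int model_delwri_submit(struct model_list *list)
{
    struct model_buf *bp = list->head;
    int n = 0;

    while (bp) {
        struct model_buf *next = bp->next;

        bp->flags &= ~MODEL_DELWRI_Q;
        bp->next = NULL;
        bp->written++;
        n++;
        bp = next;
    }
    model_list_init(list);
    return n;
}
```

Because the list belongs to the caller, there is no shared per-target queue to lock or promote buffers on, and the writeback happens in the context of whoever queued the buffers.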
     

29 Mar, 2012

1 commit


17 Dec, 2011

1 commit


12 Oct, 2011

12 commits

  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • The calling convention that returns a pointer to a static buffer is
    fairly nasty, so just opencode it in the only caller that is left.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Instead of passing the block number and mount structure explicitly
    get them off the bp and make the argument order more natural.

    Also move it to xfs_buf.c and stop printing the device name given
    that we already get the fs name as part of xfs_alert, and we know
    what device it operates on because of the caller that gets printed,
    finally rename it to xfs_buf_ioerror_alert and pass __func__ as
    argument where it makes sense.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
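
A rough sketch of the resulting calling convention: the helper takes the buffer and the caller's name, and reads the block number and error from the buffer itself. The struct, the function name `model_buf_ioerror_alert`, and the message format are assumptions made for this example; it formats into a caller-supplied string instead of logging.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct model_buf {
    unsigned long long b_bn;   /* block number the buffer maps */
    int b_error;               /* error recorded on the buffer */
};

/*
 * Format an IO error report from the buffer's own fields plus the
 * name of the calling function. Returns what snprintf() returns.
 */
static int model_buf_ioerror_alert(char *out, size_t outlen,
                                   const struct model_buf *bp,
                                   const char *func)
{
    assert(out && bp && func);
    return snprintf(out, outlen,
                    "I/O error: block 0x%llx (\"%s\") error %d",
                    bp->b_bn, func, bp->b_error);
}
```

A call site would typically pass `__func__` as the last argument, so the report names the caller without the caller having to pass a block number or mount structure.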
     
  • Change _xfs_buf_initialize to allocate the buffer directly and rename it to
    xfs_buf_alloc now that it is the only buffer allocation routine. Also remove
    the xfs_buf_deallocate wrapper around the kmem_zone_free calls for buffers.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • The code is unused and under a config option that doesn't exist, remove it.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • The code to flush buffers in the umount code is a bit iffy: we first
    flush all delwri buffers out, but then might be able to queue up a
    new one when logging the sb counts. On a normal shutdown that one
    would get flushed out when doing the synchronous superblock write in
    xfs_unmountfs_writesb, but we skip that one if the filesystem has
    been shut down.

    Fix this by moving the delwri list flushing until just before unmounting
    the log, and while we're at it also remove the superfluous delwri list
    and buffer lru flushing for the rt and log device that can never have
    cached or delwri buffers.

    Signed-off-by: Christoph Hellwig
    Reported-by: Amit Sahrawat
    Tested-by: Amit Sahrawat
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • And also remove the strange local lock and delwri list pointers in a few
    functions.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Remove the xfs_buf_relse from xfs_bwrite and let the caller handle it to
    mirror the delwri and read paths.

    Also remove the mount pointer passed to xfs_bwrite, which is superfluous now
    that we have a mount pointer in the buftarg.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • Unify the ways we add buffers to the delwri queue by always calling
    xfs_buf_delwri_queue directly. The xfs_bdwrite function is removed and
    opencoded in its callers, and the two places setting XBF_DELWRI while a
    buffer is locked and expecting xfs_buf_unlock to pick it up are converted
    to call xfs_buf_delwri_queue directly, too. Also replace the
    XFS_BUF_UNDELAYWRITE macro with direct calls to xfs_buf_delwri_dequeue
    to make the explicit queuing/dequeuing more obvious.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

13 Aug, 2011

1 commit

  • Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the
    annoying subdirectories in the XFS source code. Besides the large
    number of file renames, the only changes are to the Makefile, a few
    files including headers with the subdirectory prefix, and the binary
    sysctl compat code that includes a header under fs/xfs/ from
    kernel/.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig