11 Sep, 2013

6 commits

  • This patch adds the missing call to list_lru_destroy (spotted by Li Zhong)
    and moves the deletion to after the shrinker is unregistered, as correctly
    spotted by Dave

    Signed-off-by: Glauber Costa
    Cc: Michal Hocko
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Glauber Costa
     
  • We currently use a compile-time constant to size the node array for the
    list_lru structure. Due to this, we don't need to allocate any memory at
    initialization time. But as a consequence, the structures that contain
    embedded list_lru lists can become way too big (the superblock for
    instance contains two of them).

    This patch aims at ameliorating this situation by dynamically allocating
    the node arrays with the firmware provided nr_node_ids.

    Signed-off-by: Glauber Costa
    Cc: Dave Chinner
    Cc: Mel Gorman
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Glauber Costa
     
  • The new LRU list isolation code in xfs_qm_dquot_isolate() isn't
    completely up to date. Firstly, it needs conversion to return enum
    lru_status values, not raw numbers. Secondly - most importantly - it
    fails to unlock the dquot and relock the LRU in the LRU_RETRY path.
    This leads to deadlocks in xfstests generic/232. Fix them.

    Signed-off-by: Dave Chinner
    Cc: Glauber Costa
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dave Chinner
     
  • fix warnings

    Cc: Dave Chinner
    Cc: Glauber Costa
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Andrew Morton
     
  • Convert the XFS dquot lru to use the list_lru construct and convert the
    shrinker to being node aware.

    [glommer@openvz.org: edited for conflicts + warning fixes]
    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton

    Signed-off-by: Al Viro

    Dave Chinner
     
  • The sysctl knob sysctl_vfs_cache_pressure is used to determine which
    percentage of the shrinkable objects in our cache we should actively try
    to shrink.

    It works great in situations in which we have many objects (at least more
    than 100), because the aproximation errors will be negligible. But if
    this is not the case, specially when total_objects < 100, we may end up
    concluding that we have no objects at all (total / 100 = 0, if total <
    100).

    This is certainly not the biggest killer in the world, but may matter in
    very low kernel memory situations.

    Signed-off-by: Glauber Costa
    Reviewed-by: Carlos Maiolino
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Mel Gorman
    Cc: Dave Chinner
    Cc: Al Viro
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Glauber Costa
     

16 Aug, 2013

1 commit


13 Aug, 2013

3 commits

  • With the new xfs_trans_res structure has been introduced, the log
    reservation size, log count as well as log flags are pre-initialized
    at mount time. So it's time to refine xfs_trans_reserve() interface
    to be more neat.

    Also, introduce a new helper M_RES() to return a pointer to the
    mp->m_resv structure to simplify the input.

    Signed-off-by: Jie Liu
    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Jie Liu
     
  • There are a few small helper functions in xfs_util, all related to
    xfs_inode modifications. Move them all to xfs_inode.c so all
    xfs_inode operations are consiolidated in the one place.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • The on disk format definitions of the on-disk dquot, log formats and
    quota off log formats are all intertwined with other definitions for
    quotas. Separate them out into their own header file so they can
    easily be shared with userspace.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     

23 Jul, 2013

1 commit


11 Jul, 2013

1 commit

  • Add project quota changes to all the places where group quota field
    is used:
    * add separate project quota members into various structures
    * split project quota and group quotas so that instead of overriding
    the group quota members incore, the new project quota members are
    used instead
    * get rid of usage of the OQUOTA flag incore, in favor of separate
    group and project quota flags.
    * add a project dquot argument to various functions.

    Not using the pquotino field from superblock yet.

    Signed-off-by: Chandra Seetharaman
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Chandra Seetharaman
     

29 Jun, 2013

4 commits


05 Jun, 2013

1 commit

  • Calculating dquot CRCs when the backing buffer is written back just
    doesn't work reliably. There are several places which manipulate
    dquots directly in the buffers, and they don't calculate CRCs
    appropriately, nor do they always set the buffer up to calculate
    CRCs appropriately.

    Firstly, if we log a dquot buffer (e.g. during allocation) it gets
    logged without valid CRC, and so on recovery we end up with a dquot
    that is not valid.

    Secondly, if we recover/repair a dquot, we don't have a verifier
    attached to the buffer and hence CRCs are not calculated on the way
    down to disk.

    Thirdly, calculating the CRC after we've changed the contents means
    that if we re-read the dquot from the buffer, we cannot verify the
    contents of the dquot are valid, as the CRC is invalid.

    So, to avoid all the dquot CRC errors that are being detected by the
    read verifier, change to using the same model as for inodes. That
    is, dquot CRCs are calculated and written to the backing buffer at
    the time the dquot is flushed to the backing buffer. If we modify
    the dquot directly in the backing buffer, calculate the CRC
    immediately after the modification is complete. Hence the dquot in
    the on-disk buffer should always have a valid CRC.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     

22 Apr, 2013

1 commit

  • Use the reserved space in struct xfs_dqblk to store a UUID and a crc
    for the quota blocks.

    [dchinner@redhat.com] Add a LSN field and update for current verifier
    infrastructure.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

23 Mar, 2013

1 commit


02 Feb, 2013

1 commit


30 Nov, 2012

1 commit

  • When we fail to get a dquot lock during reclaim, we jump to an error
    handler that unlocks the dquot. This is wrong as we didn't lock the
    dquot, and unlocking it means who-ever is holding the lock has had
    it silently taken away, and hence it results in a lock imbalance.

    Found by inspection while modifying the code for the numa-lru
    patchset. This fixes a random hang I've been seeing on xfstest 232
    for the past several months.

    cc:
    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     

16 Nov, 2012

4 commits

  • To separate the verifiers from iodone functions and associate read
    and write verifiers at the same time, introduce a buffer verifier
    operations structure to the xfs_buf.

    This avoids the need for assigning the write verifier, clearing the
    iodone function and re-running ioend processing in the read
    verifier, and gets rid of the nasty "b_pre_io" name for the write
    verifier function pointer. If we ever need to, it will also be
    easier to add further content specific callbacks to a buffer with an
    ops structure in place.

    We also avoid needing to export verifier functions, instead we
    can simply export the ops structures for those that are needed
    outside the function they are defined in.

    This patch also fixes a directory block readahead verifier issue
    it exposed.

    This patch also adds ops callbacks to the inode/alloc btree blocks
    initialised by growfs. These will need more work before they will
    work with CRCs.

    Signed-off-by: Dave Chinner
    Reviewed-by: Phil White
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • These verifiers are essentially the same code as the read verifiers,
    but do not require ioend processing. Hence factor the read verifier
    functions and add a new write verifier wrapper that is used as the
    callback.

    This is done as one large patch for all verifiers rather than one
    patch per verifier as the change is largely mechanical. This
    includes hooking up the write verifier via the read verifier
    function.

    Hooking up the write verifier for buffers obtained via
    xfs_trans_get_buf() will be done in a separate patch as that touches
    code in many different places rather than just the verifier
    functions.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Add a dquot buffer verify callback function and pass it into the
    buffer read functions. This checks all the dquots in a buffer, but
    cannot completely verify the dquot ids are correct. Also, errors
    cannot be repaired, so an additional function is added to repair bad
    dquots in the buffer if such an error is detected in a context where
    repair is allowed.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Phil White
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Add a verifier function callback capability to the buffer read
    interfaces. This will be used by the callers to supply a function
    that verifies the contents of the buffer when it is read from disk.
    This patch does not provide callback functions, but simply modifies
    the interfaces to allow them to be called.

    The reason for adding this to the read interfaces is that it is very
    difficult to tell fom the outside is a buffer was just read from
    disk or whether we just pulled it out of cache. Supplying a callbck
    allows the buffer cache to use it's internal knowledge of the buffer
    to execute it only when the buffer is read from disk.

    It is intended that the verifier functions will mark the buffer with
    an EFSCORRUPTED error when verification fails. This allows the
    reading context to distinguish a verification error from an IO
    error, and potentially take further actions on the buffer (e.g.
    attempt repair) based on the error reported.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Phil White
    Signed-off-by: Ben Myers

    Dave Chinner
     

18 Oct, 2012

1 commit

  • The inode cache functions remaining in xfs_iget.c can be moved to xfs_icache.c
    along with the other inode cache functions. This removes all functionality from
    xfs_iget.c, so the file can simply be removed.

    This move results in various functions now only having the scope of a single
    file (e.g. xfs_inode_free()), so clean up all the definitions and exported
    prototypes in xfs_icache.[ch] and xfs_inode.h appropriately.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     

15 Jun, 2012

1 commit

  • XFS_MAXIOFFSET() is just a simple macro that resolves to
    mp->m_maxioffset. It doesn't need to exist, and it just makes the
    code unnecessarily loud and shouty.

    Make it quiet and easy to read.

    Signed-off-by: Dave Chinner
    Reviewed-by: Eric Sandeen
    Signed-off-by: Ben Myers

    Dave Chinner
     

15 May, 2012

4 commits

  • Untangle the header file includes a bit by moving the definition of
    xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
    xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
    xfs_ag.h.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
    and write back the buffers per-process instead of by waking up xfsbufd.

    This is now easily doable given that we have very few places left that write
    delwri buffers:

    - log recovery:
    Only done at mount time, and already forcing out the buffers
    synchronously using xfs_flush_buftarg

    - quotacheck:
    Same story.

    - dquot reclaim:
    Writes out dirty dquots on the LRU under memory pressure. We might
    want to look into doing more of this via xfsaild, but it's already
    more optimal than the synchronous inode reclaim that writes each
    buffer synchronously.

    - xfsaild:
    This is the main beneficiary of the change. By keeping a local list
    of buffers to write we reduce latency of writing out buffers, and
    more importably we can remove all the delwri list promotions which
    were hitting the buffer cache hard under sustained metadata loads.

    The implementation is very straight forward - xfs_buf_delwri_queue now gets
    a new list_head pointer that it adds the delwri buffers to, and all callers
    need to eventually submit the list using xfs_buf_delwi_submit or
    xfs_buf_delwi_submit_nowait. Buffers that already are on a delwri list are
    skipped in xfs_buf_delwri_queue, assuming they already are on another delwri
    list. The biggest change to pass down the buffer list was done to the AIL
    pushing. Now that we operate on buffers the trylock, push and pushbuf log
    item methods are merged into a single push routine, which tries to lock the
    item, and if possible add the buffer that needs writeback to the buffer list.
    This leads to much simpler code than the previous split but requires the
    individual IOP_PUSH instances to unlock and reacquire the AIL around calls
    to blocking routines.

    Given that xfsailds now also handle writing out buffers, the conditions for
    log forcing and the sleep times needed some small changes. The most
    important one is that we consider an AIL busy as long we still have buffers
    to push, and the other one is that we do increment the pushed LSN for
    buffers that are under flushing at this moment, but still count them towards
    the stuck items for restart purposes. Without this we could hammer on stuck
    items without ever forcing the log and not make progress under heavy random
    delete workloads on fast flash storage devices.

    [ Dave Chinner:
    - rebase on previous patches.
    - improved comments for XBF_DELWRI_Q handling
    - fix XBF_ASYNC handling in queue submission (test 106 failure)
    - rename delwri submit function buffer list parameters for clarity
    - xfs_efd_item_push() should return XFS_ITEM_PINNED ]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Instead of writing the buffer directly from inside xfs_qm_dqflush return it
    to the caller and let the caller decide what to do with the buffer. Also
    remove the pincount check in xfs_qm_dqflush that all non-blocking callers
    already implement and the now unused flags parameter and the XFS_DQ_IS_DIRTY
    check that all callers already perform.

    [ Dave Chinner: fixed build error cause by missing '{'. ]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Check if we actually need to attach a dquot before taking the ilock in
    xfs_qm_dqattach. This avoid superflous lock roundtrips for the common cases
    of quota support compiled in but not activated on a filesystem and an
    inode that already has the dquots attached.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

15 Mar, 2012

5 commits

  • If we initialize the slab caches for the quota code when XFS is loaded there
    is no need for a global and reference counted quota manager structure. Drop
    all this overhead and also fix the error handling during quota initialization.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Instead of keeping a separate per-filesystem list of dquots we can walk
    the radix tree for the two places where we need to iterate all quota
    structures.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Replace the global hash tables for looking up in-memory dquot structures
    with per-filesystem radix trees to allow scaling to a large number of
    in-memory dquot structures.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Replace the global dquot lru lists with a per-filesystem one.

    Note that the shrinker isn't wire up to the per-superblock VFS shrinker
    infrastructure as would have problems summing up and splitting the counts
    for inodes and dquots. I don't think this is a major problem as the quota
    cache isn't as interwinded with the inode cache as the dentry cache is,
    because an inode that is dropped from the cache will generally release
    a dquot reference, but most of the time it won't be the last one.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     
  • Switch the quota code over to use the generic XFS statistics infrastructure.
    While the legacy /proc/fs/xfs/xqm and /proc/fs/xfs/xqmstats interfaces are
    preserved for now the statistics that still have a meaning with the current
    code are now also available from /proc/fs/xfs/stats.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

11 Feb, 2012

1 commit

  • Stop reusing dquots from the freelist when allocating new ones directly, and
    implement a shrinker that actually follows the specifications for the
    interface. The shrinker implementation is still highly suboptimal at this
    point, but we can gradually work on it.

    This also fixes an bug in the previous lock ordering, where we would take
    the hash and dqlist locks inside of the freelist lock against the normal
    lock ordering. This is only solvable by introducing the dispose list,
    and thus not when using direct reclaim of unused dquots for new allocations.

    As a side-effect the quota upper bound and used to free ratio values in
    /proc/fs/xfs/xqm are set to 0 as these values don't make any sense in the
    new world order.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    (cherry picked from commit 04da0c8196ac0b12fb6b84f4b7a51ad2fa56d869)

    Christoph Hellwig
     

04 Feb, 2012

1 commit


17 Dec, 2011

1 commit


16 Dec, 2011

1 commit

  • Just read the id 0 dquot from disk directly in xfs_qm_init_quotainfo instead
    of going through dqget and requiring a special flag to not add the dquot to
    any lists.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Christoph Hellwig