07 Nov, 2014

4 commits

  • There are a bunch of variables tha tare more wildy scoped than they
    need to be, obfuscated user buffer checks and tortured "next inode"
    tracking. This all needs cleaning up to expose the real issues that
    need fixing.

    cc: # 3.17
    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     
  • The loop construct has issues:
    - clustidx is completely unused, so remove it.
    - the loop tries to be smart by terminating when the
    "freecount" tells it that all inodes are free. Just drop
    it as in most cases we have to scan all inodes in the
    chunk anyway.
    - move the "user buffer left" condition check to the only
    point where we consume space int eh user buffer.
    - move the initialisation of agino out of the loop, leaving
    just a simple loop control logic using the clusteridx.

    Also, double handling of the user buffer variables leads to problems
    tracking the current state - use the cursor variables directly
    rather than keeping local copies and then having to update the
    cursor before returning.

    cc: # 3.17
    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     
  • The xfs_bulkstat_agichunk formatting cursor takes buffer values from
    the main loop and passes them via the structure to the chunk
    formatter, and the writes the changed values back into the main loop
    local variables. Unfortunately, this complex dance is full of corner
    cases that aren't handled correctly.

    The biggest problem is that it is double handling the information in
    both the main loop and the chunk formatting function, leading to
    inconsistent updates and endless loops where progress is not made.

    To fix this, push the struct xfs_bulkstat_agichunk outwards to be
    the primary holder of user buffer information. this removes the
    double handling in the main loop.

    Also, pass the last inode processed by the chunk formatter as a
    separate parameter as it purely an output variable and is not
    related to the user buffer consumption cursor.

    Finally, the chunk formatting code is not shared by anyone, so make
    it local to xfs_itable.c.

    cc: # 3.17
    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     
  • The bulkstat code has several different ways of detecting the end of
    an AG when doing a walk. They are not consistently detected, and the
    code that checks for the end of AG conditions is not consistently
    coded. Hence the are conditions where the walk code can get stuck in
    an endless loop making no progress and not triggering any
    termination conditions.

    Convert all the "tmp/i" status return codes from btree operations
    to a common name (stat) and apply end-of-ag detection to these
    operations consistently.

    cc: # 3.17
    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     

30 Oct, 2014

1 commit

  • xfs_bulkstat() doesn't check error return from xfs_btree_increment(). In
    case of specific fs corruption that could result in xfs_bulkstat()
    entering an infinite loop because we would be looping over the same
    chunk over and over again. Fix the problem by checking the return value
    and terminating the loop properly.

    Coverity-id: 1231338
    cc:
    Signed-off-by: Jan Kara
    Reviewed-by: Jie Liu
    Signed-off-by: Dave Chinner

    Jan Kara
     

29 Oct, 2014

1 commit

  • The recent refactoring of the bulkstat code left a small landmine in
    the code. If a inobt read fails, then the tree walk is aborted and
    returns without releasing the AGI buffer or freeing the cursor. This
    can lead to a subsequent bulkstat call hanging trying to grab the
    AGI buffer again.

    cc:
    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Eric Sandeen
    Signed-off-by: Dave Chinner

    Dave Chinner
     

13 Oct, 2014

1 commit

  • caused a regression in xfs_inumbers, which in turn broke
    xfsdump, causing incomplete dumps.

    The loop in xfs_inumbers() needs to fill the user-supplied
    buffers, and iterates via xfs_btree_increment, reading new
    ags as needed.

    But the first time through the loop, if xfs_btree_increment()
    succeeds, we continue, which triggers the ++agno at the bottom
    of the loop, and we skip to soon to the next ag - without
    the proper setup under next_ag to read the next ag.

    Fix this by removing the agno increment from the loop conditional,
    and only increment agno if we have actually hit the code under
    the next_ag: target.

    Cc: stable@vger.kernel.org
    Signed-off-by: Eric Sandeen
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Eric Sandeen
     

04 Aug, 2014

1 commit

  • Introduce xfs_bulkstat_ag_ichunk() to process inodes in chunk with a
    pointer to a formatter function that will iget the inode and fill in
    the appropriate structure.

    Refactor xfs_bulkstat() with it.

    Signed-off-by: Jie Liu
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Jie Liu
     

24 Jul, 2014

8 commits

  • From: Jie Liu

    Introduce xfs_bulkstat_grab_ichunk() to look up an inode chunk in where
    the given inode resides, then grab the record. Update the data for the
    pointed-to record if the inode was not the last in the chunk and there
    are some left allocated, return the grabbed inode count on success.

    Refactor xfs_bulkstat() with it.

    Signed-off-by: Jie Liu
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Jie Liu
     
  • From: Jie Liu

    Introduce xfs_bulkstat_ichunk_ra() to loop over all clusters in the
    next inode chunk, then performs readahead if there are any allocated
    inodes in that cluster.

    Refactor xfs_bulkstat() with it.

    Signed-off-by: Jie Liu
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Jie Liu
     
  • From: Jie Liu

    We should not ignore the btree operation errors at xfs_bulkstat() but
    to propagate them if any. This patch fix two places in this function
    and the remaining things will be fixed with code refactoring thereafter.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jie Liu
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Jie Liu
     
  • From: Jie Liu

    Remove the redundant user buffer and count checks as it has already
    been validated at xfs_ioc_bulkstat().

    Signed-off-by: Jie Liu
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Jie Liu
     
  • From: Jie Liu

    To fetch the file system number tables, we currently just ignore the
    errors and proceed to loop over the next AG or bump agino to the next
    chunk in case of btree operations failed, that is not properly because
    those errors might hint us potential file system problems.

    This patch rework xfs_inumbers() to handle the btree operation errors
    as well as the loop conditions.

    Signed-off-by: Jie Liu
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Jie Liu
     
  • From: Jie Liu

    Consolidate xfs_inumbers() to make the formatter function return correct
    error and make the source code looks a bit neat.

    Signed-off-by: Jie Liu
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Jie Liu
     
  • From: Christoph Hellwig

    xfs_bukstat_one doesn't have any failure case that would go away when
    called through xfs_bulkstat, so remove the fallback and the now unessecary
    xfs_bulkstat_single function.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jie Liu
    Signed-off-by: Jie Liu
    Signed-off-by: Dave Chinner

    Christoph Hellwig
     
  • From: Jie Liu

    Remove the redundant BULKSTAT_RV_NOTHING assignment in case of call
    xfs_iget() failed at xfs_bulkstat_one_int().

    Signed-off-by: Jie Liu
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Jie Liu
     

25 Jun, 2014

1 commit

  • Convert all the errors the core XFs code to negative error signs
    like the rest of the kernel and remove all the sign conversion we
    do in the interface layers.

    Errors for conversion (and comparison) found via searches like:

    $ git grep " E" fs/xfs
    $ git grep "return E" fs/xfs
    $ git grep " E[A-Z].*;$" fs/xfs

    Negation points found via searches like:

    $ git grep "= -[a-z,A-Z]" fs/xfs
    $ git grep "return -[a-z,A-D,F-Z]" fs/xfs
    $ git grep " -[a-z].*;" fs/xfs

    [ with some bits I missed from Brian Foster ]

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Dave Chinner

    Dave Chinner
     

22 Jun, 2014

1 commit

  • XFS_ERROR was designed long ago to trap return values, but it's not
    runtime configurable, it's not consistently used, and we can do
    similar error trapping with ftrace scripts and triggers from
    userspace.

    Just nuke XFS_ERROR and associated bits.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Eric Sandeen
     

24 Apr, 2014

1 commit

  • The introduction of the free inode btree (finobt) requires that
    xfs_ialloc_btree.c handle multiple trees. Refactor xfs_ialloc_btree.c
    so the caller specifies the btree type on cursor initialization to
    prepare for addition of the finobt.

    Signed-off-by: Brian Foster
    Reviewed-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Dave Chinner

    Brian Foster
     

13 Dec, 2013

2 commits


24 Oct, 2013

3 commits

  • Currently the xfs_inode.h header has a dependency on the definition
    of the BMAP btree records as the inode fork includes an array of
    xfs_bmbt_rec_host_t objects in it's definition.

    Move all the btree format definitions from xfs_btree.h,
    xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
    xfs_format.h to continue the process of centralising the on-disk
    format definitions. With this done, the xfs inode definitions are no
    longer dependent on btree header files.

    The enables a massive culling of unnecessary includes, with close to
    200 #include directives removed from the XFS kernel code base.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • xfs_trans.h has a dependency on xfs_log.h for a couple of
    structures. Most code that does transactions doesn't need to know
    anything about the log, but this dependency means that they have to
    include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
    files and clean up the includes to be in dependency order.

    In doing this, remove the direct include of xfs_trans_reserve.h from
    xfs_trans.h so that we remove the dependency between xfs_trans.h and
    xfs_mount.h. Hence the xfs_trans.h include can be moved to the
    indicate the actual dependencies other header files have on it.

    Note that these are kernel only header files, so this does not
    translate to any userspace changes at all.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • All of the buffer operations structures are needed to be exported
    for xfs_db, so move them all to a common location rather than
    spreading them all over the place. They are verifying the on-disk
    format, so while xfs_format.h might be a good place, it is not part
    of the on disk format.

    Hence we need to create a new header file that we centralise these
    related definitions. Start by moving the bffer operations
    structures, and then also move all the other definitions that have
    crept into xfs_log_format.h and xfs_format.h as there was no other
    shared header file to put them in.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     

11 Sep, 2013

1 commit

  • We have quite a few places now where we do:

    x = kmem_zalloc(large size)
    if (!x)
    x = kmem_zalloc_large(large size)

    and do a similar dance when freeing the memory. kmem_free() already
    does the correct freeing dance, and kmem_zalloc_large() is only ever
    called in these constructs, so just factor it all into
    kmem_zalloc_large() and kmem_free().

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     

10 Sep, 2013

1 commit


10 Jul, 2013

1 commit

  • There are some unused codes at xfs_bulkstat():

    - Variable bp is defined to point to the on-disk inode cluster
    buffer, but it proved to be of no practical help.

    - We process the chunks of good inodes which were fetched by iterating
    btree records from an AG. When processing inodes from each chunk,
    the code recomputing agbno if run into the first inode of a cluster,
    however, the agbno is not being used thereafter.

    This patch tries to clean up those things.

    Signed-off-by: Jie Liu
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Jie Liu
     

29 Jun, 2013

1 commit


28 Jun, 2013

1 commit

  • I was running some tests on bulkstat on CRC enabled filesystems when
    I noticed that all the IO being issued was 8k in size, regardless of
    the fact taht we are issuing sequential 8k buffers for inodes
    clusters. The IO size should be 16k for 256 byte inodes, and 32k for
    512 byte inodes, but this wasn't happening.

    blktrace showed that there was an explict plug and unplug happening
    around each readahead IO from _xfs_buf_ioapply, and the unplug was
    causing the IO to be issued immediately. Hence no opportunity was
    being given to the elevator to merge adjacent readahead requests and
    dispatch them as a single IO.

    Add plugging around the inode chunk readahead dispatch loop in
    bulkstat to ensure that we don't unplug the queue between adjacent
    inode buffer readahead IOs and so we get fewer, larger IO requests
    hitting the storage subsystem for bulkstat.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     

16 Nov, 2012

3 commits

  • To separate the verifiers from iodone functions and associate read
    and write verifiers at the same time, introduce a buffer verifier
    operations structure to the xfs_buf.

    This avoids the need for assigning the write verifier, clearing the
    iodone function and re-running ioend processing in the read
    verifier, and gets rid of the nasty "b_pre_io" name for the write
    verifier function pointer. If we ever need to, it will also be
    easier to add further content specific callbacks to a buffer with an
    ops structure in place.

    We also avoid needing to export verifier functions, instead we
    can simply export the ops structures for those that are needed
    outside the function they are defined in.

    This patch also fixes a directory block readahead verifier issue
    it exposed.

    This patch also adds ops callbacks to the inode/alloc btree blocks
    initialised by growfs. These will need more work before they will
    work with CRCs.

    Signed-off-by: Dave Chinner
    Reviewed-by: Phil White
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • These verifiers are essentially the same code as the read verifiers,
    but do not require ioend processing. Hence factor the read verifier
    functions and add a new write verifier wrapper that is used as the
    callback.

    This is done as one large patch for all verifiers rather than one
    patch per verifier as the change is largely mechanical. This
    includes hooking up the write verifier via the read verifier
    function.

    Hooking up the write verifier for buffers obtained via
    xfs_trans_get_buf() will be done in a separate patch as that touches
    code in many different places rather than just the verifier
    functions.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Add an btree block verify callback function and pass it into the
    buffer read functions. Because each different btree block type
    requires different verification, add a function to the ops structure
    that is called from the generic code.

    Also, propagate the verification callback functions through the
    readahead functions, and into the external bmap and bulkstat inode
    readahead code that uses the generic btree buffer read functions.

    Signed-off-by: Dave Chinner
    Reviewed-by: Phil White
    Signed-off-by: Ben Myers

    Dave Chinner
     

18 Oct, 2012

1 commit

  • The inode cache functions remaining in xfs_iget.c can be moved to xfs_icache.c
    along with the other inode cache functions. This removes all functionality from
    xfs_iget.c, so the file can simply be removed.

    This move results in various functions now only having the scope of a single
    file (e.g. xfs_inode_free()), so clean up all the definitions and exported
    prototypes in xfs_icache.[ch] and xfs_inode.h appropriately.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     

22 Jul, 2012

1 commit

  • All callers of xfs_imap_to_bp want the dinode pointer, so let's calculate it
    inside xfs_imap_to_bp. Once that is done xfs_itobp becomes a fairly pointless
    wrapper which can be replaced with direct calls to xfs_imap_to_bp.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

15 May, 2012

1 commit

  • With the removal of xfs_rw.h and other changes over time, xfs_bit.h
    is being included in many files that don't actually need it. Clean
    up the includes as necessary.

    Also move the only-used-once xfs_ialloc_find_free() static inline
    function out of a header file that is widely included to reduce
    the number of needless dependencies on xfs_bit.h.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     

27 Mar, 2012

1 commit

  • When we read inodes via bulkstat, we generally only read them once
    and then throw them away - they never get used again. If we retain
    them in cache, then it simply causes the working set of inodes and
    other cached items to be reclaimed just so the inode cache can grow.

    Avoid this problem by marking inodes read by bulkstat not to be
    cached and check this flag in .drop_inode to determine whether the
    inode should be added to the VFS LRU or not. If the inode lookup
    hits an already cached inode, then don't set the flag. If the inode
    lookup hits an inode marked with no cache flag, remove the flag and
    allow it to be cached once the current reference goes away.

    Inodes marked as not cached will get cleaned up by the background
    inode reclaim or via memory pressure, so they will still generate
    some short term cache pressure. They will, however, be reclaimed
    much sooner and in preference to cache hot inodes.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     

14 Mar, 2012

1 commit

  • Timestamps on regular files are the last metadata that XFS does not update
    transactionally. Now that we use the delaylog mode exclusively and made
    the log scode scale extremly well there is no need to bypass that code for
    timestamp updates. Logging all updates allows to drop a lot of code, and
    will allow for further performance improvements later on.

    Note that this patch drops optimized handling of fdatasync - it will be
    added back in a separate commit.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

08 Apr, 2011

1 commit


19 Oct, 2010

1 commit

  • This patch adds support for 32bit project quota identifiers.

    On disk format is backward compatible with 16bit projid numbers. projid
    on disk is now kept in two 16bit values - di_projid_lo (which holds the
    same position as old 16bit projid value) and new di_projid_hi (takes
    existing padding) and converts from/to 32bit value on the fly.

    xfs_admin (for existing fs), mkfs.xfs (for new fs) needs to be used
    to enable PROJID32BIT support.

    Signed-off-by: Arkadiusz Miśkiewicz
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Arkadiusz Mi?kiewicz
     

27 Jul, 2010

1 commit

  • xfs_iput is just a small wrapper for xfs_iunlock + IRELE. Having this
    out of line wrapper means the trace events in those two can't track
    their caller properly. So just remove the wrapper and opencode the
    unlock + rele in the few callers.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig