16 Nov, 2012

3 commits

  • To separate the verifiers from iodone functions and associate read
    and write verifiers at the same time, introduce a buffer verifier
    operations structure to the xfs_buf.

    This avoids the need for assigning the write verifier, clearing the
    iodone function and re-running ioend processing in the read
    verifier, and gets rid of the nasty "b_pre_io" name for the write
    verifier function pointer. If we ever need to, it will also be
    easier to add further content specific callbacks to a buffer with an
    ops structure in place.

    We also avoid needing to export verifier functions, instead we
    can simply export the ops structures for those that are needed
    outside the function they are defined in.

    This patch also fixes a directory block readahead verifier issue
    it exposed.

    This patch also adds ops callbacks to the inode/alloc btree blocks
    initialised by growfs. These will need more work before they will
    work with CRCs.

    Signed-off-by: Dave Chinner
    Reviewed-by: Phil White
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • These verifiers are essentially the same code as the read verifiers,
    but do not require ioend processing. Hence factor the read verifier
    functions and add a new write verifier wrapper that is used as the
    callback.

    This is done as one large patch for all verifiers rather than one
    patch per verifier as the change is largely mechanical. This
    includes hooking up the write verifier via the read verifier
    function.

    Hooking up the write verifier for buffers obtained via
    xfs_trans_get_buf() will be done in a separate patch as that touches
    code in many different places rather than just the verifier
    functions.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • Add an btree block verify callback function and pass it into the
    buffer read functions. Because each different btree block type
    requires different verification, add a function to the ops structure
    that is called from the generic code.

    Also, propagate the verification callback functions through the
    readahead functions, and into the external bmap and bulkstat inode
    readahead code that uses the generic btree buffer read functions.

    Signed-off-by: Dave Chinner
    Reviewed-by: Phil White
    Signed-off-by: Ben Myers

    Dave Chinner
     

18 Oct, 2012

1 commit

  • The inode cache functions remaining in xfs_iget.c can be moved to xfs_icache.c
    along with the other inode cache functions. This removes all functionality from
    xfs_iget.c, so the file can simply be removed.

    This move results in various functions now only having the scope of a single
    file (e.g. xfs_inode_free()), so clean up all the definitions and exported
    prototypes in xfs_icache.[ch] and xfs_inode.h appropriately.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     

22 Jul, 2012

1 commit

  • All callers of xfs_imap_to_bp want the dinode pointer, so let's calculate it
    inside xfs_imap_to_bp. Once that is done xfs_itobp becomes a fairly pointless
    wrapper which can be replaced with direct calls to xfs_imap_to_bp.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

15 May, 2012

1 commit

  • With the removal of xfs_rw.h and other changes over time, xfs_bit.h
    is being included in many files that don't actually need it. Clean
    up the includes as necessary.

    Also move the only-used-once xfs_ialloc_find_free() static inline
    function out of a header file that is widely included to reduce
    the number of needless dependencies on xfs_bit.h.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Dave Chinner
     

27 Mar, 2012

1 commit

  • When we read inodes via bulkstat, we generally only read them once
    and then throw them away - they never get used again. If we retain
    them in cache, then it simply causes the working set of inodes and
    other cached items to be reclaimed just so the inode cache can grow.

    Avoid this problem by marking inodes read by bulkstat not to be
    cached and check this flag in .drop_inode to determine whether the
    inode should be added to the VFS LRU or not. If the inode lookup
    hits an already cached inode, then don't set the flag. If the inode
    lookup hits an inode marked with no cache flag, remove the flag and
    allow it to be cached once the current reference goes away.

    Inodes marked as not cached will get cleaned up by the background
    inode reclaim or via memory pressure, so they will still generate
    some short term cache pressure. They will, however, be reclaimed
    much sooner and in preference to cache hot inodes.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ben Myers

    Dave Chinner
     

14 Mar, 2012

1 commit

  • Timestamps on regular files are the last metadata that XFS does not update
    transactionally. Now that we use the delaylog mode exclusively and made
    the log scode scale extremly well there is no need to bypass that code for
    timestamp updates. Logging all updates allows to drop a lot of code, and
    will allow for further performance improvements later on.

    Note that this patch drops optimized handling of fdatasync - it will be
    added back in a separate commit.

    Reviewed-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    Christoph Hellwig
     

08 Apr, 2011

1 commit


19 Oct, 2010

1 commit

  • This patch adds support for 32bit project quota identifiers.

    On disk format is backward compatible with 16bit projid numbers. projid
    on disk is now kept in two 16bit values - di_projid_lo (which holds the
    same position as old 16bit projid value) and new di_projid_hi (takes
    existing padding) and converts from/to 32bit value on the fly.

    xfs_admin (for existing fs), mkfs.xfs (for new fs) needs to be used
    to enable PROJID32BIT support.

    Signed-off-by: Arkadiusz Miśkiewicz
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Arkadiusz Mi?kiewicz
     

27 Jul, 2010

3 commits

  • xfs_iput is just a small wrapper for xfs_iunlock + IRELE. Having this
    out of line wrapper means the trace events in those two can't track
    their caller properly. So just remove the wrapper and opencode the
    unlock + rele in the few callers.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • Dmapi support was never merged upstream, but we still have a lot of hooks
    bloating XFS for it, all over the fast pathes of the filesystem.

    This patch drops over 700 lines of dmapi overhead. If we'll ever get HSM
    support in mainline at least the namespace events can be done much saner
    in the VFS instead of the individual filesystem, so it's not like this
    is much help for future work.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     

24 Jun, 2010

2 commits

  • The block number comes from bulkstat based inode lookups to shortcut
    the mapping calculations. We ar enot able to trust anything from
    bulkstat, so drop the block number as well so that the correct
    lookups and mappings are always done.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     
  • Inode numbers may come from somewhere external to the filesystem
    (e.g. file handles, bulkstat information) and so are inherently
    untrusted. Rename the flag we use for these lookups to make it
    obvious we are doing a lookup of an untrusted inode number and need
    to verify it completely before trying to read it from disk.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
     

23 Jun, 2010

1 commit

  • The non-coherent bulkstat versionsthat look directly at the inode
    buffers causes various problems with performance optimizations that
    make increased use of just logging inodes. This patch makes bulkstat
    always use iget, which should be fast enough for normal use with the
    radix-tree based inode cache introduced a while ago.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     

06 Mar, 2010

1 commit

  • So that fsr can attempt to get the fork offset of the temporary
    inode it uses the same as the inode it is defragmenting, pass the
    fork offset out in the bulkstat information.

    The bulkstat structure has padding that has always been zeroed, so
    userspace can tell if this field is set or not by use of the xattr
    present flag and a non-zero value for the fork offset.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     

22 Jan, 2010

1 commit


16 Jan, 2010

1 commit

  • The use of an array for the per-ag structures requires reallocation
    of the array when growing the filesystem. This requires locking
    access to the array to avoid use after free situations, and the
    locking is difficult to get right. To avoid needing to reallocate an
    array, change the per-ag structures to an allocated object per ag
    and index them using a tree structure.

    The AGs are always densely indexed (hence the use of an array), but
    the number supported is 2^32 and lookups tend to be random and hence
    indexing needs to scale. A simple choice is a radix tree - it works
    well with this sort of index. This change also removes another
    large contiguous allocation from the mount/growfs path in XFS.

    The growing process now needs to change to only initialise the new
    AGs required for the extra space, and as such only needs to
    exclusively lock the tree for inserts. The rest of the code only
    needs to lock the tree while doing lookups, and hence this will
    remove all the deadlocks that currently occur on the m_perag_lock as
    it is now an innermost lock. The lock is also changed to a spinlock
    from a read/write lock as the hold time is now extremely short.

    To complete the picture, the per-ag structures will need to be
    reference counted to ensure that we don't free/modify them while
    they are still in use. This will be done in subsequent patch.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     

09 Oct, 2009

1 commit

  • This is picking up on Felix's repost of Dave's patch to implement a
    .dirty_inode method. We really need this notification because
    the VFS keeps writing directly into the inode structure instead
    of going through methods to update this state. In addition to
    the long-known atime issue we now also have a caller in VM code
    that updates c/mtime that way for shared writeable mmaps. And
    I found another one that no one has noticed in practice in the FIFO
    code.

    So implement ->dirty_inode to set i_update_core whenever the
    inode gets externally dirtied, and switch the c/mtime handling to
    the same scheme we already use for atime (always picking up
    the value from the Linux inode).

    Note that this patch also removes the xfs_synchronize_atime call
    in xfs_reclaim it was superflous as we already synchronize the time
    when writing the inode via the log (xfs_inode_item_format) or the
    normal buffers (xfs_iflush_int).

    In addition also remove the I_CLEAR check before copying the Linux
    timestamps - now that we always have the Linux inode available
    we can always use the timestamps in it.

    Also switch to just using file_update_time for regular reads/writes -
    that will get us all optimization done to it for free and make
    sure we notice early when it breaks.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Felix Blyakher
    Reviewed-by: Alex Elder
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

02 Sep, 2009

2 commits

  • Currenly we have a xfs_inobt_lookup* variant for each comparism direction,
    and all these get all three fields of the inobt records passed, while the
    common case is just looking for the inode number and we have only marginally
    more callers than xfs_inobt_lookup* variants.

    So opencode a direct call to xfs_btree_lookup for the single case where we
    need all fields, and replace xfs_inobt_lookup* with a xfs_inobt_looku that
    just takes the inode number and the direction for all other callers.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Felix Blyakher

    Christoph Hellwig
     
  • Most callers of xfs_inobt_get_rec need to fill a xfs_inobt_rec_incore_t, and
    those who don't yet are fine with a xfs_inobt_rec_incore_t, instead of the
    three individual variables, too. So just change xfs_inobt_get_rec to write
    the output into a xfs_inobt_rec_incore_t directly.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Alex Elder
    Signed-off-by: Felix Blyakher

    Christoph Hellwig
     

01 Sep, 2009

1 commit


29 Mar, 2009

1 commit


16 Mar, 2009

1 commit


16 Jan, 2009

1 commit


02 Dec, 2008

2 commits

  • The 32-bit xfs_blkstat_one handler was failing because
    a size check checked whether the remaining (32-bit)
    user buffer was less than the (64-bit) bulkstat buffer,
    and failed with ENOMEM if so. Move this check
    into the respective handlers so that they check the
    correct sizes.

    Also, the formatters were returning negative errors
    or positive bytes copied; this was odd in the positive
    error value world of xfs, and handled wrong by at least
    some of the callers, which treated the bytes returned
    as an error value. Move the bytes-used assignment
    into the formatters.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    sandeen@sandeen.net
     
  • Currently the compat formatter was handled by passing
    in "private_data" for the xfs_bulkstat_one formatter,
    which was really just another formatter... IMHO this
    got confusing.

    Instead, just make a new xfs_bulkstat_one_compat
    formatter for xfs_bulkstat, and call it via a wrapper.

    Also, don't translate the ioctl nrs into their native
    counterparts, that just clouds the issue; we're in a
    compat handler anyway, just switch on the 32-bit cmds.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    sandeen@sandeen.net
     

01 Dec, 2008

4 commits

  • Just pass down the XFS_IGET_* flags all the way down to xfs_imap instead
    of translating them mid-way.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     
  • Most uses of struct xfs_imap are to map and inode to a buffer. To avoid
    copying around the inode location information we should just embedd a
    strcut xfs_imap into the xfs_inode. To make sure it doesn't bloat an
    inode the im_len is changed to a ushort, which is fine as that's what
    the users exepect anyway.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     
  • These names don't add any value at all over just using the numerical
    values.

    (First sent on October 9th)

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     
  • Now that we have a separate xfs_icdinode_t for the in-core inode which
    gets logged there is no need anymore for the xfs_dinode vs xfs_dinode_core
    split - the fact that part of the structure gets logged through the inode
    log item and a small part not can better be described in a comment.

    All sizeof operations on the dinode_core either really wanted the
    icdinode and are switched to that one, or had already added the size
    of the agi unlinked list pointer. Later both will be replaced with
    helpers once we get the larger CRC-enabled dinode.

    Removing the data and attribute fork unions also has the advantage that
    xfs_dinode.h doesn't need to pull in every header under the sun.

    While we're at it also add some more comments describing the dinode
    structure.

    (First sent on October 7th)

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner
    Signed-off-by: Niv Sardi

    Christoph Hellwig
     

30 Oct, 2008

4 commits

  • xfs_bulkstat only wants the dinode, offset and buffer from a given inode
    number. Instead of using xfs_itobp on a fake inode which is complicated
    and currently leads to leaks of the security data just use xfs_inotobp
    which is designed to do exactly the kind of lookup xfs_bulkstat wants. The
    only thing that's missing in xfs_inotobp is a flags paramter that let's us
    pass down XFS_IMAP_BULKSTAT, but that can easily added.

    SGI-PV: 987246

    SGI-Modid: xfs-linux-melb:xfs-kern:32397a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • From: Dave Chinner

    Because this is the first major generic btree routine this patch includes
    some infrastrucure, first a few routines to deal with a btree block that
    can be either in short or long form, second xfs_btree_read_buf_block,
    which is the new central routine to read a btree block given a cursor, and
    third the new xfs_btree_ptr_addr routine to calculate the address for a
    given btree pointer record.

    [hch: split out from bigger patch and minor adaptions]

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32190a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • xfs_btree_init_cursor contains close to little shared code for the
    different btrees and will get even more non-common code in the future.
    Split it up into one routine per btree type.

    Because xfs_btree_dup_cursor needs to call the init routine for a generic
    btree cursor add a new btree operation vector that contains a dup_cursor
    method that initializes a new cursor based on an existing one.

    The btree operations vector is based on an idea and code from Dave Chinner
    and will grow more entries later during this series.

    SGI-PV: 985583

    SGI-Modid: xfs-linux-melb:xfs-kern:32176a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Bill O'Donnell
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • To avoid having to initialise some fields of the XFS inode on every
    allocation, we can use the slab init-once feature to initialise them. All
    we have to guarantee is that when we free the inode, all it's entries are
    in the initial state. Add asserts where possible to ensure debug kernels
    check this initial state before freeing and after allocation.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31925a

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Christoph Hellwig

    David Chinner
     

13 Aug, 2008

2 commits


28 Jul, 2008

1 commit

  • kmem_free() function takes (ptr, size) arguments but doesn't actually use
    second one.

    This patch removes size argument from all callsites.

    SGI-PV: 981498
    SGI-Modid: xfs-linux-melb:xfs-kern:31050a

    Signed-off-by: Denys Vlasenko
    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    Denys Vlasenko
     

29 Apr, 2008

1 commit

  • Unless XFS_IGET_CREATE is passed xfs_iget will return ENOENT if it
    encounters an inode with di_mode == 0. Remove the duplicated checks in the
    callers.

    (the log recovery case is not touched for now)

    SGI-PV: 976035
    SGI-Modid: xfs-linux-melb:xfs-kern:30898a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig