12 Oct, 2011

2 commits

  • There is no reason to keep a reference to the inode even if we unlock
    it during transaction commit because we never drop a reference between
    the ijoin and commit. Also use this fact to merge xfs_trans_ijoin_ref
    back into xfs_trans_ijoin - the third argument decides if an unlock
    is needed now.

    I'm actually starting to wonder if allowing inodes to be unlocked
    at transaction commit really is worth the effort. The only real
    benefit is that they can be unlocked earlier when committing a
    synchronous transaction, but that could be solved by doing the
    log force manually after the unlock, too.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     
  • XFS_TRANS_SWAPEXT is a transaction type, not a flag for xfs_trans_commit, so
    don't pass it in xfs_swap_extents.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

29 Apr, 2011

1 commit

  • follow these guidelines:
    - leave initialization in the declaration block if it fits the line
    - move to the code where it's more suitable ('for' init block)

    The last chunk was modified from David's original to be a correct
    fix for what appeared to be a duplicate initialization.

    Signed-off-by: David Sterba
    Signed-off-by: Alex Elder
    Reviewed-by: Dave Chinner

    David Sterba
     

07 Mar, 2011

1 commit


01 Dec, 2010

1 commit

  • There is an assumption in parts of XFS that flushing a dirty
    file will make all the delayed allocation blocks disappear from an
    inode. That is, that after calling xfs_flush_pages(),
    ip->i_delayed_blks will be zero.

    This is an invalid assumption as we may have speculative
    preallocation beyond EOF, and those blocks are recorded in
    ip->i_delayed_blks. A flush of the dirty pages of an inode will not
    change the state of these blocks beyond EOF, so a non-zero
    delalloc block count after a flush is valid.

    The bmap code has an invalid ASSERT() that needs to be removed, and
    the swapext code has a bug in that while it swaps the data forks
    around, it fails to swap the i_delayed_blks counter associated with
    the fork and hence can get the block accounting wrong.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
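
The swapext part of the fix above can be illustrated with a minimal sketch. The struct and function names here are hypothetical stand-ins, not the kernel's types; the only point is that the delalloc counter has to travel with the data fork it describes.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for an XFS inode. */
struct toy_inode {
    int      data_fork_id;    /* stands in for the whole data fork */
    uint64_t i_delayed_blks;  /* delalloc blocks accounted to that fork */
};

/* Swap the data forks and the delalloc counters that belong to them. */
void toy_swap_forks(struct toy_inode *ip, struct toy_inode *tip)
{
    int fork = ip->data_fork_id;
    uint64_t delayed = ip->i_delayed_blks;

    ip->data_fork_id = tip->data_fork_id;
    tip->data_fork_id = fork;

    /* The step the message describes as missing: the counter
     * moves together with the fork it accounts for. */
    ip->i_delayed_blks = tip->i_delayed_blks;
    tip->i_delayed_blks = delayed;
}
```

Without the second pair of assignments, each inode would keep a block count that describes the other inode's fork.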
     

27 Jul, 2010

3 commits

  • Currently we need to either call IHOLD or xfs_trans_ihold on an inode when
    joining it to a transaction via xfs_trans_ijoin.

    This patch instead makes xfs_trans_ijoin usable on its own by doing
    an implicit xfs_trans_ihold, which also allows us to drop the third
    argument. For the case where we want to hold a reference on the inode
    an xfs_trans_ijoin_ref wrapper is added which does the IHOLD and marks
    the inode as needing an xfs_iput. In addition to the cleaner interface
    for the caller this also simplifies the implementation.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     
  • Dmapi support was never merged upstream, but we still have a lot of hooks
    bloating XFS for it, all over the fast paths of the filesystem.

    This patch drops over 700 lines of dmapi overhead. If we ever get HSM
    support in mainline, at least the namespace events can be done in a much
    saner way in the VFS instead of the individual filesystem, so it's not
    like this is much help for future work.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Dave Chinner

    Christoph Hellwig
     

24 Jun, 2010

1 commit

  • This patch prevents user "foo" from using the SWAPEXT ioctl to swap
    a write-only file owned by user "bar" into a file owned by "foo" and
    subsequently reading it. It does so by checking that the file
    descriptors passed to the ioctl are also opened for reading.

    Signed-off-by: Dan Rosenberg
    Reviewed-by: Christoph Hellwig

    Dan Rosenberg
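
A user-space sketch of the kind of check this message describes, expressed on open(2) flags rather than on the kernel's struct file. The function name is hypothetical, and the kernel tests its own file mode bits instead; this only shows the idea that a write-only descriptor must be refused.

```c
#include <fcntl.h>

/* Return nonzero if a descriptor opened with these flags can be read.
 * Hypothetical helper for illustration only. */
int toy_opened_for_reading(int open_flags)
{
    int acc = open_flags & O_ACCMODE;

    return acc == O_RDONLY || acc == O_RDWR;
}
```

A swap request would be rejected when either descriptor fails this test, which closes the path where contents of a write-only file end up readable through the other file.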
     

27 Apr, 2010

1 commit

  • A new xfsqa test (226) with a prototype xfs_fsr change to try to
    handle dynamic fork offsets better triggers an assertion failure
    where the inode data fork is in btree format, yet there is room in
    the inode for it to be in extent format. The two inodes look like:

    before: ino 0x101 (target), num_extents 11, Max in-fork extents 6, broot size 40, fork offset 96
    before: ino 0x115 (temp), num_extents 5, Max in-fork extents 3, broot size 40, fork offset 56
    after: ino 0x101 (target), num_extents 5, Max in-fork extents 6, broot size 40, fork offset 96
    after: ino 0x115 (temp), num_extents 11, Max in-fork extents 3, broot size 40, fork offset 56

    Basically the target inode ends up with 5 extents in btree format,
    but it had space for 6 extents in extent format, so ends up
    incorrect. Notably here the broot size is the same, and that is
    where the kernel code is going wrong - the btree root will fit, so
    it lets the swap go ahead.

    The check should not allow the swap to take place if the number of
    extents while in btree format is less than the number of extents
    that can fit in the inode in extent format. Adding that check will
    prevent this swap and corruption from occurring.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig

    Dave Chinner
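
The check described above can be sketched against the numbers in the report. All names are hypothetical, both forks are assumed to be in btree format as in the report, and treating the equal case as invalid too is an assumption of this sketch rather than something the message states.

```c
/* Hypothetical summary of one inode's data fork. */
struct toy_fork {
    unsigned num_extents;          /* extents currently mapped */
    unsigned max_in_fork_extents;  /* extents that fit in extent format */
    unsigned broot_size;           /* size of the btree root in bytes */
    unsigned fork_offset;          /* attribute fork offset, 0 if none */
};

/* Would 'incoming' (in btree format) be a valid data fork inside 'host'? */
int toy_btree_fork_valid_in(const struct toy_fork *incoming,
                            const struct toy_fork *host)
{
    /* Existing check: the btree root has to fit before the attr fork. */
    if (host->fork_offset != 0 && incoming->broot_size > host->fork_offset)
        return 0;

    /* Added check: so few extents that the host would hold them in
     * extent format means btree format is invalid there (assumption:
     * the equal case is rejected as well). */
    if (incoming->num_extents <= host->max_in_fork_extents)
        return 0;

    return 1;
}

/* The swap is only allowed if both directions are valid. */
int toy_swap_allowed(const struct toy_fork *a, const struct toy_fork *b)
{
    return toy_btree_fork_valid_in(a, b) && toy_btree_fork_valid_in(b, a);
}
```

With the reported values, the temp fork's 5 extents would land in a target that can hold 6 in extent format, so the swap is refused even though the 40-byte root fits.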
     

16 Jan, 2010

3 commits

  • The swap extent ioctl passes in a target inode and a temporary inode
    which are clearly named in the ioctl structure. The code then
    assigns temp to target and vice versa, making it extremely difficult
    to work out which inode is which later in the code. Make this
    consistent throughout the code.

    Also make xfs_swap_extent static as there are no external users of
    the function.

    Signed-off-by: Dave Chinner
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • To be able to diagnose whether the swap extents function is
    detecting compatible inode data fork configurations for swapping
    extents, add tracing points to the code to allow us to see the
    format of the inode forks before and after the swap.

    Signed-off-by: Dave Chinner
    Signed-off-by: Alex Elder

    Dave Chinner
     
  • When swapping extents, we can corrupt inodes by swapping data forks
    that are in incompatible formats. This is caused by the two inodes
    having different fork offsets due to the presence of an attribute
    fork on an attr2 filesystem. xfs_fsr tries to be smart about
    setting the fork offset, but the trick it plays only works on attr1
    (old fixed format attribute fork) filesystems.

    Changing the way xfs_fsr sets up the attribute fork will prevent
    this situation from ever occurring, so in the kernel code we can get
    by with a preventative fix - check that the data fork in the
    defragmented inode is in a format valid for the inode it is being
    swapped into. This will lead to files that will silently and
    potentially repeatedly fail defragmentation, so issue a warning to
    the log when this particular failure occurs to let us know that
    xfs_fsr needs updating/fixing.

    To help identify how to improve xfs_fsr to avoid this issue, add
    trace points for the inodes being swapped so that we can determine
    why the swap was rejected and to confirm that the code is making the
    right decisions and modifications when swapping forks.

    A further complication is that even when the swap is allowed to
    proceed, if the fork offset differs between the two inodes then the
    value for the maximum number of extents the data fork can hold can be
    wrong. Make sure these are also set correctly after the swap occurs.

    Signed-off-by: Dave Chinner
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Dave Chinner
     

15 Dec, 2009

1 commit

  • Convert the old xfs tracing support that could only be used with the
    out of tree kdb and xfsidbg patches to use the generic event tracer.

    To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
    all xfs trace channels by:

    echo 1 > /sys/kernel/debug/tracing/events/xfs/enable

    or alternatively enable single events by just doing the same in one
    event subdirectory, e.g.

    echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable

    or set more complex filters, etc. In Documentation/trace/events.txt
    all this is described in more detail. To read the events do a

    cat /sys/kernel/debug/tracing/trace

    Compared to the last posting this patch converts the tracing mostly to
    the one tracepoint per callsite model that other users of the new
    tracing facility also employ. This allows a very fine-grained control
    of the tracing, a cleaner output of the traces and also enables the
    perf tool to use each tracepoint as a virtual performance counter,
    allowing us to e.g. count how often certain workloads hit various
    spots in XFS. Take a look at

    http://lwn.net/Articles/346470/

    for some examples.

    Also the btree tracing isn't included at all yet, as it will require
    additional core tracing features not in mainline yet; I plan to
    deliver it later.

    And the really nice thing about this patch is that it actually removes
    many lines of code while adding this nice functionality:

    fs/xfs/Makefile | 8
    fs/xfs/linux-2.6/xfs_acl.c | 1
    fs/xfs/linux-2.6/xfs_aops.c | 52 -
    fs/xfs/linux-2.6/xfs_aops.h | 2
    fs/xfs/linux-2.6/xfs_buf.c | 117 +--
    fs/xfs/linux-2.6/xfs_buf.h | 33
    fs/xfs/linux-2.6/xfs_fs_subr.c | 3
    fs/xfs/linux-2.6/xfs_ioctl.c | 1
    fs/xfs/linux-2.6/xfs_ioctl32.c | 1
    fs/xfs/linux-2.6/xfs_iops.c | 1
    fs/xfs/linux-2.6/xfs_linux.h | 1
    fs/xfs/linux-2.6/xfs_lrw.c | 87 --
    fs/xfs/linux-2.6/xfs_lrw.h | 45 -
    fs/xfs/linux-2.6/xfs_super.c | 104 ---
    fs/xfs/linux-2.6/xfs_super.h | 7
    fs/xfs/linux-2.6/xfs_sync.c | 1
    fs/xfs/linux-2.6/xfs_trace.c | 75 ++
    fs/xfs/linux-2.6/xfs_trace.h | 1369 +++++++++++++++++++++++++++++++++++++++++
    fs/xfs/linux-2.6/xfs_vnode.h | 4
    fs/xfs/quota/xfs_dquot.c | 110 ---
    fs/xfs/quota/xfs_dquot.h | 21
    fs/xfs/quota/xfs_qm.c | 40 -
    fs/xfs/quota/xfs_qm_syscalls.c | 4
    fs/xfs/support/ktrace.c | 323 ---------
    fs/xfs/support/ktrace.h | 85 --
    fs/xfs/xfs.h | 16
    fs/xfs/xfs_ag.h | 14
    fs/xfs/xfs_alloc.c | 230 +-----
    fs/xfs/xfs_alloc.h | 27
    fs/xfs/xfs_alloc_btree.c | 1
    fs/xfs/xfs_attr.c | 107 ---
    fs/xfs/xfs_attr.h | 10
    fs/xfs/xfs_attr_leaf.c | 14
    fs/xfs/xfs_attr_sf.h | 40 -
    fs/xfs/xfs_bmap.c | 507 +++------------
    fs/xfs/xfs_bmap.h | 49 -
    fs/xfs/xfs_bmap_btree.c | 6
    fs/xfs/xfs_btree.c | 5
    fs/xfs/xfs_btree_trace.h | 17
    fs/xfs/xfs_buf_item.c | 87 --
    fs/xfs/xfs_buf_item.h | 20
    fs/xfs/xfs_da_btree.c | 3
    fs/xfs/xfs_da_btree.h | 7
    fs/xfs/xfs_dfrag.c | 2
    fs/xfs/xfs_dir2.c | 8
    fs/xfs/xfs_dir2_block.c | 20
    fs/xfs/xfs_dir2_leaf.c | 21
    fs/xfs/xfs_dir2_node.c | 27
    fs/xfs/xfs_dir2_sf.c | 26
    fs/xfs/xfs_dir2_trace.c | 216 ------
    fs/xfs/xfs_dir2_trace.h | 72 --
    fs/xfs/xfs_filestream.c | 8
    fs/xfs/xfs_fsops.c | 2
    fs/xfs/xfs_iget.c | 111 ---
    fs/xfs/xfs_inode.c | 67 --
    fs/xfs/xfs_inode.h | 76 --
    fs/xfs/xfs_inode_item.c | 5
    fs/xfs/xfs_iomap.c | 85 --
    fs/xfs/xfs_iomap.h | 8
    fs/xfs/xfs_log.c | 181 +----
    fs/xfs/xfs_log_priv.h | 20
    fs/xfs/xfs_log_recover.c | 1
    fs/xfs/xfs_mount.c | 2
    fs/xfs/xfs_quota.h | 8
    fs/xfs/xfs_rename.c | 1
    fs/xfs/xfs_rtalloc.c | 1
    fs/xfs/xfs_rw.c | 3
    fs/xfs/xfs_trans.h | 47 +
    fs/xfs/xfs_trans_buf.c | 62 -
    fs/xfs/xfs_vnodeops.c | 8
    70 files changed, 2151 insertions(+), 2592 deletions(-)

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

09 Oct, 2009

1 commit

  • This is picking up on Felix's repost of Dave's patch to implement a
    .dirty_inode method. We really need this notification because
    the VFS keeps writing directly into the inode structure instead
    of going through methods to update this state. In addition to
    the long-known atime issue we now also have a caller in VM code
    that updates c/mtime that way for shared writeable mmaps. And
    I found another one that no one has noticed in practice in the FIFO
    code.

    So implement ->dirty_inode to set i_update_core whenever the
    inode gets externally dirtied, and switch the c/mtime handling to
    the same scheme we already use for atime (always picking up
    the value from the Linux inode).

    Note that this patch also removes the xfs_synchronize_atime call
    in xfs_reclaim; it was superfluous as we already synchronize the time
    when writing the inode via the log (xfs_inode_item_format) or the
    normal buffers (xfs_iflush_int).

    In addition also remove the I_CLEAR check before copying the Linux
    timestamps - now that we always have the Linux inode available
    we can always use the timestamps in it.

    Also switch to just using file_update_time for regular reads/writes -
    that will get us all optimizations done to it for free and make
    sure we notice early when it breaks.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Felix Blyakher
    Reviewed-by: Alex Elder
    Signed-off-by: Alex Elder

    Christoph Hellwig
     

02 Jun, 2009

1 commit

  • Regression from commit ef8f7fc, which rearranged the code in
    xfs_swap_extents(), leading to a double unlock of the xfs inode ilock.
    That resulted in xfs_fsr deadlocking itself on platforms that
    don't handle a double unlock of an rw_semaphore nicely. It caused the
    count to go negative, which represents a write holder, without
    really having one. ia64 is one of the platforms where the deadlock
    was easily reproduced and the fix was tested.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Eric Sandeen
    Signed-off-by: Felix Blyakher

    Felix Blyakher
     

13 Feb, 2009

1 commit


04 Feb, 2009

1 commit


28 Jan, 2009

1 commit

  • fixes kernel.org bugzilla 12538, xfs_fsr fails on 2.6.29-rc kernels

    Regression caused by 743bb4650da9e2595d6cedd01c680b5b9398c74a

    This was an embarrassing mistake, reallocating the sxp pointer passed
    in from the main ioctl switch.

    Signed-off-by: Eric Sandeen
    Tested-by: Paul Martin
    Reviewed-by: Felix Blyakher
    Signed-off-by: Felix Blyakher

    Eric Sandeen
     

02 Dec, 2008

1 commit

  • Moving the copy_from_user out of some of the ioctl helpers will
    make it easier for the compat ioctl switch to copy in the right
    struct, then just pass to the underlying helper.

    Also, move common access checks into the helpers themselves,
    and out of the native ioctl switch code, to reduce code
    duplication between native & compat ioctl callers.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    sandeen@sandeen.net
     

17 Sep, 2008

1 commit

  • If we call xfs_lock_two_inodes() to grab both the iolock and the ilock,
    then drop the ilocks on both inodes, then grab them again (as
    xfs_swap_extents() does) then lockdep will report a locking order problem.
    This is a false positive.

    To avoid this, disallow xfs_lock_two_inodes() from locking both inode locks
    at once - force callers to make two separate calls. This means that nested
    dropping and regaining of the ilocks will retain the same lockdep subclass
    and so lockdep will not see anything wrong with this code.

    SGI-PV: 986238

    SGI-Modid: xfs-linux-melb:xfs-kern:31999a

    Signed-off-by: David Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Peter Leckie
    Signed-off-by: Lachlan McIlroy

    David Chinner
     

13 Aug, 2008

4 commits

  • In various places we can just move a VFS_I call into the argument list of
    called functions/macros instead of having a local bhv_vnode_t.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31776a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • When multiple inodes are locked in XFS it happens in order of the inode
    number, with everything but the first inode trylocked if any of the
    previous inodes is in the AIL.

    Except for the sorting of the inodes this logic is implemented in
    xfs_lock_inodes, but it is also partially duplicated in
    xfs_lock_dir_and_entry in a particularly stupid way that adds a lock
    roundtrip if the inode ordering is not optimal.

    This patch adds a new helper xfs_lock_two_inodes that takes two inodes and
    locks them in the most optimal way according to the above locking protocol
    and uses it for all places that want to lock two inodes.

    The only caller of xfs_lock_inodes is xfs_rename which might lock up to
    four inodes.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31772a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Donald Douwsma
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
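
The ordering half of that locking protocol can be sketched as follows. The types and names are hypothetical, the "lock" is just a recorded sequence number, and the trylock-when-in-the-AIL rule from the message is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical inode with a toy lock: lock_order is 0 while unlocked,
 * otherwise the position (1 or 2) in which the lock was taken. */
struct toy_locked_inode {
    uint64_t ino;
    unsigned lock_order;
};

/* Lock two distinct inodes, always taking the lower inode number first,
 * so that every caller agrees on the order regardless of argument order. */
void toy_lock_two_inodes(struct toy_locked_inode *ip0,
                         struct toy_locked_inode *ip1)
{
    if (ip0->ino > ip1->ino) {
        struct toy_locked_inode *tmp = ip0;

        ip0 = ip1;
        ip1 = tmp;
    }

    ip0->lock_order = 1;  /* lower inode number is locked first */
    ip1->lock_order = 2;  /* higher inode number is locked second */
}
```

Because the order depends only on the inode numbers, two callers passing the same pair in opposite argument order cannot deadlock against each other.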
     
  • Use IHOLD(ip) instead of VN_HOLD(VFS_I(ip)).

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31765a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Replace XFS_ITOV() with the new VFS_I() inline.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31724a

    Signed-off-by: David Chinner
    Signed-off-by: Niv Sardi
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    David Chinner
     

28 Jul, 2008

1 commit

  • The kmem_free() function takes (ptr, size) arguments but doesn't actually
    use the second one.

    This patch removes the size argument from all callsites.

    SGI-PV: 981498
    SGI-Modid: xfs-linux-melb:xfs-kern:31050a

    Signed-off-by: Denys Vlasenko
    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    Denys Vlasenko
     

29 Apr, 2008

1 commit

  • ->rename already gets the target inode passed if it exists. Pass it down to
    xfs_rename so that we can avoid looking it up again. Also simplify locking
    as the first lock section in xfs_rename can go away now: the isdir is an
    invariant over the lifetime of the inode, and new_parent and the nlink
    check are namespace topology protected by i_mutex in the VFS. The projid
    check needs to move into the second lock section anyway to not be racy.

    Also kill the now-unused xfs_dir_lookup_int and remove the now-unused
    first_locked argument to xfs_lock_inodes.

    SGI-PV: 976035
    SGI-Modid: xfs-linux-melb:xfs-kern:30903a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     

07 Feb, 2008

5 commits

  • xfs_swapext should simply check if we have a writeable file descriptor
    instead of re-checking the permissions using xfs_iaccess. Add an
    additional check to refuse O_APPEND file descriptors because swapext is
    not an append-only write operation.

    SGI-PV: 971186
    SGI-Modid: xfs-linux-melb:xfs-kern:30369a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
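
A minimal user-space sketch of the two conditions named in this message, written against open(2) flags. The helper name is hypothetical and the kernel inspects its own struct file fields; this only mirrors the logic of "writable, and not append-only".

```c
#include <fcntl.h>

/* Return nonzero if a descriptor opened with these flags would pass
 * the checks described above. Hypothetical helper for illustration. */
int toy_fd_ok_for_swapext(int open_flags)
{
    int acc = open_flags & O_ACCMODE;

    /* Must be open for writing... */
    if (acc != O_WRONLY && acc != O_RDWR)
        return 0;

    /* ...and not O_APPEND, since swapping extents rewrites the whole
     * file rather than appending to it. */
    if (open_flags & O_APPEND)
        return 0;

    return 1;
}
```

Checking the descriptor's mode relies on the permission check already made at open time instead of repeating it against the inode.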
     
  • - stop using vnodes
    - use proper multiple label goto unwinding
    - give the struct file * variables saner names

    SGI-PV: 971186
    SGI-Modid: xfs-linux-melb:xfs-kern:30366a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Use XFS_IS_REALTIME_INODE in more places, and #define it to 0 if
    CONFIG_XFS_RT is off. This should be safe because mount checks in
    xfs_rtmount_init:

    so if we get mounted w/o CONFIG_XFS_RT, no realtime inodes should be
    encountered after that.

    Defining XFS_IS_REALTIME_INODE to 0 saves a bit of stack space;
    presumably gcc can optimize around the various "if (0)" type checks:

    xfs_alloc_file_space   -8
    xfs_bmap_adjacent     -16
    xfs_bmapi              -8
    xfs_bmap_rtalloc      -16
    xfs_bunmapi           -28
    xfs_free_file_space   -64
    xfs_imap               +8

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    Eric Sandeen
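
The compile-time pattern described in this message looks roughly like the sketch below. Every TOY_ name is a hypothetical stand-in for the real config symbol, flag and macro; the flag value is made up.

```c
#include <stdint.h>

#define TOY_DIFLAG_REALTIME 0x1u  /* hypothetical on-disk flag bit */

struct toy_rt_inode {
    uint32_t di_flags;
};

#ifdef TOY_CONFIG_RT
/* Realtime support built in: test the flag at run time. */
#define TOY_IS_REALTIME_INODE(ip) ((ip)->di_flags & TOY_DIFLAG_REALTIME)
#else
/* Realtime support compiled out: a constant the compiler can fold. */
#define TOY_IS_REALTIME_INODE(ip) (0)
#endif

/* Returns 1 for the realtime path and 0 for the regular path. */
int toy_pick_allocator(const struct toy_rt_inode *ip)
{
    if (TOY_IS_REALTIME_INODE(ip))
        return 1;  /* dead code when TOY_CONFIG_RT is not defined */
    return 0;
}
```

With the macro defined to 0, the realtime branch and any locals used only inside it can be discarded by the compiler, which is consistent with the stack savings the message lists.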
     
  • xfs_iocore_t is a structure embedded in xfs_inode. Except for one field it
    just duplicates fields already in xfs_inode, and there is nothing this
    abstraction buys us on XFS/Linux. This patch removes it and shrinks source
    and binary size of xfs as well as shrinking the size of xfs_inode by 60/44
    bytes in debug/non-debug builds.

    SGI-PV: 970852
    SGI-Modid: xfs-linux-melb:xfs-kern:29754a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Tim Shimmin

    Christoph Hellwig
     
  • Currently there is an indirection called ioops in the XFS data I/O path.
    Various functions are called by function pointers, but there is no
    coherence in what this is for, and of course for XFS itself it's entirely
    unused. This patch removes it instead and significantly reduces source and
    binary size of XFS while making maintenance easier.

    SGI-PV: 970841
    SGI-Modid: xfs-linux-melb:xfs-kern:29737a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin

    Lachlan McIlroy
     

16 Oct, 2007

1 commit


08 May, 2007

2 commits


10 Feb, 2007

1 commit

  • xfs_mac.h and xfs_cap.h provide definitions and macros that aren't used
    anywhere in XFS at all. They are left-overs from "to be implemented at some
    point in the future" functionality that Irix XFS has. If this
    functionality ever goes into Linux, it will be provided at a different
    layer, most likely through the security hooks in the kernel so we will
    never need this functionality in XFS.

    Patch provided by Eric Sandeen (sandeen@sandeen.net).

    SGI-PV: 960895
    SGI-Modid: xfs-linux-melb:xfs-kern:28036a

    Signed-off-by: Eric Sandeen
    Signed-off-by: David Chinner
    Signed-off-by: Tim Shimmin

    Eric Sandeen
     

09 Dec, 2006

1 commit


20 Jun, 2006

1 commit


09 Jun, 2006

2 commits