08 Feb, 2013

1 commit

  • Pull btrfs fixes from Chris Mason:
    "We've got corner cases for updating i_size that ceph was hitting,
    error handling for quotas when we run out of space, a very subtle
    snapshot deletion race, a crash while removing devices, and one
    deadlock between subvolume creation and the sb_internal code (thanks
    lockdep)."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: move d_instantiate outside the transaction during mksubvol
    Btrfs: fix EDQUOT handling in btrfs_delalloc_reserve_metadata
    Btrfs: fix possible stale data exposure
    Btrfs: fix missing i_size update
    Btrfs: fix race between snapshot deletion and getting inode
    Btrfs: fix missing release of the space/qgroup reservation in start_transaction()
    Btrfs: fix wrong sync_writers decrement in btrfs_file_aio_write()
    Btrfs: do not merge logged extents if we've removed them from the tree
    btrfs: don't try to notify udev about missing devices

    Linus Torvalds
     

07 Feb, 2013

1 commit

  • Dave Sterba triggered a lockdep complaint about lock ordering
    between the sb_internal lock and the cleaner semaphore.

    btrfs_lookup_dentry() checks for orphans if we're looking up
    the inode for a subvolume, and subvolume creation is triggering
    the lookup with a transaction running.

    This commit moves the d_instantiate after the transaction closes.

    Signed-off-by: Chris Mason

    Chris Mason
     

06 Feb, 2013

8 commits

  • When btrfs_qgroup_reserve returned a failure, we were missing a counter
    operation for BTRFS_I(inode)->outstanding_extents++, leading to warning
    messages about outstanding extents and space_info->bytes_may_use != 0.
    Additionally, the error handling code didn't take into account that we
    dropped the inode lock which might require more cleanup.

    Luckily, all the cleanup code we need is already there and can be shared
    with reserve_metadata_bytes, which is exactly what this patch does.

    Reported-by: Lev Vainblat
    Signed-off-by: Jan Schmidt
    Signed-off-by: Chris Mason

    Jan Schmidt
     
  • Chris Mason
     
  • We specifically do not update the disk i_size if there are ordered extents
    outstanding for any area between the current disk_i_size and our ordered
    extent so that we do not expose stale data. The problem is the check we
    have only checks if the ordered extent starts at or after the current
    disk_i_size, which doesn't take into account an ordered extent that starts
    before the current disk_i_size and ends past the disk_i_size. Fix this by
    checking if the extent ends past the disk_i_size. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
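
    A minimal sketch of the corrected range check (illustrative types and
    field names modeled on the 3.8-era btrfs_ordered_extent, not the actual
    patch):

    #include <linux/types.h>

    struct ordered_extent_sketch {
            u64 file_offset;    /* logical start of the ordered extent */
            u64 len;            /* length of the ordered extent */
    };

    /*
     * An ordered extent must block the disk_i_size update if any part of it
     * lies beyond disk_i_size, i.e. if it ends past disk_i_size, not only if
     * it starts at or after it.
     */
    static bool blocks_disk_i_size_update(const struct ordered_extent_sketch *oe,
                                          u64 disk_i_size)
    {
            return oe->file_offset + oe->len > disk_i_size;
    }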
     
  • If we have an ordered extent before the ordered extent we are currently
    completing that is after the current disk_i_size we will put our i_size
    update into that ordered extent so that we do not expose stale data. The
    problem is that if our disk i_size is updated past the previous ordered
    extent we won't update the i_size with the pending i_size update. So check
    the pending i_size update, and if it's above the current disk i_size, we need
    to go ahead and try to update. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • While running a snapshot test script created by Mitch and David,
    the race between autodefrag and snapshot deletion can lead to
    corruption of the dead_roots list, so that we can get a crash in
    btrfs_clean_old_snapshots().

    Besides autodefrag, scrub also does the same thing, i.e. read the
    root first and then get the inode.

    Here is the story (taking autodefrag as an example):
    (1) when we delete a snapshot or subvolume, it will set its root's
    refs to zero and do an iput() on its own inode, and if this inode happens
    to be the only active in-memory one in the root's inode rbtree, it will add
    itself to the global dead_roots list for later cleanup.

    (2) after (1), the autodefrag thread may read another inode for defrag,
    and that inode happens to be in the deleted snapshot/subvolume, but all of
    this is done without checking whether the root is still valid (refs > 0).
    So the end result is adding the deleted snapshot/subvolume's root to the
    global dead_roots list AGAIN.

    Fortunately, we already have an srcu lock to avoid the race, i.e. subvol_srcu.

    So all we need to do is to take the lock to protect 'read root and get inode',
    since we synchronize to wait for the rcu grace period before adding something
    to the global dead_roots list.

    Reported-by: Mitch Harder
    Signed-off-by: Liu Bo
    Signed-off-by: Josef Bacik

    Liu Bo
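
    A minimal sketch of the srcu usage described above; apart from the srcu
    primitives and subvol_srcu, the helper names are illustrative:

    #include <linux/srcu.h>

    /*
     * Hold subvol_srcu across "read root and get inode": snapshot deletion
     * waits for the srcu grace period before adding the root to the global
     * dead_roots list, so it cannot race with this section.
     */
    static struct inode *defrag_lookup_inode_sketch(struct btrfs_fs_info *fs_info,
                                                    u64 root_objectid, u64 ino)
    {
            struct inode *inode = NULL;
            struct btrfs_root *root;
            int idx;

            idx = srcu_read_lock(&fs_info->subvol_srcu);
            root = read_root_checked(fs_info, root_objectid); /* illustrative: verifies refs > 0 */
            if (!IS_ERR(root))
                    inode = lookup_inode_in_root(root, ino);  /* illustrative helper */
            srcu_read_unlock(&fs_info->subvol_srcu, idx);

            return inode;
    }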
     
  • When we fail to start a transaction, we need to release the reserved free space
    and qgroup space. Fix it.

    Signed-off-by: Miao Xie
    Reviewed-by: Jan Schmidt
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • If the checks at the beginning of btrfs_file_aio_write() fail, we needn't
    decrease ->sync_writers, because we have not increased it. Fix it.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
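
    A minimal sketch of the pairing rule described above (illustrative types
    and helpers, not the actual btrfs code):

    #include <linux/atomic.h>
    #include <linux/types.h>

    struct write_ctx_sketch {
            atomic_t sync_writers;
    };

    static ssize_t file_aio_write_sketch(struct write_ctx_sketch *ctx)
    {
            ssize_t err;

            err = early_write_checks(ctx);      /* illustrative helper */
            if (err)
                    return err;                 /* counter untouched on this path */

            atomic_inc(&ctx->sync_writers);
            err = do_the_write(ctx);            /* illustrative helper */
            atomic_dec(&ctx->sync_writers);     /* matches the increment above */

            return err;
    }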
     
  • You can run into a problem where, if somebody is fsyncing and writing out
    the existing extents, we will have removed the extent map from the em tree,
    but it's still valid for the current fsync, so we go ahead and write it. The
    problem is we unconditionally try to merge it back into the em tree; if
    we've removed it from the em tree, that will cause use-after-free problems.
    Fix this to only merge if we are still a part of the tree. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

05 Feb, 2013

3 commits

  • Pull dlm fix from David Teigland:
    "Thanks to Jana who reported the problem and was able to test this fix
    so quickly."

    This fixes an incorrect size check that triggered for CONFIG_COMPAT
    whether the code was actually doing compat or not. The incorrect write
    size check broke userland (clvmd) when maximum resource name lengths are
    used.

    * 'fix-max-write' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
    dlm: check the write size from user

    Linus Torvalds
     
  • There exists a situation in which GC can work alone in the background,
    without any other filesystem activity, for a significant time.

    The nilfs_clean_segments() method calls nilfs_segctor_construct(), which
    updates the superblocks only in the case where the NILFS_SC_SUPER_ROOT and
    THE_NILFS_DISCONTINUED flags are set. But when GC is working alone,
    nilfs_clean_segments() is called with the THE_NILFS_DISCONTINUED flag
    unset. As a result, the superblocks are not updated during all this time,
    and in the case of SPOR the superblocks keep very old values for the last
    super root placement.

    SYMPTOMS:

    Trying to mount a NILFS2 volume after SPOR in such an environment results
    in a very long mount time (it can take several hours in some cases).

    REPRODUCING PATH:

    1. Use an external USB HDD, disable automount, and do not perform any
    additional filesystem activity on the NILFS2 volume.

    2. Generate a temporary file with a size of about 100 - 500 GB (for example,
    dd if=/dev/zero of=<file_name> bs=1073741824 count=200). The size of the
    file defines the duration of the GC run.

    3. Then delete the file.

    4. Start GC manually by means of the command "nilfs-clean -p 0". When you
    start GC this way, the superblocks are updated only once, at the end. So,
    to simulate SPOR, wait some time (15 - 40 minutes) and simply switch off
    the USB HDD manually.

    5. Switch the USB HDD on again and try to mount the NILFS2 volume. As a
    result, mounting the NILFS2 volume will take a very long time.

    REPRODUCIBILITY: 100%

    FIX:

    This patch adds a check of whether the superblocks need to be updated, and
    sets the THE_NILFS_DISCONTINUED flag before the nilfs_clean_segments() call.

    Reported-by: Sergey Alexandrov
    Signed-off-by: Vyacheslav Dubeyko
    Tested-by: Vyacheslav Dubeyko
    Acked-by: Ryusuke Konishi
    Tested-by: Ryusuke Konishi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
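
    A minimal sketch of the fix described above, as a fragment of the GC
    ioctl path (helper names follow nilfs2 conventions but this is meant as an
    illustration, not the exact patch):

    /*
     * Before the GC's segment construction runs, make sure stale superblocks
     * get THE_NILFS_DISCONTINUED set, so that nilfs_segctor_construct()
     * actually writes the superblocks out.
     */
    if (nilfs_sb_need_update(nilfs))
            set_nilfs_discontinued(nilfs);

    ret = nilfs_clean_segments(inode->i_sb, argv, kbufs);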
     
  • Return EINVAL from write if the size is larger than
    allowed. Do this before allocating kernel memory for
    the bogus size, which could lead to OOM.

    Reported-by: Sasha Levin
    Tested-by: Jana Saout
    Signed-off-by: David Teigland

    David Teigland
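
    A minimal sketch of the check described above; the size limit and the
    handler name are illustrative, not the actual dlm code:

    #include <linux/fs.h>
    #include <linux/slab.h>
    #include <linux/uaccess.h>

    #define WRITE_SIZE_LIMIT_SKETCH 4096    /* illustrative upper bound */

    static ssize_t device_write_sketch(struct file *file, const char __user *buf,
                                       size_t count, loff_t *ppos)
    {
            void *kbuf;

            /* Reject oversized writes before allocating a caller-sized buffer. */
            if (count > WRITE_SIZE_LIMIT_SKETCH)
                    return -EINVAL;

            kbuf = kmalloc(count, GFP_KERNEL);
            if (!kbuf)
                    return -ENOMEM;

            if (copy_from_user(kbuf, buf, count)) {
                    kfree(kbuf);
                    return -EFAULT;
            }

            /* ... process the request ... */

            kfree(kbuf);
            return count;
    }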
     

02 Feb, 2013

1 commit


01 Feb, 2013

1 commit

  • Pull NFS client bugfixes from Trond Myklebust:

    - Error reporting in nfs_xdev_mount incorrectly maps all errors to
    ENOMEM

    - Fix an NFSv4 refcounting issue

    - Fix a mount failure when the server reboots during NFSv4 trunking
    discovery

    - NFSv4.1 mounts may need to run the lease recovery thread.

    - Don't silently fail setattr() requests on mountpoints

    - Fix a SUNRPC socket/transport livelock and priority queue issue

    - We must handle NFS4ERR_DELAY when resetting the NFSv4.1 session.

    * tag 'nfs-for-3.8-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFSv4.1: Handle NFS4ERR_DELAY when resetting the NFSv4.1 session
    SUNRPC: When changing the queue priority, ensure that we change the owner
    NFS: Don't silently fail setattr() requests on mountpoints
    NFSv4.1: Ensure that nfs41_walk_client_list() does start lease recovery
    NFSv4: Fix NFSv4 trunking discovery
    NFSv4: Fix NFSv4 reference counting for trunked sessions
    NFS: Fix error reporting in nfs_xdev_mount

    Linus Torvalds
     

31 Jan, 2013

2 commits

  • NFS4ERR_DELAY is a legal reply when we call DESTROY_SESSION. It
    usually means that the server is busy handling an unfinished RPC
    request. Just sleep for a second and then retry.
    We also need to be able to handle the NFS4ERR_BACK_CHAN_BUSY return
    value. If the NFS server has outstanding callbacks, we just want to
    similarly sleep & retry.

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Trond Myklebust
     
  • Ensure that any setattr and getattr requests for junctions and/or
    mountpoints are sent to the server. Ever since commit
    0ec26fd0698 (vfs: automount should ignore LOOKUP_FOLLOW), we have
    silently dropped any setattr requests to a server-side mountpoint.
    For referrals, we have silently dropped both getattr and setattr
    requests.

    This patch restores the original behaviour for setattr on mountpoints,
    and tries to do the same for referrals, provided that we have a
    filehandle...

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Trond Myklebust
     

30 Jan, 2013

1 commit

  • Pull xfs bugfixes from Ben Myers:
    "Here are fixes for returning EFSCORRUPTED on probe of a non-xfs
    filesystem, the stack switch in xfs_bmapi_allocate, a crash in
    _xfs_buf_find, speculative preallocation as the filesystem nears
    ENOSPC, an unmount hang, a race with AIO, and a regression with
    xfs_fsr:

    - fix return value when filesystem probe finds no XFS magic, a
    regression introduced in 9802182.

    - fix stack switch in __xfs_bmapi_allocate by moving the check for
    stack switch up into xfs_bmapi_write.

    - fix oops in _xfs_buf_find by validating that the requested block is
    within the filesystem bounds.

    - limit speculative preallocation near ENOSPC.

    - fix an unmount hang in xfs_wait_buftarg by freeing the
    xfs_buf_log_item in xfs_buf_item_unlock.

    - fix a possible use after free with AIO.

    - fix xfs_swap_extents after removal of xfs_flushinval_pages, a
    regression introduced in commit fb59581404a."

    * tag 'for-linus-v3.8-rc6' of git://oss.sgi.com/xfs/xfs:
    xfs: Fix xfs_swap_extents() after removal of xfs_flushinval_pages()
    xfs: Fix possible use-after-free with AIO
    xfs: fix shutdown hang on invalid inode during create
    xfs: limit speculative prealloc near ENOSPC thresholds
    xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end
    xfs: pull up stack_switch check into xfs_bmapi_write
    xfs: Do not return EFSCORRUPTED when filesystem probe finds no XFS magic

    Linus Torvalds
     

29 Jan, 2013

7 commits

  • Commit fb59581404ab7ec5075299065c22cb211a9262a9 removed
    xfs_flushinval_pages() and changed its callers to use
    filemap_write_and_wait() and truncate_pagecache_range() directly.

    But in xfs_swap_extents() this change accidentally switched the argument
    from 'tip' to 'ip'. This patch switches it back to 'tip'.

    Signed-off-by: Torsten Kaiser
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Torsten Kaiser
     
  • Running AIO pins the inode in memory via the file reference. Once AIO
    is completed using aio_complete(), the file reference is put and the inode
    can be freed from memory. So we have to be sure that calling aio_complete()
    is the last thing we do with the inode.

    CC: xfs@oss.sgi.com
    CC: Ben Myers
    CC: stable@vger.kernel.org
    Signed-off-by: Jan Kara
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Jan Kara
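
    A minimal sketch of the ordering rule described above (illustrative
    helper, not the actual xfs completion code):

    /*
     * aio_complete() drops the file reference that pins the inode, so it
     * must be the last thing in the completion path that touches anything
     * reachable from the inode.
     */
    static void dio_end_io_sketch(struct kiocb *iocb, struct inode *inode,
                                  ssize_t ret)
    {
            finish_inode_io_sketch(inode, ret); /* illustrative: all inode work first */

            aio_complete(iocb, ret, 0);         /* inode may be freed after this call */
    }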
     
  • When the new inode verifier in xfs_iread() fails, the create
    transaction is aborted and a shutdown occurs. The subsequent unmount
    then hangs in xfs_wait_buftarg() on a buffer that has an elevated
    hold count. Debug showed that it was an AGI buffer getting stuck:

    [ 22.576147] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
    [ 22.976213] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
    [ 23.376206] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
    [ 23.776325] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck

    The trace of this buffer leading up to the shutdown (trimmed for
    brevity) looks like:

    xfs_buf_init: bno 0x2 nblks 0x1 hold 1 caller xfs_buf_get_map
    xfs_buf_get: bno 0x2 len 0x200 hold 1 caller xfs_buf_read_map
    xfs_buf_read: bno 0x2 len 0x200 hold 1 caller xfs_trans_read_buf_map
    xfs_buf_iorequest: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
    xfs_buf_hold: bno 0x2 nblks 0x1 hold 1 caller xfs_buf_iorequest
    xfs_buf_rele: bno 0x2 nblks 0x1 hold 2 caller xfs_buf_iorequest
    xfs_buf_iowait: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
    xfs_buf_ioerror: bno 0x2 len 0x200 hold 1 caller xfs_buf_bio_end_io
    xfs_buf_iodone: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_ioend
    xfs_buf_iowait_done: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
    xfs_buf_hold: bno 0x2 nblks 0x1 hold 1 caller xfs_buf_item_init
    xfs_trans_read_buf: bno 0x2 len 0x200 hold 2 recur 0 refcount 1
    xfs_trans_brelse: bno 0x2 len 0x200 hold 2 recur 0 refcount 1
    xfs_buf_item_relse: bno 0x2 nblks 0x1 hold 2 caller xfs_trans_brelse
    xfs_buf_rele: bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_relse
    xfs_buf_unlock: bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
    xfs_buf_rele: bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
    xfs_buf_trylock: bno 0x2 nblks 0x1 hold 2 caller _xfs_buf_find
    xfs_buf_find: bno 0x2 len 0x200 hold 2 caller xfs_buf_get_map
    xfs_buf_get: bno 0x2 len 0x200 hold 2 caller xfs_buf_read_map
    xfs_buf_read: bno 0x2 len 0x200 hold 2 caller xfs_trans_read_buf_map
    xfs_buf_hold: bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_init
    xfs_trans_read_buf: bno 0x2 len 0x200 hold 3 recur 0 refcount 1
    xfs_trans_log_buf: bno 0x2 len 0x200 hold 3 recur 0 refcount 1
    xfs_buf_item_unlock: bno 0x2 len 0x200 hold 3 flags DIRTY liflags ABORTED
    xfs_buf_unlock: bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock
    xfs_buf_rele: bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock

    And that is the AGI buffer from cold cache read into memory to
    transaction abort. You can see at transaction abort the bli is dirty
    and only has a single reference. The item is not pinned, and it's
    not in the AIL. Hence the only reference to it is this transaction.

    The problem is that the xfs_buf_item_unlock() call is dropping the
    last reference to the xfs_buf_log_item attached to the buffer (which
    holds a reference to the buffer), but it is not freeing the
    xfs_buf_log_item. Hence nothing will ever release the buffer, and
    the unmount hangs waiting for this reference to go away.

    The fix is simple - xfs_buf_item_unlock needs to detect the last
    reference going away in this case and free the xfs_buf_log_item to
    release the reference it holds on the buffer.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
     
  • There is a window on small filesystems where speculative
    preallocation can be larger than the ENOSPC throttling thresholds,
    resulting in speculative preallocation trying to reserve more space
    than is available. This causes immediate ENOSPC to be
    triggered, prealloc to be turned off and flushing to occur. On the
    next write (i.e. the next 4k page), we do exactly the same thing, and so
    effectively drive the filesystem into synchronous 4k writes by triggering
    ENOSPC flushing on every page while in the window between the prealloc size
    and the ENOSPC prealloc throttle threshold.

    Fix this by checking to see if the prealloc size would consume all
    free space, and throttle it appropriately to avoid premature
    ENOSPC...

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Ben Myers

    Dave Chinner
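
    A minimal sketch of the throttling idea (illustrative names and scaling;
    the real heuristics are more involved):

    #include <linux/types.h>

    /*
     * Never let speculative preallocation consume all of the remaining free
     * space; clamp it so that small filesystems near ENOSPC don't fall
     * straight into ENOSPC flushing on every subsequent write.
     */
    static u64 clamp_prealloc_blocks_sketch(u64 wanted, u64 freesp)
    {
            if (wanted >= freesp)
                    wanted = freesp >> 1;   /* leave headroom below free space */
            return wanted;
    }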
     
  • When _xfs_buf_find is passed an out of range address, it will fail
    to find a relevant struct xfs_perag and oops with a null
    dereference. This can happen when trying to walk a filesystem with a
    metadata inode that has a partially corrupted extent map (i.e. the
    block number returned is corrupt, but is otherwise intact) and we
    try to read from the corrupted block address.

    In this case, just fail the lookup. If it is readahead being issued,
    it will simply not be done, but if it is a real read that fails, we
    will get an error reported. Ideally this case should result
    in an EFSCORRUPTED error being reported, but we cannot return an
    error through xfs_buf_read() or xfs_buf_get() so this lookup failure
    may result in ENOMEM or EIO errors being reported instead.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Dave Chinner
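
    A minimal sketch of the bounds check described above, as a fragment of
    the buffer lookup path (close to the xfs code in spirit, but meant only
    as an illustration):

    /*
     * Reject block addresses beyond the end of the filesystem before the
     * perag lookup; a corrupt block number then fails the lookup instead of
     * causing a NULL perag dereference.
     */
    xfs_daddr_t eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks);

    if (blkno >= eofs) {
            xfs_alert(mp, "block 0x%llx is beyond the end of the filesystem",
                      (unsigned long long)blkno);
            WARN_ON_ONCE(1);
            return NULL;
    }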
     
  • The stack_switch check currently occurs in __xfs_bmapi_allocate,
    which means the stack switch only occurs when xfs_bmapi_allocate()
    is called in a loop. Pull the check up before the loop in
    xfs_bmapi_write() such that the first iteration of the loop has
    consistent behavior.

    Signed-off-by: Brian Foster
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers

    Brian Foster
     
  • Commit 9802182 changed the return value from EWRONGFS (aka EINVAL)
    to EFSCORRUPTED, which doesn't seem to be handled properly by
    the root filesystem probe.

    Signed-off-by: Eric Sandeen
    Tested-by: Sergei Trofimovich
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    Eric Sandeen
     

28 Jan, 2013

5 commits

  • The recent commit fb6791d100d1bba20b5cdbc4912e1f7086ec60f8
    included the wrong logic. The lvbptr check was incorrectly
    added after the patch was tested.

    Signed-off-by: David Teigland
    Signed-off-by: Steven Whitehouse

    David Teigland
     
  • We do need to start the lease recovery thread prior to waiting for the
    client initialisation to complete in NFSv4.1.

    Signed-off-by: Trond Myklebust
    Cc: Chuck Lever
    Cc: Ben Greear
    Cc: stable@vger.kernel.org [>=3.7]

    Trond Myklebust
     
  • If walking the list in nfs4[01]_walk_client_list fails, then the most
    likely explanation is that the server dropped the clientid before we
    actually managed to confirm it. As long as our nfs_client is the very
    last one in the list to be tested, the caller can be assured that this
    is the case when the final return value is NFS4ERR_STALE_CLIENTID.

    Reported-by: Ben Greear
    Signed-off-by: Trond Myklebust
    Cc: Chuck Lever
    Cc: stable@vger.kernel.org [>=3.7]
    Tested-by: Ben Greear

    Trond Myklebust
     
  • The reference counting in nfs4_init_client wrongly assumes that it
    is safe for nfs4_discover_server_trunking() to return a pointer to an
    nfs_client prior to bumping the reference count.

    Signed-off-by: Trond Myklebust
    Cc: Chuck Lever
    Cc: Ben Greear
    Cc: stable@vger.kernel.org [>=3.7]

    Trond Myklebust
     
  • Currently, nfs_xdev_mount converts all errors from clone_server() to
    ENOMEM, which can then leak to userspace (for instance to 'mount'). Fix that.
    Also ensure that if nfs_fs_mount_common() returns an error, we
    don't dprintk(0)...

    The regression originated in commit 3d176e3fe4f6dc379b252bf43e2e146a8f7caf01
    (NFS: Use nfs_fs_mount_common() for xdev mounts)

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org [>= 3.5]

    Trond Myklebust
     

26 Jan, 2013

1 commit

  • Pull btrfs fixes from Chris Mason:
    "It turns out that we had two crc bugs when running fsx-linux in a
    loop. Many thanks to Josef, Miao Xie, and Dave Sterba for nailing it
    all down. Miao also has a new OOM fix in this v2 pull as well.

    Ilya fixed a regression Liu Bo found in the balance ioctls for pausing
    and resuming a running balance across drives.

    Josef's orphan truncate patch fixes an obscure corruption we'd see
    during xfstests.

    Arne's patches address problems with subvolume quotas. If the user
    destroys quota groups incorrectly the FS will refuse to mount.

    The rest are smaller fixes and plugs for memory leaks."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (30 commits)
    Btrfs: fix repeated delalloc work allocation
    Btrfs: fix wrong max device number for single profile
    Btrfs: fix missed transaction->aborted check
    Btrfs: Add ACCESS_ONCE() to transaction->abort accesses
    Btrfs: put csums on the right ordered extent
    Btrfs: use right range to find checksum for compressed extents
    Btrfs: fix panic when recovering tree log
    Btrfs: do not allow logged extents to be merged or removed
    Btrfs: fix a regression in balance usage filter
    Btrfs: prevent qgroup destroy when there are still relations
    Btrfs: ignore orphan qgroup relations
    Btrfs: reorder locks and sanity checks in btrfs_ioctl_defrag
    Btrfs: fix unlock order in btrfs_ioctl_rm_dev
    Btrfs: fix unlock order in btrfs_ioctl_resize
    Btrfs: fix "mutually exclusive op is running" error code
    Btrfs: bring back balance pause/resume logic
    btrfs: update timestamps on truncate()
    btrfs: fix btrfs_cont_expand() freeing IS_ERR em
    Btrfs: fix a bug when llseek for delalloc bytes behind prealloc extents
    Btrfs: fix off-by-one in lseek
    ...

    Linus Torvalds
     

25 Jan, 2013

9 commits

  • Pull cifs fixes from Steve French:
    "Two small cifs fixes"

    * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
    fs/cifs/cifs_dfs_ref.c: fix potential memory leakage
    cifs: fix srcip_matches() for ipv6

    Linus Torvalds
     
  • btrfs_start_delalloc_inodes() locks the delalloc_inodes list, fetches the
    first inode, unlocks the list, triggers btrfs_alloc_delalloc_work/
    btrfs_queue_worker for this inode, and then locks the list and checks the
    head of the list again. But because we don't delete the first inode it
    dealt with beforehand, it will fetch the same inode again. As a result,
    this function allocates a huge number of btrfs_delalloc_work structures,
    and OOM happens.

    Fix this problem by splicing the delalloc list.

    Reported-by: Alex Lyakas
    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
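
    A minimal sketch of the splice pattern described above (illustrative
    types; the real code works on the btrfs_root delalloc list):

    #include <linux/list.h>
    #include <linux/spinlock.h>

    struct delalloc_ctx_sketch {
            spinlock_t lock;
            struct list_head delalloc_inodes;
    };

    struct delalloc_entry_sketch {
            struct list_head list;
            /* ... per-inode data ... */
    };

    static void start_delalloc_inodes_sketch(struct delalloc_ctx_sketch *ctx)
    {
            LIST_HEAD(splice);
            struct delalloc_entry_sketch *entry, *tmp;

            /* Move the whole list onto a private head under the lock... */
            spin_lock(&ctx->lock);
            list_splice_init(&ctx->delalloc_inodes, &splice);
            spin_unlock(&ctx->lock);

            /*
             * ...so each iteration consumes a distinct entry; we never re-fetch
             * the same list head and allocate delalloc work for it again.
             */
            list_for_each_entry_safe(entry, tmp, &splice, list) {
                    list_del_init(&entry->list);
                    /* queue the delalloc work for this entry ... */
            }
    }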
     
  • The max device number of the single profile is 1, not 0 (0 means 'as many as
    possible'). Fix it.

    Cc: Liu Bo
    Signed-off-by: Miao Xie
    Reviewed-by: Liu Bo
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • First, though the current transaction->aborted check can stop the commit early
    and avoid unnecessary operations, it happens too early: some transaction handles
    have not ended yet, and those handles may set transaction->aborted after the check.

    Second, when we commit the transaction, we will wake up some worker threads to
    flush the space cache and inode cache. Those threads also allocate some transaction
    handles and may set transaction->aborted if some serious error happens.

    So we need more checks for ->aborted when committing the transaction. Fix it.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
     
  • We may access and update transaction->aborted on different CPUs without a
    lock, so we need the ACCESS_ONCE() wrapper to prevent the compiler from
    creating unsolicited accesses and to make sure we get the right value.

    Signed-off-by: Miao Xie
    Signed-off-by: Josef Bacik

    Miao Xie
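
    A minimal sketch of the ACCESS_ONCE() usage described above (illustrative
    struct; ACCESS_ONCE comes from <linux/compiler.h>):

    #include <linux/compiler.h>

    struct transaction_sketch {
            int aborted;    /* may be set by other CPUs without a lock */
    };

    static int transaction_aborted_sketch(struct transaction_sketch *trans)
    {
            /*
             * Force a single real load of ->aborted so the compiler cannot
             * re-read or cache the value while other CPUs update it.
             */
            return ACCESS_ONCE(trans->aborted) != 0;
    }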
     
  • I noticed a WARN_ON going off when adding csums because we were going over
    the amount of csum bytes that should have been allowed for an ordered
    extent. This is a leftover from when we used to hold the csums privately
    for direct io, but now we use the normal ordered sum stuff, so we need to
    make sure to check whether we've moved on to another extent so that the csums
    are added to the right extent. Without this we could end up with csums for
    bytenrs that don't have extents to cover them yet. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • For compressed extents, the checksum range is covered by the disk length,
    and the disk length differs from the ram length, so we need to use the disk
    length instead to get the right checksum.

    Signed-off-by: Liu Bo
    Signed-off-by: Josef Bacik

    Liu Bo
     
  • A user reported a BUG_ON(ret) that occurred during tree log replay. Ret was
    -EAGAIN, so what I think happened is that we removed an extent that covered
    a bitmap entry and an extent entry. We remove the part from the bitmap and
    return -EAGAIN and then search for the next piece we want to remove, which
    happens to be an entire extent entry, so we just free the sucker and return.
    The problem is ret is still set to -EAGAIN, so we trip the BUG_ON(). The
    user used btrfs-zero-log, so I'm not 100% sure this is what happened, so I've
    added a WARN_ON() to catch the other possibility. Thanks,

    Reported-by: Jan Steffens
    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • We drop the extent map tree lock while we're logging extents, so somebody
    could come in and merge another extent into this one and screw up our
    logging, or they could even remove us from the list, which would keep us from
    logging the extent or freeing our ref on it. So we need to make sure not to
    clear LOGGING until after the extent is logged, and then we can merge it with
    adjacent extents. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik