29 Mar, 2012

6 commits

  • When we use autodefrag, we forget to update the index that records
    the last page we dirtied, so we set dirty flags on the same set of
    pages again and again.

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    Liu Bo
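
    A toy model of the bookkeeping described in this commit message. The names
    `defrag_pass`, `run` and `last_index` are hypothetical and do not come from
    the patch; the sketch only illustrates why a cursor that is never advanced
    makes each pass dirty the same pages again.

```python
# Toy model only: defrag_pass, run and last_index are hypothetical names
# and do not mirror the kernel code.

def defrag_pass(num_pages, batch, last_index, update_index=True):
    """Dirty one batch of pages starting at last_index.

    Returns (dirtied_pages, new_last_index).
    """
    end = min(last_index + batch, num_pages)
    dirtied = list(range(last_index, end))
    # The fix: remember how far we got so the next pass starts after it.
    new_last = end if update_index else last_index
    return dirtied, new_last


def run(num_pages, batch, passes, update_index):
    """Run several passes and record which pages each pass dirtied."""
    last_index = 0
    history = []
    for _ in range(passes):
        dirtied, last_index = defrag_pass(num_pages, batch, last_index, update_index)
        history.append(dirtied)
    return history


buggy = run(num_pages=8, batch=4, passes=2, update_index=False)
fixed = run(num_pages=8, batch=4, passes=2, update_index=True)
```

    In this model the buggy run dirties pages 0-3 on both passes, while the
    fixed run moves on to pages 4-7 on the second pass.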
     
  • $ mkfs.btrfs /dev/sdb7
    $ mount /dev/sdb7 /mnt/btrfs/ -oautodefrag
    $ dd if=/dev/zero of=/mnt/btrfs/foobar bs=4k count=10 oflag=direct 2>/dev/null
    $ filefrag -v /mnt/btrfs/foobar
    Filesystem type is: 9123683e
    File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
    ext  logical  physical  expected  length  flags
      0        0      3072                10  eof
    /mnt/btrfs/foobar: 1 extent found

    Now we have a big real extent [0, 40960), but autodefrag will still defrag it.

    $ sync
    $ filefrag -v /mnt/btrfs/foobar
    Filesystem type is: 9123683e
    File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
    ext  logical  physical  expected  length  flags
      0        0      3082                10  eof
    /mnt/btrfs/foobar: 1 extent found

    So if we find that the range is already a big real extent, there is nothing
    to gain from defragging it; just skip it.

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    Liu Bo
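
    A minimal sketch of the "skip it" decision, under a simplifying assumption
    that is not taken from the patch: an extent counts as "big" when a single
    real (non-hole) extent already covers the whole file. The tuple layout and
    both function names are hypothetical.

```python
# Hypothetical model: an extent is (start, length, is_hole).

def is_single_real_extent(extents, file_size):
    """True when one non-hole extent already covers [0, file_size)."""
    if len(extents) != 1:
        return False
    start, length, is_hole = extents[0]
    return not is_hole and start == 0 and length >= file_size


def should_autodefrag(extents, file_size):
    """Assumed rule: a file that is already one extent is left alone."""
    return not is_single_real_extent(extents, file_size)


foobar = [(0, 40960, False)]                            # the [0, 40960) extent
fragmented = [(0, 4096, False), (4096, 36864, False)]   # two adjacent extents
```

    With this rule `should_autodefrag(foobar, 40960)` is false, so the extent
    would not be rewritten on the next sync.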
     
  • If our file's layout is as follows:
    | hole | data1 | hole | data2 |

    we do not need to defrag this file, because this file has holes and
    cannot be merged into one extent.

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    Liu Bo
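
    The reasoning about holes can be sketched as a check for adjacent data
    segments. This is an illustration only: the segment representation and the
    name `has_mergeable_neighbors` are assumptions, not the kernel's logic.

```python
# Hypothetical model: the file is a list of ("hole" | "data", length) segments.

def has_mergeable_neighbors(layout):
    """True if two data segments sit next to each other with no hole between."""
    return any(
        a[0] == "data" and b[0] == "data"
        for a, b in zip(layout, layout[1:])
    )


# | hole | data1 | hole | data2 |
sparse = [("hole", 4096), ("data", 4096), ("hole", 4096), ("data", 4096)]
# | data1 | data2 | hole |
dense = [("data", 4096), ("data", 4096), ("hole", 4096)]
```

    For the `sparse` layout no two data segments touch, so there is nothing to
    merge and defragmenting the file would be wasted work.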
     
  • commit 600a45e1d5e376f679ff9ecc4ce9452710a6d27c
    (Btrfs: fix deadlock on page lock when doing auto-defragment)
    fixes the deadlock on the page lock, but it also introduces another bug.

    A page may have been truncated after unlock & lock.
    So we need to find it again to get the right one.

    And since we hold the i_mutex, the inode size remains unchanged and
    we can drop the isize overflow checks.

    Signed-off-by: Liu Bo
    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Liu Bo
     
  • The bug is from running xfstests 209 with autodefrag.

    The race is as follows:
         t1                          t2(autodefrag)
      direct IO
        invalidate pagecache
        dio(old data)                add_inode_defrag
        invalidate pagecache
      endio

      direct IO
        invalidate pagecache
                                     run_defrag
                                       readpage(old data)
                                       set page dirty (old data)
        dio(new data, rewrite)
        invalidate pagecache (*)
      endio

    t2(autodefrag) will read old data into the pagecache via readpage and mark
    the pages dirty. Meanwhile, invalidate pagecache(*) will fail because the
    pages are dirty. So the old data may be flushed to disk by the flush
    thread, which leads to data loss.

    The same applies to user-space defragment programs.

    The patch fixes this race by holding i_mutex when we readpage and set page dirty.

    Signed-off-by: Liu Bo
    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Liu Bo
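
    The interleaving in this commit message can be replayed with a small
    single-threaded model. Everything here is hypothetical (the `File` class,
    its methods and `replay`); it assumes, as a simplification, that holding
    i_mutex means defrag can only run once the second direct IO has finished.

```python
# Toy, single-threaded replay of the interleaving; every name is hypothetical.

class File:
    def __init__(self, data):
        self.disk = data
        self.cache = None       # cached page contents, or None
        self.dirty = False

    def invalidate(self):
        # Invalidation cannot drop a dirty page.
        if self.dirty:
            return False
        self.cache = None
        return True

    def readpage_and_dirty(self):
        self.cache = self.disk  # defrag reads whatever is on disk right now
        self.dirty = True

    def dio_write(self, data):
        self.disk = data

    def flush(self):
        # Flush thread: write the dirty page back to disk.
        if self.dirty:
            self.disk = self.cache
            self.dirty = False


def replay(serialized):
    f = File("old")
    f.invalidate()                  # t1: second direct IO begins
    if serialized:
        # Assumed effect of the mutex: defrag waits for the direct IO.
        f.dio_write("new")
        f.invalidate()
        f.readpage_and_dirty()      # t2 now sees the new data
    else:
        f.readpage_and_dirty()      # t2 slips in and caches old data
        f.dio_write("new")          # t1: dio(new data, rewrite)
        f.invalidate()              # (*) fails because the page is dirty
    f.flush()
    return f.disk
```

    In the unserialized replay the flush writes the cached old data over the
    new data; in the serialized replay the new data survives.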
     
  • Conflicts:
    fs/btrfs/transaction.c

    Signed-off-by: Chris Mason

    Chris Mason
     

27 Mar, 2012

1 commit

  • In commit 4692cf58 we introduced new backref walking code for btrfs. This
    assumes we're searching live roots, which requires a transaction context.
    While scrubbing, however, we must not join a transaction because this could
    deadlock with the commit path. Additionally, what scrub really wants to do
    is resolve a logical address in the commit root it's currently checking.

    This patch adds support for logical to path resolving on commit roots and
    makes scrub use that.

    Signed-off-by: Jan Schmidt

    Jan Schmidt
     

22 Mar, 2012

3 commits


25 Feb, 2012

1 commit

  • Quoth Chris:
    "This is later than I wanted because I got backed up running through
    btrfs bugs from the Oracle QA teams. But they are all bug fixes that
    we've queued and tested since rc1.

    Nothing in particular stands out, this just reflects bug fixing and QA
    done in parallel by all the btrfs developers. The most user visible
    of these is:

    Btrfs: clear the extent uptodate bits during parent transid failures

    Because that helps deal with out of date drives (say an iscsi disk
    that has gone away and come back). The old code wasn't always
    properly retrying the other mirror for this type of failure."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits)
    Btrfs: fix compiler warnings on 32 bit systems
    Btrfs: increase the global block reserve estimates
    Btrfs: clear the extent uptodate bits during parent transid failures
    Btrfs: add extra sanity checks on the path names in btrfs_mksubvol
    Btrfs: make sure we update latest_bdev
    Btrfs: improve error handling for btrfs_insert_dir_item callers
    Btrfs: be less strict on finding next node in clear_extent_bit
    Btrfs: fix a bug on overcommit stuff
    Btrfs: kick out redundant stuff in convert_extent_bit
    Btrfs: skip states when they does not contain bits to clear
    Btrfs: check return value of lookup_extent_mapping() correctly
    Btrfs: fix deadlock on page lock when doing auto-defragment
    Btrfs: fix return value check of extent_io_ops
    btrfs: honor umask when creating subvol root
    btrfs: silence warning in raid array setup
    btrfs: fix structs where bitfields and spinlock/atomic share 8B word
    btrfs: delalloc for page dirtied out-of-band in fixup worker
    Btrfs: fix memory leak in load_free_space_cache()
    btrfs: don't check DUP chunks twice
    Btrfs: fix trim 0 bytes after a device delete
    ...

    Linus Torvalds
     

23 Feb, 2012

1 commit


17 Feb, 2012

1 commit

  • When I ran xfstests in a loop on an auto-defragment btrfs, a deadlock
    happened.

    Steps to reproduce:
    [tty0]
    # export MOUNT_OPTIONS="-o autodefrag"
    # export TEST_DEV=
    # export TEST_DIR=
    # export SCRATCH_DEV=
    # export SCRATCH_MNT=
    # while [ 1 ]
    > do
    > ./check 091 127 263
    > sleep 1
    > done
    [tty1]
    # while [ 1 ]
    > do
    > echo 3 > /proc/sys/vm/drop_caches
    > done

    Several hours later, the test processes hang, and the deadlock happens
    on the page lock.

    The reason is that:
    Auto defrag task          Flush thread              Test task
                              btrfs_writepages()
                                add ordered extent
                                (including page 1, 2)
                                set page 1 writeback
                                set page 2 writeback
                              endio_fn()
                                end page 2 writeback
                                                        release page 2
    lock page 1
    alloc and lock page 2
    page 2 is not uptodate
      btrfs_readpage()
        start ordered extent()
                              btrfs_writepages()
                                try to lock page 1

    so deadlock happens.

    Fix this bug by unlocking the page which is in writeback, and re-locking it
    after the writeback ends.

    Signed-off-by: Miao Xie

    Miao Xie
     

29 Jan, 2012

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: fix reservations in btrfs_page_mkwrite
    Btrfs: advance window_start if we're using a bitmap
    btrfs: mask out gfp flags in releasepage
    Btrfs: fix enospc error caused by wrong checks of the chunk
    Btrfs: do not defrag a file partially
    Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c
    Btrfs: use cluster->window_start when allocating from a cluster bitmap
    Btrfs: Check for NULL page in extent_range_uptodate
    btrfs: Fix busyloops in transaction waiting code
    Btrfs: make sure a bitmap has enough bytes
    Btrfs: fix uninit warning in backref.c

    Linus Torvalds
     

27 Jan, 2012

1 commit

  • xfstests 218 complains that btrfs defrags a file partially:
    After: 1
    Write backwards sync, but contiguous - should defrag to 1 extent
    Before: 10
    -After: 1
    +After: 2

    To fix this, we need to set max_to_defrag count properly.

    Signed-off-by: Liu Bo
    Signed-off-by: Chris Mason

    Liu Bo
     

18 Jan, 2012

2 commits

  • * 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    btrfs: take allocation of ->tree_root into open_ctree()
    btrfs: let ->s_fs_info point to fs_info, not root...
    btrfs: consolidate failure exits in btrfs_mount() a bit
    btrfs: make free_fs_info() call ->kill_sb() unconditional
    btrfs: merge free_fs_info() calls on fill_super failures
    btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super()
    btrfs: make open_ctree() return int
    btrfs: sanitizing ->fs_info, part 5
    btrfs: sanitizing ->fs_info, part 4
    btrfs: sanitizing ->fs_info, part 3
    btrfs: sanitizing ->fs_info, part 2
    btrfs: sanitizing ->fs_info, part 1
    btrfs: fix a deadlock in btrfs_scan_one_device()
    btrfs: fix mount/umount race
    btrfs: get ->kill_sb() of its own
    btrfs: preparation to fixing mount/umount race

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits)
    Btrfs: use larger system chunks
    Btrfs: add a delalloc mutex to inodes for delalloc reservations
    Btrfs: space leak tracepoints
    Btrfs: protect orphan block rsv with spin_lock
    Btrfs: add allocator tracepoints
    Btrfs: don't call btrfs_throttle in file write
    Btrfs: release space on error in page_mkwrite
    Btrfs: fix btrfsck error 400 when truncating a compressed
    Btrfs: do not use btrfs_end_transaction_throttle everywhere
    Btrfs: add balance progress reporting
    Btrfs: allow for resuming restriper after it was paused
    Btrfs: allow for canceling restriper
    Btrfs: allow for pausing restriper
    Btrfs: add skip_balance mount option
    Btrfs: recover balance on mount
    Btrfs: save balance parameters to disk
    Btrfs: soft profile changing mode (aka soft convert)
    Btrfs: implement online profile changing
    Btrfs: do not reduce profile in do_chunk_alloc()
    Btrfs: virtual address space subset filter
    ...

    Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new
    mnt_drop_write_file() helper.

    Linus Torvalds
     

17 Jan, 2012

9 commits


11 Jan, 2012

2 commits


09 Jan, 2012

1 commit


05 Jan, 2012

1 commit

  • The old backref iteration code could only safely be used on commit roots.
    Besides this limitation, it had bugs in finding the roots for these
    references. This commit replaces large parts of it with btrfs_find_all_roots(),
    which a) really finds all roots and the correct roots, b) works correctly
    under heavy file system load, c) considers delayed refs.

    Signed-off-by: Jan Schmidt

    Jan Schmidt
     

04 Jan, 2012

2 commits


22 Dec, 2011

1 commit

  • Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value
    from every call site. The for_cow parameter will later on be used to
    determine if a ref will change anything with respect to qgroups.

    Delayed refs coming from relocation are always counted as for_cow, as they
    don't change subvol quota.

    Also pass in the fs_info for later use.

    btrfs_find_all_roots() will use this as an optimization, as changes that are
    for_cow will not change anything with respect to which root points to a
    certain leaf. Thus, we don't need to add the current sequence number to
    those delayed refs.

    Signed-off-by: Arne Jansen
    Signed-off-by: Jan Schmidt

    Arne Jansen
     

16 Dec, 2011

2 commits

  • …/btrfs-work into integration

    Conflicts:
    fs/btrfs/inode.c

    Signed-off-by: Chris Mason <chris.mason@oracle.com>

    Chris Mason
     
  • Running xfstests 269 with some tracing, my scripts kept spitting out errors about
    releasing bytes that we didn't actually have reserved. This took me down a huge
    rabbit hole and it turns out the way we deal with reserved_extents is wrong,
    we need to only be setting it if the reservation succeeds, otherwise the free()
    method will come in and unreserve space that isn't actually reserved yet, which
    can lead to other warnings and such. The math was all working out right in the
    end, but it caused all sorts of other issues in addition to making my scripts
    yell and scream and generally make it impossible for me to track down the
    original issue I was looking for. The other problem is with our error handling
    in the reservation code. There are two cases that we need to deal with:

    1) We raced with free. In this case free won't free anything because csum_bytes
    is modified before we drop the lock in our reservation path, so free rightly
    doesn't release any space because the reservation code may be depending on that
    reservation. However if we fail, we need the reservation side to do the free at
    that point since that space is no longer in use. So as it stands the code was
    doing this fine and it worked out, except in case #2

    2) We don't race with free. Nobody comes in and changes anything, and our
    reservation fails. In this case we didn't reserve anything anyway and we just
    need to clean up csum_bytes but not free anything. So we keep track of
    csum_bytes before we drop the lock and if it hasn't changed we know we can just
    decrement csum_bytes and carry on.

    Because of the case where we can race with free()'s since we have to drop our
    spin_lock to do the reservation, I'm going to serialize all reservations with
    the i_mutex. We already get this for free in the heavy use paths, truncate and
    file write all hold the i_mutex, just needed to add it to page_mkwrite and
    various ioctl/balance things. With this patch my space leak scripts no longer
    scream bloody murder. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

15 Dec, 2011

1 commit

  • To reproduce the bug:

    # touch /mnt/tmp
    # stat /mnt/tmp | grep Change
    Change: 2011-12-09 09:32:23.412105981 +0800
    # chattr +i /mnt/tmp
    # stat /mnt/tmp | grep Change
    Change: 2011-12-09 09:32:43.198105295 +0800
    # umount /mnt
    # mount /dev/loop1 /mnt
    # stat /mnt/tmp | grep Change
    Change: 2011-12-09 09:32:23.412105981 +0800

    We should update ctime of in-memory inode before calling
    btrfs_update_inode().

    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
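
    The ordering problem behind this reproduction can be modelled in a few
    lines. The `Inode` class, `persist` and both `setflags_*` functions are
    hypothetical stand-ins; `persist` plays the role of writing the in-memory
    inode to disk, and the buggy ordering is an assumption inferred from the
    fix described in the commit message.

```python
# Toy model; Inode, persist and setflags_* are hypothetical names.

class Inode:
    def __init__(self, ctime):
        self.ctime = ctime          # in-memory value
        self.disk_ctime = ctime     # what a later mount would read back

    def persist(self):
        # Stand-in for writing the in-memory inode to disk.
        self.disk_ctime = self.ctime


def setflags_buggy(inode, now):
    inode.persist()                 # writes the stale ctime
    inode.ctime = now               # only the in-memory copy changes


def setflags_fixed(inode, now):
    inode.ctime = now               # update in memory first
    inode.persist()                 # then write it out


a, b = Inode(100), Inode(100)
setflags_buggy(a, 200)
setflags_fixed(b, 200)
```

    After the buggy call the in-memory ctime is 200 but the persisted value is
    still 100, which matches the old timestamp reappearing after a remount.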
     

01 Dec, 2011

1 commit


20 Nov, 2011

2 commits

  • For the user it is confusing to find something like:
    [10197.627710] new size for /dev/mapper/vg0-usr_share is 3221225472
    in the kernel log, because it doesn't point directly to btrfs.

    This patch prefixes those messages with "btrfs:" like other btrfs
    related printks.

    Signed-off-by: Arnd Hannemann
    Signed-off-by: Chris Mason

    Arnd Hannemann
     
  • This patch casts to unsigned long before casting to a pointer and fixes
    the following warnings:
    fs/btrfs/extent_io.c:2289:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
    fs/btrfs/ioctl.c:2933:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
    fs/btrfs/ioctl.c:2937:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    fs/btrfs/ioctl.c:3020:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    fs/btrfs/scrub.c:275:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    fs/btrfs/backref.c:686:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Chris Mason

    Jeff Mahoney
     

06 Nov, 2011

1 commit