24 Feb, 2012

1 commit


23 Feb, 2012

1 commit


17 Feb, 2012

1 commit


15 Feb, 2012

1 commit

  • A user reported a bug in btrfs's trim: we trim 0 bytes
    after a device delete.

    The reproducer:

    $ mkfs.btrfs disk1
    $ mkfs.btrfs disk2
    $ mount disk1 /mnt
    $ fstrim -v /mnt
    $ btrfs device add disk2 /mnt
    $ btrfs device del disk1 /mnt
    $ fstrim -v /mnt

    This is because after we delete the device, the block group may start from
    a non-zero offset, which confuses trim into discarding nothing.

    Reported-by: Lutz Euler
    Signed-off-by: Liu Bo

    Liu Bo
     

27 Jan, 2012

1 commit

  • When we ran a sysbench test on inline files, ENOSPC errors occurred easily even
    though there was plenty of free disk space that could be allocated for new chunks.

    Reproduce steps:
    # mkfs.btrfs -b $((2 * 1024 * 1024 * 1024))
    # mount /mnt
    # ulimit -n 102400
    # cd /mnt
    # sysbench --num-threads=1 --test=fileio --file-num=81920 \
    > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
    > --file-test-mode=seqwr prepare
    # sysbench --num-threads=1 --test=fileio --file-num=81920 \
    > --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
    > --file-test-mode=seqwr run

    The reason for this bug is:
    We can now reserve more space than is free in the existing chunks, provided
    there is enough free disk space for new chunks. In that case, the space
    allocator should force the allocation of a new chunk when there is no free
    space in the free space cache. But two wrong checks break this operation.

    One is
    if (ret == -ENOSPC && num_bytes > min_alloc_size)
    in btrfs_reserve_extent(). It is wrong: we should try to allocate a new chunk
    even if we fail to allocate free space of the minimum allocatable size.

    The other is
    if (space_info->force_alloc)
            force = space_info->force_alloc;
    in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE if someone
    sets ->force_alloc to CHUNK_ALLOC_LIMITED, which causes the ENOSPC error.

    Fix these two wrong checks. For the second one, we change the values of
    CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE so that CHUNK_ALLOC_FORCE is
    greater than CHUNK_ALLOC_LIMITED, since CHUNK_ALLOC_FORCE has higher
    priority. If the value passed in by the caller is greater than
    ->force_alloc, use the passed value.

    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
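
The priority rule described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the kernel code: the struct name, the helper `effective_force()`, the `CHUNK_ALLOC_NO_FORCE` level and the concrete numeric values are assumptions chosen only so that CHUNK_ALLOC_FORCE compares greater than CHUNK_ALLOC_LIMITED.

```c
/*
 * Assumed numeric values; the only property that matters here is
 * CHUNK_ALLOC_FORCE > CHUNK_ALLOC_LIMITED, as the commit message requires.
 */
enum chunk_alloc_level {
	CHUNK_ALLOC_NO_FORCE = 0,	/* assumed "no forcing" level */
	CHUNK_ALLOC_LIMITED  = 1,
	CHUNK_ALLOC_FORCE    = 2,
};

/* Hypothetical, stripped-down stand-in for the space_info structure. */
struct space_info_sketch {
	enum chunk_alloc_level force_alloc;
};

/*
 * Return the stronger of the caller's request and the pending
 * ->force_alloc, so a LIMITED hint can no longer downgrade a FORCE request.
 */
static enum chunk_alloc_level
effective_force(const struct space_info_sketch *info,
		enum chunk_alloc_level requested)
{
	if (requested < info->force_alloc)
		return info->force_alloc;
	return requested;
}
```

With `->force_alloc` set to CHUNK_ALLOC_LIMITED, a caller passing CHUNK_ALLOC_FORCE keeps CHUNK_ALLOC_FORCE, whereas the old unconditional assignment would have replaced it with CHUNK_ALLOC_LIMITED.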
     

17 Jan, 2012

13 commits


11 Jan, 2012

2 commits

  • A bug was triggered while using a seed device:

    # mkfs.btrfs /dev/loop1
    # btrfstune -S 1 /dev/loop1
    # mount /dev/loop1 /mnt
    # btrfs dev add /dev/loop2 /mnt

    btrfs: block rsv returned -28
    ------------[ cut here ]------------
    WARNING: at fs/btrfs/extent-tree.c:5969 btrfs_alloc_free_block+0x166/0x396 [btrfs]()
    ...
    Call Trace:
    ...
    [] btrfs_cow_block+0x101/0x147 [btrfs]
    [] btrfs_search_slot+0x1b8/0x55f [btrfs]
    [] btrfs_insert_empty_items+0x42/0x7f [btrfs]
    [] btrfs_insert_item+0x40/0x7e [btrfs]
    [] btrfs_make_block_group+0x243/0x2aa [btrfs]
    [] __btrfs_alloc_chunk+0x672/0x70e [btrfs]
    [] init_first_rw_device+0x77/0x13c [btrfs]
    [] btrfs_init_new_device+0x664/0x9fd [btrfs]
    [] btrfs_ioctl+0x694/0xdbe [btrfs]
    [] do_vfs_ioctl+0x496/0x4cc
    [] sys_ioctl+0x33/0x4f
    [] sysenter_do_call+0x12/0x38
    ---[ end trace 906adac595facc7d ]---

    Since a seed device is read-only, there's no usable space in the filesystem.
    When we then add a sprout device to it, the kernel creates a METADATA
    block group and a SYSTEM block group, which provide free space we can reserve,
    but we still get a reservation failure because the global block_rsv hasn't
    been updated accordingly.

    Signed-off-by: Li Zefan

    Li Zefan
     
  • Some functions never use the transaction handle passed to them.

    Signed-off-by: Li Zefan

    Li Zefan
     

08 Jan, 2012

1 commit

  • We store the allocation start and length twice in ins, once right
    after the other, but with intervening calls that may prevent the
    duplicate from being optimized out by the compiler. Remove one of the
    assignments.

    Signed-off-by: Alexandre Oliva
    Signed-off-by: Chris Mason

    Alexandre Oliva
     

07 Jan, 2012

3 commits

  • Since the clustered allocation may be taking extents from a different
    block group, there's no point in spin-locking and testing the current
    block group free space before attempting to allocate space from a
    cluster, even more so when we might refrain from even trying the
    cluster in the current block group because, after the cluster was set
    up, not enough free space remained. Furthermore, cluster creation
    attempts fail fast when the block group doesn't have enough free
    space, so the test was completely superfluous.

    I've moved the free space test past the cluster allocation attempt,
    where it is more useful, and arranged for a cluster in the current
    block group to be released before trying an unclustered allocation,
    when we reach the LOOP_NO_EMPTY_SIZE stage, so that the free space in
    the cluster stands a chance of being combined with additional free
    space in the block group so as to succeed in the allocation attempt.

    Signed-off-by: Alexandre Oliva
    Signed-off-by: Chris Mason

    Alexandre Oliva
     
  • The chunk allocation code has tried to keep a pretty tight lid on creating new
    metadata chunks. This is partially because in the past the reservation
    code didn't give us an accurate idea of how much space was being used.

    The new code is much more accurate, so we're able to get rid of some of these
    checks.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • Btrfs tries to batch extent allocation tree changes to improve performance
    and reduce metadata thrashing. But it doesn't allocate new metadata chunks
    while it is doing allocations for the extent allocation tree.

    This commit changes the delayed reference code to do chunk allocations if we're
    getting low on room. It prevents crashes and improves performance.

    Signed-off-by: Chris Mason

    Chris Mason
     

04 Jan, 2012

2 commits

  • Now that we may be holding back delayed refs for a limited period, we
    might end up having no runnable delayed refs. Without this commit, we'd
    do busy waiting in that thread until another (runnable) ref arrives.
    Instead, we detect this situation and use a waitqueue, so that
    we only try to run more refs after
    a) another runnable ref was added or
    b) delayed refs are no longer held back

    Signed-off-by: Jan Schmidt

    Jan Schmidt
     
  • When processing a delayed ref, first check if there are still old refs in
    the process of being added. If so, put this ref back to the tree. To avoid
    looping on this ref, choose a newer one in the next loop.
    btrfs_find_ref_cluster has to take care of that.

    Signed-off-by: Arne Jansen
    Signed-off-by: Jan Schmidt

    Arne Jansen
     

22 Dec, 2011

1 commit

  • Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value
    from every call site. The for_cow parameter will later on be used to
    determine if a ref will change anything with respect to qgroups.

    Delayed refs coming from relocation are always counted as for_cow, as they
    don't change subvol quota.

    Also pass in the fs_info for later use.

    btrfs_find_all_roots() will use this as an optimization, as changes that are
    for_cow will not change anything with respect to which root points to a
    certain leaf. Thus, we don't need to add the current sequence number to
    those delayed refs.

    Signed-off-by: Arne Jansen
    Signed-off-by: Jan Schmidt

    Arne Jansen
     

17 Dec, 2011

1 commit

  • …inux/kernel/git/mason/linux-btrfs

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: unplug every once and a while
    Btrfs: deal with NULL srv_rsv in the delalloc inode reservation code
    Btrfs: only set cache_generation if we setup the block group
    Btrfs: don't panic if orphan item already exists
    Btrfs: fix leaked space in truncate
    Btrfs: fix how we do delalloc reservations and how we free reservations on error
    Btrfs: deal with enospc from dirtying inodes properly
    Btrfs: fix num_workers_starting bug and other bugs in async thread
    BTRFS: Establish i_ops before calling d_instantiate
    Btrfs: add a cond_resched() into the worker loop
    Btrfs: fix ctime update of on-disk inode
    btrfs: keep orphans for subvolume deletion
    Btrfs: fix inaccurate available space on raid0 profile
    Btrfs: fix wrong disk space information of the files
    Btrfs: fix wrong i_size when truncating a file to a larger size
    Btrfs: fix btrfs_end_bio to deal with write errors to a single mirror

    * 'for-linus-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    btrfs: lower the dirty balance poll interval

    Linus Torvalds
     

16 Dec, 2011

2 commits

  • A user reported a problem booting into a new kernel with the old format inodes.
    He was panicking in cow_file_range while writing out the inode cache. This is
    because if the block group is not cached we'll just skip writing out the cache;
    however, if it gets dirtied again in the same transaction and it has finished
    caching, we'd go ahead and write it out. But since we set cache_generation to
    the transid, we think we've already truncated it and will just carry on, running
    into cow_file_range and blowing up. We need to make sure we only set
    cache_generation if we've done the truncate. The user tested this patch and
    verified that the panic no longer occurred. Thanks,

    Reported-and-Tested-by: Klaus Bitto
    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • Running xfstests 269 with some tracing, my scripts kept spitting out errors about
    releasing bytes that we didn't actually have reserved. This took me down a huge
    rabbit hole and it turns out the way we deal with reserved_extents is wrong:
    we need to only be setting it if the reservation succeeds, otherwise the free()
    method will come in and unreserve space that isn't actually reserved yet, which
    can lead to other warnings and such. The math was all working out right in the
    end, but it caused all sorts of other issues in addition to making my scripts
    yell and scream and generally make it impossible for me to track down the
    original issue I was looking for. The other problem is with our error handling
    in the reservation code. There are two cases that we need to deal with

    1) We raced with free. In this case free won't free anything because csum_bytes
    is modified before we drop the lock in our reservation path, so free rightly
    doesn't release any space because the reservation code may be depending on that
    reservation. However if we fail, we need the reservation side to do the free at
    that point since that space is no longer in use. So as it stands the code was
    doing this fine and it worked out, except in case #2

    2) We don't race with free. Nobody comes in and changes anything, and our
    reservation fails. In this case we didn't reserve anything anyway and we just
    need to clean up csum_bytes but not free anything. So we keep track of
    csum_bytes before we drop the lock and if it hasn't changed we know we can just
    decrement csum_bytes and carry on.

    Because of the case where we can race with free()'s since we have to drop our
    spin_lock to do the reservation, I'm going to serialize all reservations with
    the i_mutex. We already get this for free in the heavy use paths, truncate and
    file write all hold the i_mutex, just needed to add it to page_mkwrite and
    various ioctl/balance things. With this patch my space leak scripts no longer
    scream bloody murder. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
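
The two failure cases in the commit above hinge on comparing csum_bytes against a snapshot taken before the lock was dropped. A rough sketch of that comparison is below; the struct, the function name and the return convention are hypothetical, locking is omitted, and the real reservation code does considerably more accounting than this.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-inode counter named in the commit. */
struct inode_sketch {
	uint64_t csum_bytes;
};

/*
 * Undo our contribution after a failed reservation.
 *
 * snapshot  - csum_bytes as recorded just before the lock was dropped
 *             (it already includes our own num_bytes)
 * num_bytes - the amount we added for this reservation attempt
 *
 * Returns 1 when csum_bytes changed underneath us (case 1, raced with
 * free), meaning the reservation side must also release space itself.
 * Returns 0 when nothing changed (case 2), meaning only the counter
 * needed to be decremented.
 */
static int undo_failed_reservation(struct inode_sketch *inode,
				   uint64_t snapshot, uint64_t num_bytes)
{
	int raced = (inode->csum_bytes != snapshot);

	inode->csum_bytes -= num_bytes;
	return raced;
}
```

Serializing reservations with the i_mutex, as the commit goes on to do, narrows the window in which the raced case can occur.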
     

09 Dec, 2011

1 commit


08 Dec, 2011

2 commits

  • When we find an existing cluster, we switch to its block group as the
    current block group, possibly skipping multiple blocks in the process.
    Furthermore, under heavy contention, multiple threads may fail to
    allocate from a cluster and then release just-created clusters just to
    proceed to create new ones in a different block group.

    This patch tries to allocate from an existing cluster regardless of its
    block group, and doesn't switch to that group, instead proceeding to
    try to allocate a cluster from the group it was iterating before the
    attempt.

    Signed-off-by: Alexandre Oliva
    Signed-off-by: Chris Mason

    Alexandre Oliva
     
  • If we reach LOOP_NO_EMPTY_SIZE, we won't even try to use a cluster that
    others might have set up. Odds are that there won't be one, but if
    someone else succeeded in setting it up, we might as well use it, even
    if we don't try to set up a cluster again.

    Signed-off-by: Alexandre Oliva
    Signed-off-by: Chris Mason

    Alexandre Oliva
     

02 Dec, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: fix meta data raid-repair merge problem
    Btrfs: skip allocation attempt from empty cluster
    Btrfs: skip block groups without enough space for a cluster
    Btrfs: start search for new cluster at the beginning
    Btrfs: reset cluster's max_size when creating bitmap
    Btrfs: initialize new bitmaps' list
    Btrfs: fix oops when calling statfs on readonly device
    Btrfs: Don't error on resizing FS to same size
    Btrfs: fix deadlock on metadata reservation when evicting a inode
    Fix URL of btrfs-progs git repository in docs
    btrfs scrub: handle -ENOMEM from init_ipath()

    Linus Torvalds
     

01 Dec, 2011

4 commits


23 Nov, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: remove free-space-cache.c WARN during log replay
    Btrfs: sectorsize align offsets in fiemap
    Btrfs: clear pages dirty for io and set them extent mapped
    Btrfs: wait on caching if we're loading the free space cache
    Btrfs: prefix resize related printks with btrfs:
    btrfs: fix stat blocks accounting
    Btrfs: avoid unnecessary bitmap search for cluster setup
    Btrfs: fix to search one more bitmap for cluster setup
    btrfs: mirror_num should be int, not u64
    btrfs: Fix up 32/64-bit compatibility for new ioctls
    Btrfs: fix barrier flushes
    Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush

    Linus Torvalds
     

20 Nov, 2011

1 commit

  • We've been hitting panics when running xfstest 13 in a loop for long periods of
    time. And actually this problem has always existed so we've been hitting these
    things randomly for a while. Basically what happens is we get a thread coming
    into the allocator and reading the space cache off of disk and adding the
    entries to the free space cache as we go. Then we get another thread that comes
    in and tries to allocate from that block group. Since block_group->cached !=
    BTRFS_CACHE_NO it goes ahead and tries to do the allocation. We do this because
    if we're doing the old slow way of caching we don't want to hold people up and
    wait for everything to finish. The problem with this is we could end up
    discarding the space cache at some arbitrary point in the future, which means we
    could very well end up allocating space that is either bad, or when the real
    caching happens it could end up thinking the space isn't in use when it really
    is and cause all sorts of other problems.

    The solution is to add a new flag to indicate we are loading the free space
    cache from disk, and always try to cache the block group if cache->cached !=
    BTRFS_CACHE_FINISHED. That way if we are loading the space cache anybody else
    who tries to allocate from the block group will have to wait until it's finished
    to make sure it completes successfully. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
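
The change in the allocator's check described above might look roughly like this. It is a sketch under stated assumptions: the enum members other than the "no" and "finished" states, the `loading_from_disk` field and both helper names are hypothetical and only mirror the wording of the commit message.

```c
#include <stdbool.h>

/* Simplified cache states; CACHE_STARTED is an assumed intermediate state. */
enum cache_state_sketch {
	CACHE_NO = 0,
	CACHE_STARTED = 1,
	CACHE_FINISHED = 2,
};

/* Hypothetical, stripped-down block group. */
struct block_group_sketch {
	enum cache_state_sketch cached;
	bool loading_from_disk;	/* assumed name for the new flag */
};

/*
 * New rule: attempt caching whenever the group is not fully cached,
 * rather than only when caching has not started at all.
 */
static bool should_try_caching(const struct block_group_sketch *bg)
{
	return bg->cached != CACHE_FINISHED;
}

/*
 * While the free space cache is still being loaded from disk, other
 * allocators must wait instead of allocating from partial data.
 */
static bool must_wait_for_load(const struct block_group_sketch *bg)
{
	return bg->cached != CACHE_FINISHED && bg->loading_from_disk;
}
```

In this model the old slow caching path leaves `loading_from_disk` false, so allocators are still not held up there, which matches the intent stated in the commit message.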