09 Oct, 2012

3 commits

  • If a filesystem is mounted with compression and then remounted by adding nodatacow,
    the compression is disabled but the compress flag is still visible.
    Also, if a filesystem is mounted with nodatacow and then remounted with compression,
    the nodatacow flag is still present but it is not active.
    This patch:
    - removes compress flags and notifies that the compression has been disabled if the
    filesystem is mounted with nodatacow
    - removes nodatacow and nodatasum flags if mounted with compress.

    Signed-off-by: Andrei Popa

    Andrei Popa
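
    The flag handling this commit describes can be sketched in userspace C. This is
    only an illustration: the OPT_* bits and both function names are hypothetical,
    and the sketch assumes nodatacow also implies nodatasum.

```c
#include <stdio.h>

/* Hypothetical flag bits for this sketch; the kernel's mount option bits differ. */
#define OPT_NODATASUM (1u << 0)
#define OPT_NODATACOW (1u << 1)
#define OPT_COMPRESS  (1u << 2)

/* Apply "nodatacow": compression cannot stay active, so drop its flag and say so. */
unsigned int apply_nodatacow(unsigned int opts)
{
	if (opts & OPT_COMPRESS) {
		fprintf(stderr, "nodatacow set, disabling compression\n");
		opts &= ~OPT_COMPRESS;
	}
	/* Assumption: nodatacow implies nodatasum. */
	return opts | OPT_NODATACOW | OPT_NODATASUM;
}

/* Apply "compress": clear nodatacow and nodatasum so the visible flags match reality. */
unsigned int apply_compress(unsigned int opts)
{
	opts &= ~(OPT_NODATACOW | OPT_NODATASUM);
	return opts | OPT_COMPRESS;
}
```

    Whichever option is applied last wins, and the flag that is no longer in effect
    is cleared instead of lingering in the option set.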
     
  • Fix various messages to include newline and module prefix.

    Signed-off-by: Daniel J Blueman

    Daniel J Blueman
     
  • With the following debug patch:

     static int btrfs_freeze(struct super_block *sb)
     {
    +        struct btrfs_fs_info *fs_info = btrfs_sb(sb);
    +        struct btrfs_transaction *trans;
    +
    +        spin_lock(&fs_info->trans_lock);
    +        trans = fs_info->running_transaction;
    +        if (trans) {
    +                printk("Transid %llu, use_count %d, num_writer %d\n",
    +                       trans->transid, atomic_read(&trans->use_count),
    +                       atomic_read(&trans->num_writers));
    +        }
    +        spin_unlock(&fs_info->trans_lock);
             return 0;
     }

    I found there was an orphan transaction after the freeze operation was done.

    This is because the transaction may not be committed when a transaction handle
    ends, even if it is the last handle of the current transaction. This design
    avoids committing the transaction frequently, but it also introduces the above
    problem.

    So I add btrfs_attach_transaction(), which can catch the current transaction
    and commit it. If there is no transaction, it returns ENOENT and does
    nothing.

    This function can also be used instead of btrfs_join_transaction_freeze(),
    because it doesn't increase the writer counter and doesn't start a new
    transaction, so it also fixes the deadlock between sync and freeze.

    Besides that, it is used instead of btrfs_join_transaction() in
    transaction_kthread(), because if there is no transaction, the transaction
    kthread has nothing to do.

    Signed-off-by: Miao Xie

    Miao Xie
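
    The attach semantics described above can be modelled with a small userspace
    sketch. The struct and function names below are hypothetical stand-ins, not the
    kernel's; the point is only that attaching never starts a transaction and
    reports -ENOENT when nothing is running, which the freeze path treats as
    "nothing to do".

```c
#include <errno.h>

/* Simplified stand-in for filesystem state; not the kernel's btrfs_fs_info. */
struct fs_state {
	int has_running;			/* nonzero while a transaction is open */
	unsigned long long transid;		/* id of the running transaction */
	unsigned long long last_committed;	/* id of the last committed transaction */
};

/* Catch the running transaction if there is one; never start a new one. */
int attach_transaction(struct fs_state *fs, unsigned long long *transid)
{
	if (!fs->has_running)
		return -ENOENT;
	*transid = fs->transid;
	return 0;
}

/* Freeze path: commit whatever is running, and treat "nothing running" as success. */
int freeze(struct fs_state *fs)
{
	unsigned long long transid;
	int ret = attach_transaction(fs, &transid);

	if (ret == -ENOENT)
		return 0;		/* no transaction, nothing to commit */
	if (ret)
		return ret;
	fs->last_committed = transid;	/* stands in for the real commit */
	fs->has_running = 0;
	return 0;
}
```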
     

04 Oct, 2012

3 commits


02 Oct, 2012

2 commits

  • Though we dump the stack information when aborting an unused transaction
    handle, we can't tell where the decision to abort was made if one function
    has several places that invoke the transaction abort function and then
    jump to the same place after the call. Besides that, we also don't know
    the reason for aborting the current handle. So I modify the transaction
    abort function and make it output the function name, line and error
    information.

    Signed-off-by: Miao Xie

    Miao Xie
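
    One common way to record the call site, sketched here in userspace C, is to
    wrap the reporting function in a macro that passes __func__ and __LINE__. The
    names report_abort, ABORT_TRANSACTION and update_item are made up for this
    illustration and are not the kernel's interface.

```c
#include <stdio.h>

/* Hypothetical reporting helper for this sketch; not the kernel's abort function. */
int report_abort(char *buf, size_t len, const char *func, int line, int err)
{
	return snprintf(buf, len, "%s:%d: aborting transaction, error %d",
			func, line, err);
}

/* The macro captures the call site, so every abort location is distinguishable. */
#define ABORT_TRANSACTION(buf, len, err) \
	report_abort((buf), (len), __func__, __LINE__, (err))

/* Two abort sites that share one exit label, as in the situation described above. */
int update_item(int fail_early, char *buf, size_t len)
{
	if (fail_early) {
		ABORT_TRANSACTION(buf, len, -12);	/* e.g. out of memory */
		goto out;
	}
	ABORT_TRANSACTION(buf, len, -5);		/* e.g. I/O error */
out:
	return -1;
}
```

    A stack dump taken at the shared label would look the same for both paths,
    while the two messages differ in line number and error value.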
     
  • The ordered extent allocation is in the fast path of the IO, so use a slab
    to improve the speed of the allocation.

    "Size of the struct is 280, so this will fall into the size-512 bucket,
    giving 8 objects per page, while own slab will pack 14 objects into a page.

    Another benefit I see is to check for leaked objects when the module is
    removed (and the cache destroy takes place)."
    -- David Sterba

    Signed-off-by: Miao Xie

    Miao Xie
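
    The packing figures in the quote can be checked with a little arithmetic. The
    sketch below assumes a 4096-byte page and ignores any per-slab metadata or
    alignment padding; the helper names are hypothetical.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u	/* assumed page size */

/* Round a size up to the next power-of-two bucket, like a generic size-N cache. */
size_t bucket_size(size_t size)
{
	size_t bucket = 1;

	while (bucket < size)
		bucket <<= 1;
	return bucket;
}

/* How many objects of the given size fit into one page. */
size_t objects_per_page(size_t object_size)
{
	return PAGE_SIZE_BYTES / object_size;
}
```

    With these assumptions, bucket_size(280) is 512 and objects_per_page(512) is 8,
    while objects_per_page(280) is 14, which matches the numbers quoted above.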
     

30 Aug, 2012

1 commit

  • Pull btrfs fixes from Chris Mason:
    "I've split out the big send/receive update from my last pull request
    and now have just the fixes in my for-linus branch. The send/recv
    branch will wander over to linux-next shortly though.

    The largest patches in this pull are Josef's patches to fix DIO
    locking problems and his patch to fix a crash during balance. They
    are both well tested.

    The rest are smaller fixes that we've had queued. The last rc came
    out while I was hacking new and exciting ways to recover from a
    misplaced rm -rf on my dev box, so these missed rc3."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits)
    Btrfs: fix that repair code is spuriously executed for transid failures
    Btrfs: fix ordered extent leak when failing to start a transaction
    Btrfs: fix a dio write regression
    Btrfs: fix deadlock with freeze and sync V2
    Btrfs: revert checksum error statistic which can cause a BUG()
    Btrfs: remove superblock writing after fatal error
    Btrfs: allow delayed refs to be merged
    Btrfs: fix enospc problems when deleting a subvol
    Btrfs: fix wrong mtime and ctime when creating snapshots
    Btrfs: fix race in run_clustered_refs
    Btrfs: don't run __tree_mod_log_free_eb on leaves
    Btrfs: increase the size of the free space cache
    Btrfs: barrier before waitqueue_active
    Btrfs: fix deadlock in wait_for_more_refs
    btrfs: fix second lock in btrfs_delete_delayed_items()
    Btrfs: don't allocate a seperate csums array for direct reads
    Btrfs: do not strdup non existent strings
    Btrfs: do not use missing devices when showing devname
    Btrfs: fix that error value is changed by mistake
    Btrfs: lock extents as we map them in DIO
    ...

    Linus Torvalds
     

29 Aug, 2012

2 commits

  • We can deadlock with freeze right now because we unconditionally start a
    transaction in our ->sync_fs() call. To fix this just check and see if we
    have a running transaction to commit. This saves us from the deadlock
    because at this point we'll have the umount sem for the sb so we're safe
    from freezes coming in after we've done our check. With this patch the
    freeze xfstests no longer deadlocks. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • If you do the following

    mkfs.btrfs /dev/sdb /dev/sdc
    rmmod btrfs
    dd if=/dev/zero of=/dev/sdb bs=1M count=1
    mount -o degraded /dev/sdc /mnt/btrfs-test

    the box will panic trying to deref the name for the missing dev since it is
    the lower numbered devid. So fix show_devname to not use missing devices.
    Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
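
    The selection rule (lowest devid, skipping missing devices) might look like the
    following userspace sketch. The struct layout and the name pick_devname are
    assumptions made for illustration; the kernel's device structures are different.

```c
#include <stddef.h>

/* Simplified device record; a missing device has no usable name. */
struct dev {
	unsigned long long devid;
	int missing;
	const char *name;
};

/* Return the name of the present device with the lowest devid, or NULL. */
const char *pick_devname(const struct dev *devs, size_t n)
{
	const struct dev *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].missing || !devs[i].name)
			continue;	/* never dereference a missing device's name */
		if (!best || devs[i].devid < best->devid)
			best = &devs[i];
	}
	return best ? best->name : NULL;
}
```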
     

04 Aug, 2012

1 commit


31 Jul, 2012

1 commit

  • Use the generic printk_get_level() to search a message for a kern_level.

    Add __printf to verify format and arguments. Fix a few messages that
    had mismatches in format and arguments. Add #ifdef CONFIG_PRINTK blocks
    to shrink the object size a bit when not using printk.

    [akpm@linux-foundation.org: whitespace tweak]
    Signed-off-by: Joe Perches
    Cc: Kay Sievers
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

27 Jul, 2012

1 commit

  • Pull large btrfs update from Chris Mason:
    "This pull request is very large, and the two main features in here
    have been under testing/devel for quite a while.

    We have subvolume quotas from the strato developers. This enables
    full tracking of how many blocks are allocated to each subvolume (and
    all snapshots) and you can set limits on a per-subvolume basis. You
    can also create quota groups and toss multiple subvolumes into a big
    group. It's everything you need to be a web hosting company and give
    each user their own subvolume.

    The userland side of the quotas is being refreshed, they'll send out
    details on where to grab it soon.

    Next is the kernel side of btrfs send/receive from Alexander Block.
    This leverages the same infrastructure as the quota code to figure out
    relationships between blocks and their owners. It can then compute
    the difference between two snapshots and sends the diffs in a neutral
    format into userland.

    The basic model:

    create a snapshot
    send that snapshot as the initial backup
    make changes
    create a second snapshot
    send the incremental as a backup
    delete the first snapshot
    (use the second snapshot for the next incremental)

    The receive portion is all in userland, and in the 'next' branch of my
    btrfs-progs repo.

    There's still some work to do in terms of optimizing the send side
    from kernel to userland. The really important part is figuring out
    how two snapshots are different, and this is where we are
    concentrating right now. The initial send of a dataset is a little
    slower than tar, but the incremental sends are dramatically faster
    than what rsync can do.

    On top of all of that, we have a nice queue of fixes, cleanups and
    optimizations."

    Fix up trivial modify/del conflict in fs/btrfs/ioctl.c

    Also fix up semantic conflict in fs/btrfs/send.c: the interface to
    dentry_open() changed in commit 765927b2d508 ("switch dentry_open() to
    struct path, make it grab references itself"), and since it now grabs
    whatever references it needs, we should no longer do the mntget() on the
    mnt (and we need to dput() the dentry reference we took).

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (65 commits)
    Btrfs: uninit variable fixes in send/receive
    Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive
    Btrfs: add btrfs_compare_trees function
    Btrfs: introduce subvol uuids and times
    Btrfs: make iref_to_path non static
    Btrfs: add a barrier before a waitqueue_active check
    Btrfs: call the ordered free operation without any locks held
    Btrfs: Check INCOMPAT flags on remount and add helper function
    Btrfs: add helper for tree enumeration
    btrfs: allow cross-subvolume file clone
    Btrfs: improve multi-thread buffer read
    Btrfs: make btrfs's allocation smoothly with preallocation
    Btrfs: lock the transition from dirty to writeback for an eb
    Btrfs: fix potential race in extent buffer freeing
    Btrfs: don't return true in releasepage unless we actually freed the eb
    Btrfs: suppress printk() if all device I/O stats are zero
    Btrfs: remove unwanted printk() for btrfs device I/O stats
    Btrfs: rewrite BTRFS_SETGET_FUNCS
    Btrfs: zero unused bytes in inode item
    Btrfs: kill free_space pointer from inode structure
    ...

    Conflicts:
    fs/btrfs/ioctl.c

    Linus Torvalds
     

26 Jul, 2012

1 commit

  • In support of the recently added capability to remount with lzo
    compression, provide a helper function to check the compression
    INCOMPAT flags when remounting with lzo compression, and set
    the flags if necessary.

    Also, implement the new helper function when defragmenting with
    explicit lzo compression and when setting the default subvolume.

    Signed-off-by: Mitch Harder
    Signed-off-by: Chris Mason

    Mitch Harder
     

24 Jul, 2012

3 commits

  • This will be used in conjunction with btrfs device ready <dev>. This is
    needed for initrd's to have a nice and lightweight way to tell if all of the
    devices needed for a file system are in the cache currently. This keeps
    them from having to do mount+sleep loops waiting for devices to show up.
    Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • Btrfs allows turning on compression on a mounted and in-use filesystem
    by issuing mount -o remount,compress=lzo.
    This patch allows turning compression off again
    while the filesystem is mounted. As suggested by David Sterba,
    if the compress-force option was set, it is implicitly cleared
    when compression is turned off.

    Tested-by: David Sterba
    Signed-off-by: Arnd Hannemann

    Arnd Hannemann
     
  • We do all of our inode updating when we change it, and now that we do
    ->update_time we don't need ->dirty_inode for atime updates anymore, so just
    remove it. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

14 Jul, 2012

1 commit

  • Pass mount flags to sget() so that it can use them in initialising a new
    superblock before the set function is called. They could also be passed to the
    compare function.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

03 Jul, 2012

1 commit


15 Jun, 2012

1 commit

  • Because btrfs can remove the device that was mounted we need to have a
    ->show_devname so that in this case we can print out some other device in
    the file system to /proc/mounts. So if there are multiple devices in a btrfs
    file system we will just print the device with the lowest devid that we can
    find. This will make everything consistent and deal with device removal
    properly. The drawback is if you mount with a device that is higher than
    the lowest devid it won't show up as the mounted device in /proc/mounts,
    but this is a small price to pay. This was inspired by Miao Xie's patch.
    Thanks,

    Reviewed-by: Miao Xie
    Signed-off-by: Josef Bacik

    Josef Bacik
     

30 May, 2012

4 commits

  • There is an off-by-one error: allocating room for a maximal result
    string but without room for a trailing NUL. That can lead to
    returning a transformed string that is not NUL-terminated, and
    then to a caller reading beyond the end of the malloc'd buffer.

    Rewrite to s/kzalloc/kmalloc/, remove unwarranted use of strncpy
    (the result is guaranteed to fit), remove dead strlen at end, and
    change a few variable names and comments.

    Reviewed-by: Josef Bacik
    Signed-off-by: Jim Meyering

    Jim Meyering
     
  • The buffer read-overrun would be triggered by a printk format
    starting with <N>, where N is a single digit. NUL-terminate
    after strncpy. Use memcpy, not strncpy, since we know the
    string we're copying fits in the destination buffer and
    contains no NUL byte.

    Signed-off-by: Jim Meyering

    Jim Meyering
     
  • Changing 'mount -oremount,thread_pool=2 /' had no effect:

    the maximum number of worker threads is specified in 2 places:
    - in 'struct btrfs_fs_info::thread_pool_size'
    - in each worker struct: 'struct btrfs_workers::max_workers'

    'mount -oremount' updated only 'btrfs_fs_info::thread_pool_size'.

    Fix it by pushing new maximum value to all created worker structures
    as well.

    Cc: Josef Bacik
    Cc: Chris Mason
    Reviewed-by: Josef Bacik
    Signed-off-by: Sergei Trofimovich

    Sergei Trofimovich
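
    The shape of the fix, as a simplified userspace sketch: the structs below only
    mirror the two fields named in the message, and resize_thread_pool is a
    hypothetical name.

```c
#include <stddef.h>

/* Minimal stand-ins that carry only the fields mentioned above. */
struct workers {
	int max_workers;
};

struct fs_info {
	int thread_pool_size;
	struct workers *pools;	/* every worker pool already created */
	size_t npools;
};

/* Update the global setting and push the new maximum to each existing pool. */
void resize_thread_pool(struct fs_info *fs, int new_size)
{
	size_t i;

	fs->thread_pool_size = new_size;
	for (i = 0; i < fs->npools; i++)
		fs->pools[i].max_workers = new_size;
}
```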
     
  • We've been keeping around the inode sequence number in hopes that somebody
    would use it, but nobody uses it and people actually use i_version which
    serves the same purpose, so use i_version where we used the incore inode's
    sequence number and that way the sequence is updated properly across the
    board, and not just in file write. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
     

29 Apr, 2012

1 commit

  • Pull btrfs fixes from Chris Mason:
    "This has our collection of bug fixes. I missed the last rc because I
    thought our patches were making NFS crash during my xfs test runs.
    Turns out it was an NFS client bug fixed by someone else while I tried
    to bisect it.

    All of these fixes are small, but some are fairly high impact. The
    biggest are fixes for our mount -o remount handling, a deadlock due to
    GFP_KERNEL allocations in readdir, and a RAID10 error handling bug.

    This was tested against both 3.3 and Linus' master as of this morning."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (26 commits)
    Btrfs: reduce lock contention during extent insertion
    Btrfs: avoid deadlocks from GFP_KERNEL allocations during btrfs_real_readdir
    Btrfs: Fix space checking during fs resize
    Btrfs: fix block_rsv and space_info lock ordering
    Btrfs: Prevent root_list corruption
    Btrfs: fix repair code for RAID10
    Btrfs: do not start delalloc inodes during sync
    Btrfs: fix that check_int_data mount option was ignored
    Btrfs: don't count CRC or header errors twice while scrubbing
    Btrfs: fix btrfs_ioctl_dev_info() crash on missing device
    btrfs: don't return EINTR
    Btrfs: double unlock bug in error handling
    Btrfs: always store the mirror we read the eb from
    fs/btrfs/volumes.c: add missing free_fs_devices
    btrfs: fix early abort in 'remount'
    Btrfs: fix max chunk size check in chunk allocator
    Btrfs: add missing read locks in backref.c
    Btrfs: don't call free_extent_buffer twice in iterate_irefs
    Btrfs: Make free_ipath() deal gracefully with NULL pointers
    Btrfs: avoid possible use-after-free in clear_extent_bit()
    ...

    Linus Torvalds
     

28 Apr, 2012

1 commit

  • btrfs_start_delalloc_inodes will just walk the list of delalloc inodes and
    start writing them out, but it doesn't splice the list or anything so as
    long as somebody is doing work on the box you could end up in this section
    _forever_. So just remove it, it's not needed anyway since sync will start
    writeback on all inodes anyway, all we need to do is wait for ordered
    extents and then we can commit the transaction. In my horrible torture test
    sync goes from taking 4 minutes to about 1.5 minutes. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

19 Apr, 2012

1 commit


31 Mar, 2012

1 commit

  • Pull btrfs fixes and features from Chris Mason:
    "We've merged in the error handling patches from SuSE. These are
    already shipping in the sles kernel, and they give btrfs the ability
    to abort transactions and go readonly on errors. It involves a lot of
    churn as they clarify BUG_ONs, and remove the ones we now properly
    deal with.

    Josef reworked the way our metadata interacts with the page cache.
    page->private now points to the btrfs extent_buffer object, which
    makes everything faster. He changed it so we write a whole extent
    buffer at a time instead of allowing individual pages to go down,
    which will be important for the raid5/6 code (for the 3.5 merge
    window ;)

    Josef also made us more aggressive about dropping pages for metadata
    blocks that were freed due to COW. Overall, our metadata caching is
    much faster now.

    We've integrated my patch for metadata bigger than the page size.
    This allows metadata blocks up to 64KB in size. In practice 16K and
    32K seem to work best. For workloads with lots of metadata, this cuts
    down the size of the extent allocation tree dramatically and fragments
    much less.

    Scrub was updated to support the larger block sizes, which ended up
    being a fairly large change (thanks Stefan Behrens).

    We also have an assortment of fixes and updates, especially to the
    balancing code (Ilya Dryomov), the back ref walker (Jan Schmidt) and
    the defragging code (Liu Bo)."

    Fixed up trivial conflicts in fs/btrfs/scrub.c that were just due to
    removal of the second argument to k[un]map_atomic() in commit
    7ac687d9e047.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (75 commits)
    Btrfs: update the checks for mixed block groups with big metadata blocks
    Btrfs: update to the right index of defragment
    Btrfs: do not bother to defrag an extent if it is a big real extent
    Btrfs: add a check to decide if we should defrag the range
    Btrfs: fix recursive defragment with autodefrag option
    Btrfs: fix the mismatch of page->mapping
    Btrfs: fix race between direct io and autodefrag
    Btrfs: fix deadlock during allocating chunks
    Btrfs: show useful info in space reservation tracepoint
    Btrfs: don't use crc items bigger than 4KB
    Btrfs: flush out and clean up any block device pages during mount
    btrfs: disallow unequal data/metadata blocksize for mixed block groups
    Btrfs: enhance superblock sanity checks
    Btrfs: change scrub to support big blocks
    Btrfs: minor cleanup in scrub
    Btrfs: introduce common define for max number of mirrors
    Btrfs: fix infinite loop in btrfs_shrink_device()
    Btrfs: fix memory leak in resolver code
    Btrfs: allow dup for data chunks in mixed mode
    Btrfs: validate target profiles only if we are going to use them
    ...

    Linus Torvalds
     

29 Mar, 2012

1 commit


27 Mar, 2012

1 commit

  • btrfs_init_lockdep only makes our lockdep class names look prettier, so
    it never hurt that we forgot to actually call it. This turns our lockdep
    identifier strings from lockdep's auto-set #[id] into really pretty
    "btrfs-fs-01" or "btrfs-csum-03".

    Signed-off-by: Jan Schmidt

    Jan Schmidt
     

22 Mar, 2012

5 commits

  • btrfs currently handles most errors with BUG_ON. This patch is a work-in-
    progress but aims to handle most errors other than internal logic
    errors and ENOMEM more gracefully.

    This iteration prevents most crashes but can run into lockups with
    the page lock on occasion when the timing "works out."

    Signed-off-by: Jeff Mahoney

    Jeff Mahoney
     
  • Signed-off-by: Jeff Mahoney

    Jeff Mahoney
     
  • btrfs currently handles most errors with BUG_ON. This patch is a work-in-
    progress but aims to handle most errors other than internal logic
    errors and ENOMEM more gracefully.

    This iteration prevents most crashes but can run into lockups with
    the page lock on occasion when the timing "works out."

    Signed-off-by: Jeff Mahoney

    Jeff Mahoney
     
  • Signed-off-by: Jeff Mahoney

    Jeff Mahoney
     
  • As part of the effort to eliminate BUG_ON as an error handling
    technique, we need to determine which errors are actual logic errors,
    which are on-disk corruption, and which are normal runtime errors
    e.g. -ENOMEM.

    Annotating these error cases is helpful to understand and report them.

    This patch adds a btrfs_panic() routine that will either panic
    or BUG depending on the new -ofatal_errors={panic,bug} mount option.
    Since there are still so many BUG_ONs, it defaults to BUG for now but I
    expect that to change once the error handling effort has made
    significant progress.

    Signed-off-by: Jeff Mahoney

    Jeff Mahoney
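
    Parsing the value of such an option could look like the sketch below. The enum
    and function names are hypothetical; only the two accepted values, panic and
    bug, and the BUG default come from the message above.

```c
#include <string.h>

enum fatal_mode {
	FATAL_BUG,	/* default, per the commit message */
	FATAL_PANIC,
};

/* Parse the value of fatal_errors=...; returns 0 on success, -1 if unknown. */
int parse_fatal_errors(const char *value, enum fatal_mode *mode)
{
	if (strcmp(value, "panic") == 0)
		*mode = FATAL_PANIC;
	else if (strcmp(value, "bug") == 0)
		*mode = FATAL_BUG;
	else
		return -1;	/* leave *mode untouched on bad input */
	return 0;
}
```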
     

21 Mar, 2012

1 commit


18 Jan, 2012

2 commits

  • * 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    btrfs: take allocation of ->tree_root into open_ctree()
    btrfs: let ->s_fs_info point to fs_info, not root...
    btrfs: consolidate failure exits in btrfs_mount() a bit
    btrfs: make free_fs_info() call ->kill_sb() unconditional
    btrfs: merge free_fs_info() calls on fill_super failures
    btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super()
    btrfs: make open_ctree() return int
    btrfs: sanitizing ->fs_info, part 5
    btrfs: sanitizing ->fs_info, part 4
    btrfs: sanitizing ->fs_info, part 3
    btrfs: sanitizing ->fs_info, part 2
    btrfs: sanitizing ->fs_info, part 1
    btrfs: fix a deadlock in btrfs_scan_one_device()
    btrfs: fix mount/umount race
    btrfs: get ->kill_sb() of its own
    btrfs: preparation to fixing mount/umount race

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits)
    Btrfs: use larger system chunks
    Btrfs: add a delalloc mutex to inodes for delalloc reservations
    Btrfs: space leak tracepoints
    Btrfs: protect orphan block rsv with spin_lock
    Btrfs: add allocator tracepoints
    Btrfs: don't call btrfs_throttle in file write
    Btrfs: release space on error in page_mkwrite
    Btrfs: fix btrfsck error 400 when truncating a compressed
    Btrfs: do not use btrfs_end_transaction_throttle everywhere
    Btrfs: add balance progress reporting
    Btrfs: allow for resuming restriper after it was paused
    Btrfs: allow for canceling restriper
    Btrfs: allow for pausing restriper
    Btrfs: add skip_balance mount option
    Btrfs: recover balance on mount
    Btrfs: save balance parameters to disk
    Btrfs: soft profile changing mode (aka soft convert)
    Btrfs: implement online profile changing
    Btrfs: do not reduce profile in do_chunk_alloc()
    Btrfs: virtual address space subset filter
    ...

    Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new
    mnt_drop_write_file() helper.

    Linus Torvalds
     

17 Jan, 2012

1 commit