27 Sep, 2016

3 commits

  • For many printks, we want to know which file system issued the message.

    This patch converts most pr_* calls to use the btrfs_* versions instead.
    In some cases, this means adding plumbing to allow call sites access to
    an fs_info pointer.

    fs/btrfs/check-integrity.c is left alone for another day.

    Signed-off-by: Jeff Mahoney
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Jeff Mahoney
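
The idea of tagging each message with the file system that issued it can be sketched in userspace C. This is only an illustration: `struct fs_info_sketch` and `fs_msg()` are hypothetical names, not the kernel's btrfs_* helpers, and the prefix layout is modelled on the "BTRFS info (device sdb3): ..." lines quoted elsewhere in this log.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-filesystem context (not the kernel struct). */
struct fs_info_sketch {
    const char *device_name;
};

/*
 * Format a message prefixed with the device it concerns.
 * Returns the number of characters the full message needs,
 * or a negative value on a formatting error.
 */
static int fs_msg(char *buf, size_t len, const struct fs_info_sketch *fs,
                  const char *level, const char *fmt, ...)
{
    va_list args;
    int n, m;

    assert(buf != NULL && len > 0);

    /* Without an fs pointer the caller cannot say which device is meant. */
    n = snprintf(buf, len, "BTRFS %s (device %s): ", level,
                 fs ? fs->device_name : "unknown");
    if (n < 0 || (size_t)n >= len)
        return n;

    va_start(args, fmt);
    m = vsnprintf(buf + n, len - (size_t)n, fmt, args);
    va_end(args);

    return m < 0 ? m : n + m;
}
```

A call site that has no context pointer at hand can only print "unknown" here, which is why a conversion like this one may need extra plumbing to pass the pointer down.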
     
  • This patch converts printk(KERN_* style messages to use the pr_* versions.

    One side effect is that anything that was KERN_DEBUG is now automatically
    a dynamic debug message.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: David Sterba

    Jeff Mahoney
     
  • CodingStyle chapter 2:
    "[...] never break user-visible strings such as printk messages,
    because that breaks the ability to grep for them."

    This patch unsplits user-visible strings.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: David Sterba

    Jeff Mahoney
     

26 Sep, 2016

1 commit

  • We have a lot of random ints in btrfs_fs_info that can be put into flags. This
    is mostly equivalent, with the exception of how we deal with quota being
    turned on or off: instead of having a pending state that the current
    quota_enabled gets set to, we now set a flag when we are turning quota on
    or off and deal with that appropriately. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: David Sterba

    Josef Bacik
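
A minimal sketch of folding separate ints into one flags word, with explicit "turning on" and "turning off" bits for quota. All names here are hypothetical, and the helpers are plain (non-atomic) bit operations, whereas kernel code would normally use atomic bitops.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; illustrative names only. */
enum fs_flag_sketch {
    FS_SKETCH_QUOTA_ENABLED,
    FS_SKETCH_QUOTA_ENABLING,
    FS_SKETCH_QUOTA_DISABLING,
    FS_SKETCH_NR_FLAGS
};

struct fs_state_sketch {
    unsigned long flags;   /* one bit per former int field */
};

static void fs_set_flag(struct fs_state_sketch *fs, enum fs_flag_sketch bit)
{
    assert(bit < FS_SKETCH_NR_FLAGS);
    fs->flags |= 1UL << bit;
}

static void fs_clear_flag(struct fs_state_sketch *fs, enum fs_flag_sketch bit)
{
    assert(bit < FS_SKETCH_NR_FLAGS);
    fs->flags &= ~(1UL << bit);
}

static bool fs_test_flag(const struct fs_state_sketch *fs,
                         enum fs_flag_sketch bit)
{
    assert(bit < FS_SKETCH_NR_FLAGS);
    return (fs->flags & (1UL << bit)) != 0;
}

/* Fold a pending on/off transition into the "enabled" bit. */
static void fs_apply_quota_transition(struct fs_state_sketch *fs)
{
    if (fs_test_flag(fs, FS_SKETCH_QUOTA_ENABLING)) {
        fs_clear_flag(fs, FS_SKETCH_QUOTA_ENABLING);
        fs_set_flag(fs, FS_SKETCH_QUOTA_ENABLED);
    }
    if (fs_test_flag(fs, FS_SKETCH_QUOTA_DISABLING)) {
        fs_clear_flag(fs, FS_SKETCH_QUOTA_DISABLING);
        fs_clear_flag(fs, FS_SKETCH_QUOTA_ENABLED);
    }
}
```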
     

25 Aug, 2016

1 commit

  • When running fstests generic/068, we sometimes hit the deadlock below:
    xfs_io D ffff8800331dbb20 0 6697 6693 0x00000080
    ffff8800331dbb20 ffff88007acfc140 ffff880034d895c0 ffff8800331dc000
    ffff880032d243e8 fffffffeffffffff ffff880032d24400 0000000000000001
    ffff8800331dbb38 ffffffff816a9045 ffff880034d895c0 ffff8800331dbba8
    Call Trace:
    [] schedule+0x35/0x80
    [] rwsem_down_read_failed+0xf2/0x140
    [] ? __filemap_fdatawrite_range+0xd1/0x100
    [] call_rwsem_down_read_failed+0x18/0x30
    [] ? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs]
    [] percpu_down_read+0x35/0x50
    [] __sb_start_write+0x2c/0x40
    [] start_transaction+0x2a5/0x4d0 [btrfs]
    [] btrfs_join_transaction+0x17/0x20 [btrfs]
    [] btrfs_evict_inode+0x3c4/0x5d0 [btrfs]
    [] evict+0xba/0x1a0
    [] iput+0x196/0x200
    [] btrfs_run_delayed_iputs+0x70/0xc0 [btrfs]
    [] btrfs_commit_transaction+0x928/0xa80 [btrfs]
    [] btrfs_freeze+0x30/0x40 [btrfs]
    [] freeze_super+0xf0/0x190
    [] do_vfs_ioctl+0x4a5/0x5c0
    [] ? do_audit_syscall_entry+0x66/0x70
    [] ? syscall_trace_enter_phase1+0x11f/0x140
    [] SyS_ioctl+0x79/0x90
    [] do_syscall_64+0x62/0x110
    [] entry_SYSCALL64_slow_path+0x25/0x25

    From this warning, freeze_super() already holds SB_FREEZE_FS, but
    btrfs_freeze() calls btrfs_commit_transaction() again. If
    btrfs_commit_transaction() finds that it has delayed iputs to handle,
    it calls start_transaction(), which tries to take the SB_FREEZE_FS lock
    again, and the deadlock occurs.

    The root cause is that in btrfs, sync_filesystem(sb) does not make
    sure all metadata is updated. Some code may still be adding
    delayed iputs; see the sample race window below:

                         CPU1                          |         CPU2
    |-> freeze_super()                                 |
        |-> sync_filesystem(sb);                       |
        |                                              |-> cleaner_kthread()
        |                                              |   |-> btrfs_delete_unused_bgs()
        |                                              |       |-> btrfs_remove_chunk()
        |                                              |           |-> btrfs_remove_block_group()
        |                                              |               |-> btrfs_add_delayed_iput()
        |                                              |
        |-> sb->s_writers.frozen = SB_FREEZE_FS;       |
        |-> sb_wait_write(sb, SB_FREEZE_FS);           |
        |   acquire SB_FREEZE_FS lock.                 |
        |                                              |
        |-> btrfs_freeze()                             |
            |-> btrfs_commit_transaction()             |
                |-> btrfs_run_delayed_iputs()          |
                |   will handle delayed iputs,         |
                |   that means start_transaction()     |
                |   will be called, which will try     |
                |   to get SB_FREEZE_FS lock.          |

    To fix this issue, introduce an "int fs_frozen" to record internally whether
    the fs has been frozen. If the fs has been frozen, we cannot handle delayed iputs.

    Signed-off-by: Wang Xiaoguang
    Reviewed-by: David Sterba
    [ add comment to btrfs_freeze ]
    Signed-off-by: David Sterba

    Signed-off-by: Chris Mason

    Wang Xiaoguang
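
The guard described above can be sketched as follows. This is a simplified userspace model under stated assumptions: the struct and function names are hypothetical, `transactions_started` merely counts how often the freeze lock would be taken, and the ordering inside the real btrfs_freeze() may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state involved; not the kernel structures. */
struct frozen_fs_sketch {
    bool fs_frozen;            /* set on freeze, cleared on unfreeze */
    int pending_iputs;         /* delayed iputs still queued */
    int transactions_started;  /* each one would take the freeze lock */
};

/* Handling a delayed iput may start a transaction, which needs the lock. */
static void run_delayed_iputs(struct frozen_fs_sketch *fs)
{
    while (fs->pending_iputs > 0) {
        fs->pending_iputs--;
        fs->transactions_started++;
    }
}

static void commit_transaction(struct frozen_fs_sketch *fs)
{
    assert(fs != NULL);
    /* While frozen, leave delayed iputs queued instead of deadlocking. */
    if (!fs->fs_frozen)
        run_delayed_iputs(fs);
}

static void freeze_fs(struct frozen_fs_sketch *fs)
{
    fs->fs_frozen = true;
    commit_transaction(fs);
}

static void unfreeze_fs(struct frozen_fs_sketch *fs)
{
    fs->fs_frozen = false;
}
```

In this model the queued iputs are simply picked up by the first commit after the unfreeze.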
     

26 Jul, 2016

6 commits

  • __btrfs_abort_transaction doesn't use its root parameter except to
    obtain an fs_info pointer. We can obtain that from trans->root->fs_info
    for now and from trans->fs_info in a later patch.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: David Sterba

    Jeff Mahoney
     
  • We have all these stubs that only exist because they're called from
    btrfs_run_sanity_tests, which is a static inside super.c. Let's just
    move it all into tests/btrfs-tests.c and only have one stub.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: David Sterba

    Jeff Mahoney
     
  • btrfs_test_opt and friends only use the root pointer to access
    the fs_info. Let's pass the fs_info directly in preparation to
    eliminate similar patterns all over btrfs.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: David Sterba

    Jeff Mahoney
     
  • When using trace events to debug a problem, it's impossible to determine
    which file system generated a particular event. This patch adds a
    macro to prefix standard information to the head of a trace event.

    The extent_state alloc/free events are all that's left without an
    fs_info available.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: David Sterba

    Jeff Mahoney
     
  • The mixed blockgroup reporting has been fixed by commit
    ae02d1bd070767e109f4a6f1bb1f466e9698a355
    "btrfs: fix mixed block count of available space"

    Signed-off-by: David Sterba

    David Sterba
     
  • This patch adds ratelimiting to all messages which are not using the _rl
    version of the various printing APIs in btrfs. This is designed to be
    used as a safety net, since a flood of messages might cause the softlockup
    detector to trigger. To reduce interference between different classes of
    messages, use a separate ratelimit state for every class of message.

    Signed-off-by: Nikolay Borisov
    Signed-off-by: David Sterba

    Nikolay Borisov
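
A rough sketch of keeping one ratelimit state per message class, so that a flood in one class does not suppress another. The class names, the interval and the burst size are assumptions chosen for illustration, and the time is passed in by the caller to keep the example deterministic; this is not the kernel's ratelimit API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical message classes. */
enum msg_class_sketch { MSG_ERR, MSG_WARN, MSG_INFO, MSG_NR_CLASSES };

struct ratelimit_sketch {
    long window_start;  /* start of the current interval, in seconds */
    int emitted;        /* messages let through in this interval */
};

#define RL_INTERVAL 5   /* seconds per window (assumed value) */
#define RL_BURST    3   /* messages allowed per window (assumed value) */

/* One independent state per class. */
static struct ratelimit_sketch rl_state[MSG_NR_CLASSES];

/* Return true if a message of class cls may be printed at time now. */
static bool msg_allowed(enum msg_class_sketch cls, long now)
{
    struct ratelimit_sketch *rl;

    assert(cls < MSG_NR_CLASSES);
    rl = &rl_state[cls];

    /* Open a fresh window once the interval has passed. */
    if (now - rl->window_start >= RL_INTERVAL) {
        rl->window_start = now;
        rl->emitted = 0;
    }
    if (rl->emitted >= RL_BURST)
        return false;

    rl->emitted++;
    return true;
}
```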
     

18 Jun, 2016

2 commits

  • This fixes a problem introduced in commit 2f3165ecf103599f82bf0ea254039db335fb5005
    "btrfs: don't force mounts to wait for cleaner_kthread to delete one or more subvolumes".

    open_ctree eventually calls btrfs_replay_log which in turn calls
    btrfs_commit_super which tries to lock the cleaner_mutex, causing a
    recursive mutex deadlock during mount.

    Instead of playing whack-a-mole trying to keep up with all the
    functions that may want to lock cleaner_mutex, put all the cleaner_mutex
    lockers back where they were, and attack the problem more directly:
    keep cleaner_kthread asleep until the filesystem is mounted.

    When filesystems are mounted read-only and later remounted read-write,
    open_ctree did not set fs_info->open and neither does anything else.
    Set this flag in btrfs_remount so that neither btrfs_delete_unused_bgs
    nor cleaner_kthread get confused by the common case of "/" filesystem
    read-only mount followed by read-write remount.

    Signed-off-by: Zygo Blaxell
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Zygo Blaxell
     
  • The test for !trans->blocks_used in btrfs_abort_transaction is
    insufficient to determine whether it's safe to drop the transaction
    handle on the floor. btrfs_cow_block, informed by should_cow_block,
    can return blocks that have already been CoW'd in the current
    transaction. trans->blocks_used is only incremented for new block
    allocations. If an operation overlaps the blocks in the current
    transaction entirely and must abort the transaction, we'll happily
    let it clean up the trans handle even though it may have modified
    the blocks and will commit an incomplete operation.

    In the long-term, I'd like to do closer tracking of when the fs
    is actually modified so we can still recover as gracefully as possible,
    but that approach will need some discussion. In the short term,
    since this is the only code using trans->blocks_used, let's just
    switch it to a bool indicating whether any blocks were used and set
    it when should_cow_block returns false.

    Cc: stable@vger.kernel.org # 3.4+
    Signed-off-by: Jeff Mahoney
    Reviewed-by: Filipe Manana
    Signed-off-by: David Sterba

    Jeff Mahoney
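
The reasoning above can be modelled in a few lines. This is a hedged sketch with hypothetical names (`trans_sketch`, `block_sketch`, `cow_block_sketch()`), not the kernel code: a block whose generation matches the running transaction needs no new allocation, so a counter of allocations would miss it, while a boolean set on every path does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the transaction handle and a block. */
struct trans_sketch {
    unsigned long transid;
    bool dirty;           /* any block of this transaction may be modified */
};

struct block_sketch {
    unsigned long generation;
};

/* A block already CoW'd in the running transaction needs no new copy. */
static bool should_cow_sketch(const struct trans_sketch *trans,
                              const struct block_sketch *block)
{
    return block->generation != trans->transid;
}

static void cow_block_sketch(struct trans_sketch *trans,
                             struct block_sketch *block)
{
    assert(trans != NULL && block != NULL);

    if (should_cow_sketch(trans, block))
        block->generation = trans->transid;  /* models a new allocation */

    /* Set on both paths: a reused block can be modified just as well. */
    trans->dirty = true;
}

/* On abort, the handle may only be dropped quietly if nothing was touched. */
static bool abort_can_drop_handle(const struct trans_sketch *trans)
{
    return !trans->dirty;
}
```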
     

09 Jun, 2016

1 commit


06 Jun, 2016

2 commits


03 Jun, 2016

1 commit

  • The self-tests code assumes 4k as the sectorsize and nodesize. This commit
    fixes the hardcoded 4K, which enables the self-tests code to be executed on
    non-4k page sized systems (e.g. ppc64).

    Reviewed-by: Josef Bacik
    Signed-off-by: Feifei Xu
    Signed-off-by: Chandan Rajendra
    Signed-off-by: David Sterba

    Feifei Xu
     

26 May, 2016

2 commits


18 May, 2016

1 commit


16 May, 2016

1 commit


13 May, 2016

1 commit

  • Before the relocation process of a block group starts, it sets the block
    group to readonly mode, then flushes all delalloc writes and then finally
    it waits for all ordered extents to complete. This last step includes
    waiting for ordered extents destined for extents allocated in other block
    groups, making us waste time unnecessarily.

    So improve this by waiting only for ordered extents that fall into the
    block group's range.

    Signed-off-by: Filipe Manana
    Reviewed-by: Josef Bacik
    Reviewed-by: Liu Bo

    Filipe Manana
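
Restricting the wait to a block group's range comes down to a range filter. The sketch below uses hypothetical types and treats "falls into the range" as a half-open interval overlap, which is an assumption; the kernel's actual criterion may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ordered extent: a byte range on disk. */
struct ordered_extent_sketch {
    uint64_t start;
    uint64_t len;
};

/* True if [oe->start, oe->start + oe->len) overlaps [start, start + len). */
static bool extent_in_range(const struct ordered_extent_sketch *oe,
                            uint64_t start, uint64_t len)
{
    return oe->start < start + len && start < oe->start + oe->len;
}

/* Count the ordered extents relocation would have to wait for. */
static size_t count_extents_to_wait(const struct ordered_extent_sketch *oes,
                                    size_t n, uint64_t bg_start,
                                    uint64_t bg_len)
{
    size_t i, count = 0;

    assert(oes != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (extent_in_range(&oes[i], bg_start, bg_len))
            count++;
    }
    return count;
}
```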
     

06 May, 2016

2 commits


28 Apr, 2016

3 commits

  • Correct a typo in the chunk_mutex name to make it grepable.

    Since it is better to fix several typos at once, fix two more in the
    same file.

    Signed-off-by: Luis de Bethencourt
    Signed-off-by: David Sterba

    Luis de Bethencourt
     
  • save_error_info() actually sets the FS state to error and nothing else.
    Furthermore, the word "save" describes what it does less well than the
    word "set" would.

    So, to make the code easier to understand, move the save_error_info() code
    into its only consumer.

    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Anand Jain
     
  • btrfs_std_error() handles errors and puts the FS into readonly mode
    (as of now). So it's a good idea to rename it to btrfs_handle_fs_error().

    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    [ edit changelog ]
    Signed-off-by: David Sterba

    Anand Jain
     

12 Mar, 2016

1 commit


26 Feb, 2016

1 commit


23 Feb, 2016

2 commits


12 Feb, 2016

3 commits

  • Introduce a new mount option alias "norecovery" for nologreplay, to keep
    "norecovery" behavior the same as in other filesystems.

    Signed-off-by: Qu Wenruo
    Signed-off-by: David Sterba

    Qu Wenruo
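
An option alias can be pictured as two names mapping to the same token in a lookup table. The sketch below is an illustration only: the enum, table and `lookup_option()` are hypothetical and do not reproduce the kernel's option parser. The option names themselves are the ones mentioned in the commits of this date.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option tokens. */
enum opt_sketch { OPT_UNKNOWN, OPT_NOLOGREPLAY, OPT_USEBACKUPROOT };

static const struct {
    const char *name;
    enum opt_sketch opt;
} opt_table[] = {
    { "nologreplay",   OPT_NOLOGREPLAY },
    { "norecovery",    OPT_NOLOGREPLAY },    /* alias */
    { "usebackuproot", OPT_USEBACKUPROOT },
    { "recovery",      OPT_USEBACKUPROOT },  /* kept for compatibility */
};

/* Map an option name to its token; aliases resolve to the same token. */
static enum opt_sketch lookup_option(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(opt_table) / sizeof(opt_table[0]); i++) {
        if (strcmp(name, opt_table[i].name) == 0)
            return opt_table[i].opt;
    }
    return OPT_UNKNOWN;
}
```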
     
  • Introduce a new mount option "nologreplay" that works together with the "ro"
    mount option to get a real readonly mount, like "norecovery" in ext* and xfs.

    Since the new parse_options() needs to check new flags at remount time,
    add a new parameter to parse_options().

    Signed-off-by: Qu Wenruo
    Reviewed-by: Chandan Rajendra
    Tested-by: Austin S. Hemmelgarn
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • The current "recovery" mount option will only try to use the backup root.
    However, the word "recovery" is too generic and may be confusing for some
    users.

    Introduce a new and more specific mount option, "usebackuproot", to
    replace the "recovery" mount option.
    "recovery" will be kept for compatibility reasons, but will be
    deprecated.

    Also, since "usebackuproot" only affects mount behavior and has nothing
    to do with the filesystem after open_ctree(), clear the flag
    after the mount succeeds.

    This provides the basis for the later unified "norecovery" mount option.

    Signed-off-by: Qu Wenruo
    [ dropped usebackuproot from show_mount, added note about 'recovery' to
    docs ]
    Signed-off-by: David Sterba

    Qu Wenruo
     

23 Jan, 2016

1 commit

  • Pull more btrfs updates from Chris Mason:
    "These are mostly fixes that we've been testing, but also we grabbed
    and tested a few small cleanups that had been on the list for a while.

    Zhao Lei's patchset also fixes some early ENOSPC buglets"

    * 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (21 commits)
    btrfs: raid56: Use raid_write_end_io for scrub
    btrfs: Remove unnecessary ClearPageUptodate for raid56
    btrfs: use rbio->nr_pages to reduce calculation
    btrfs: Use unified stripe_page's index calculation
    btrfs: Fix calculation of rbio->dbitmap's size calculation
    btrfs: Fix no_space in write and rm loop
    btrfs: merge functions for wait snapshot creation
    btrfs: delete unused argument in btrfs_copy_from_user
    btrfs: Use direct way to determine raid56 write/recover mode
    btrfs: Small cleanup for get index_srcdev loop
    btrfs: Enhance chunk validation check
    btrfs: Enhance super validation check
    Btrfs: fix deadlock running delayed iputs at transaction commit time
    Btrfs: fix typo in log message when starting a balance
    btrfs: remove duplicate const specifier
    btrfs: initialize the seq counter in struct btrfs_device
    Btrfs: clean up an error code in btrfs_init_space_info()
    btrfs: fix iterator with update error in backref.c
    Btrfs: fix output of compression message in btrfs_parse_options()
    Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots
    ...

    Linus Torvalds
     

20 Jan, 2016

1 commit


19 Jan, 2016

1 commit

  • Pull btrfs updates from Chris Mason:
    "This has our usual assortment of fixes and cleanups, but the biggest
    change included is Omar Sandoval's free space tree. It's not the
    default yet, mounting -o space_cache=v2 enables it and sets a readonly
    compat bit. The tree can actually be deleted and regenerated if there
    are any problems, but it has held up really well in testing so far.

    For very large filesystems (30T+) our existing free space caching code
    can end up taking a huge amount of time during commits. The new tree
    based code is faster and less work overall to update as the commit
    progresses.

    Omar worked on this during the summer and we'll hammer on it in
    production here at FB over the next few months"

    * 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (73 commits)
    Btrfs: fix fitrim discarding device area reserved for boot loader's use
    Btrfs: Check metadata redundancy on balance
    btrfs: statfs: report zero available if metadata are exhausted
    btrfs: preallocate path for snapshot creation at ioctl time
    btrfs: allocate root item at snapshot ioctl time
    btrfs: do an allocation earlier during snapshot creation
    btrfs: use smaller type for btrfs_path locks
    btrfs: use smaller type for btrfs_path lowest_level
    btrfs: use smaller type for btrfs_path reada
    btrfs: cleanup, use enum values for btrfs_path reada
    btrfs: constify static arrays
    btrfs: constify remaining structs with function pointers
    btrfs tests: replace whole ops structure for free space tests
    btrfs: use list_for_each_entry* in backref.c
    btrfs: use list_for_each_entry_safe in free-space-cache.c
    btrfs: use list_for_each_entry* in check-integrity.c
    Btrfs: use linux/sizes.h to represent constants
    btrfs: cleanup, remove stray return statements
    btrfs: zero out delayed node upon allocation
    btrfs: pass proper enum type to start_transaction()
    ...

    Linus Torvalds
     

16 Jan, 2016

1 commit

  • The compression message might not be correctly output.
    Fix it.

    [[before fix]]

    # mount -o compress /dev/sdb3 /test3
    [ 996.874264] BTRFS info (device sdb3): disk space caching is enabled
    [ 996.874268] BTRFS: has skinny extents
    # mount | grep /test3
    /dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/)

    # mount -o remount,compress-force /dev/sdb3 /test3
    [ 1035.075017] BTRFS info (device sdb3): force zlib compression
    [ 1035.075021] BTRFS info (device sdb3): disk space caching is enabled
    # mount | grep /test3
    /dev/sdb3 on /test3 type btrfs (rw,relatime,compress-force=zlib,space_cache,subvolid=5,subvol=/)

    # mount -o remount,compress /dev/sdb3 /test3
    [ 1053.679092] BTRFS info (device sdb3): disk space caching is enabled
    # mount | grep /test3
    /dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/)

    [[after fix]]

    # mount -o compress /dev/sdb3 /test3
    [ 401.021753] BTRFS info (device sdb3): use zlib compression
    [ 401.021758] BTRFS info (device sdb3): disk space caching is enabled
    [ 401.021760] BTRFS: has skinny extents
    # mount | grep /test3
    /dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/)

    # mount -o remount,compress-force /dev/sdb3 /test3
    [ 439.824624] BTRFS info (device sdb3): force zlib compression
    [ 439.824629] BTRFS info (device sdb3): disk space caching is enabled
    # mount | grep /test3
    /dev/sdb3 on /test3 type btrfs (rw,relatime,compress-force=zlib,space_cache,subvolid=5,subvol=/)

    # mount -o remount,compress /dev/sdb3 /test3
    [ 459.918430] BTRFS info (device sdb3): use zlib compression
    [ 459.918434] BTRFS info (device sdb3): disk space caching is enabled
    # mount | grep /test3
    /dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/)

    Signed-off-by: Tsutomu Itoh
    Signed-off-by: David Sterba

    Tsutomu Itoh
     

11 Jan, 2016

2 commits