06 Dec, 2018

2 commits

  • commit fce466eab7ac6baa9d2dcd88abcf945be3d4a089 upstream.

    A crafted image with invalid block group items could cause the free
    space cache code to panic.

    We can detect such invalid block group items by checking:
    1) Item size
    Known fixed value.
    2) Block group size (key.offset)
    We have an upper limit on block group item (10G)
    3) Chunk objectid
    Known fixed value.
    4) Type
    Only 4 valid type values, DATA, METADATA, SYSTEM and DATA|METADATA.
    No more than 1 bit set for profile type.
    5) Used space
    No more than the block group size.

    This should allow btrfs to detect and refuse to mount the crafted image.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=199849
    Reported-by: Xu Wen
    Signed-off-by: Qu Wenruo
    Reviewed-by: Gu Jinxiang
    Reviewed-by: Nikolay Borisov
    Tested-by: Gu Jinxiang
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    [bwh: Backported to 4.14:
    - In check_leaf_item(), pass root->fs_info to check_block_group_item()
    - Adjust context]
    Signed-off-by: Ben Hutchings
    Signed-off-by: Sasha Levin

    Qu Wenruo
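
The five checks listed in the commit message above can be sketched in userspace C as follows. This is a minimal illustration, not the kernel's check_block_group_item(): `struct bg_item` and `bg_item_is_valid()` are hypothetical names, and the constants (24-byte item, chunk objectid 256, flag bit positions) are assumptions taken from the btrfs on-disk format as commonly documented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed on-disk flag bits: three type bits, then six profile bits. */
#define BG_DATA         (1ULL << 0)
#define BG_SYSTEM       (1ULL << 1)
#define BG_METADATA     (1ULL << 2)
#define BG_TYPE_MASK    (BG_DATA | BG_SYSTEM | BG_METADATA)
#define BG_PROFILE_MASK 0x1f8ULL        /* RAID0..RAID6, bits 3-8 */

#define BG_ITEM_SIZE        24u          /* three 64-bit fields (assumed) */
#define BG_CHUNK_OBJECTID   256ULL       /* known fixed value (assumed) */
#define BG_MAX_SIZE         (10ULL << 30) /* the 10G limit from the message */

/* Hypothetical, already-decoded view of one block group item. */
struct bg_item {
    uint32_t item_size;
    uint64_t length;          /* key.offset */
    uint64_t chunk_objectid;
    uint64_t flags;
    uint64_t used;
};

bool bg_item_is_valid(const struct bg_item *bg)
{
    uint64_t type = bg->flags & BG_TYPE_MASK;
    uint64_t profile = bg->flags & BG_PROFILE_MASK;

    if (bg->item_size != BG_ITEM_SIZE)            /* 1) item size */
        return false;
    if (bg->length > BG_MAX_SIZE)                 /* 2) block group size */
        return false;
    if (bg->chunk_objectid != BG_CHUNK_OBJECTID)  /* 3) chunk objectid */
        return false;
    if (type != BG_DATA && type != BG_METADATA && /* 4) type ... */
        type != BG_SYSTEM && type != (BG_DATA | BG_METADATA))
        return false;
    if (profile & (profile - 1))                  /* ... at most 1 profile bit */
        return false;
    if (bg->used > bg->length)                    /* 5) used space */
        return false;
    return true;
}
```

A caller would refuse to mount as soon as any item fails the check, which is the behaviour the commit message describes.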
     
  • commit 315409b0098fb2651d86553f0436b70502b29bb2 upstream.

    Reported in https://bugzilla.kernel.org/show_bug.cgi?id=199839, with an
    image that has an invalid chunk type but does not return an error.

    Add chunk type check in btrfs_check_chunk_valid, to detect the wrong
    type combinations.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=199839
    Reported-by: Xu Wen
    Reviewed-by: Qu Wenruo
    Signed-off-by: Gu Jinxiang
    Signed-off-by: David Sterba
    Signed-off-by: Ben Hutchings
    Signed-off-by: Sasha Levin

    Gu Jinxiang
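
A chunk type check of the kind described above might look like the sketch below. The commit message does not spell out which combinations are rejected, so the three rules here (no unknown bits, at least one type bit, SYSTEM not mixed with DATA or METADATA) are assumptions for illustration, and `check_chunk_type()` is a hypothetical name rather than the kernel function.

```c
#include <stdint.h>

/* Assumed flag layout: three type bits followed by six profile bits. */
#define CHUNK_DATA          (1ULL << 0)
#define CHUNK_SYSTEM        (1ULL << 1)
#define CHUNK_METADATA      (1ULL << 2)
#define CHUNK_TYPE_MASK     (CHUNK_DATA | CHUNK_SYSTEM | CHUNK_METADATA)
#define CHUNK_PROFILE_MASK  0x1f8ULL

/* Returns 0 when the type looks sane, -1 for a rejected combination. */
int check_chunk_type(uint64_t type)
{
    /* Bits outside the known type and profile masks are never valid. */
    if (type & ~(CHUNK_TYPE_MASK | CHUNK_PROFILE_MASK))
        return -1;
    /* A chunk must be at least one of DATA, METADATA or SYSTEM. */
    if ((type & CHUNK_TYPE_MASK) == 0)
        return -1;
    /* SYSTEM chunks are not combined with DATA or METADATA. */
    if ((type & CHUNK_SYSTEM) && (type & (CHUNK_DATA | CHUNK_METADATA)))
        return -1;
    return 0;
}
```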
     

10 Oct, 2018

1 commit

  • [ Upstream commit 801660b040d132f67fac6a95910ad307c5929b49 ]

    Test case btrfs/164 reports use-after-free:

    [ 6712.084324] general protection fault: 0000 [#1] PREEMPT SMP
    ..
    [ 6712.195423] btrfs_update_commit_device_size+0x75/0xf0 [btrfs]
    [ 6712.201424] btrfs_commit_transaction+0x57d/0xa90 [btrfs]
    [ 6712.206999] btrfs_rm_device+0x627/0x850 [btrfs]
    [ 6712.211800] btrfs_ioctl+0x2b03/0x3120 [btrfs]

    Reason for this is that btrfs_shrink_device adds the resized device to
    the fs_devices::resized_devices after it has called the last commit
    transaction.

    So the list fs_devices::resized_devices is not empty when
    btrfs_shrink_device returns. Now the parent function
    btrfs_rm_device calls:

    btrfs_close_bdev(device);
    call_rcu(&device->rcu, free_device_rcu);

    and then does the transaction commit. It goes through the
    fs_devices::resized_devices in btrfs_update_commit_device_size and
    leads to use-after-free.

    Fix this by making sure btrfs_shrink_device calls the last needed
    btrfs_commit_transaction before the return. This is consistent with what
    the grow counterpart does and this makes sure the on-disk state is
    persistent when the function returns.

    Reported-by: Lu Fengqi
    Tested-by: Lu Fengqi
    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    [ update changelog ]
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Anand Jain
     

15 Sep, 2018

1 commit

  • [ Upstream commit 64f64f43c89aca1782aa672e0586f6903c5d8979 ]

    It's entirely possible that a crafted btrfs image contains overlapping
    chunks.

    Although the tree-checker can't detect this problem, it is not
    catastrophic: the current extent map code already detects the overlap
    and returns -EEXIST.

    We only need to exit gracefully and fail the mount.

    Reported-by: Xu Wen
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=200409
    Signed-off-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Qu Wenruo
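
The idea of letting the mapping structure report the overlap and then failing cleanly can be sketched as below. This is a toy model with a flat array instead of the kernel's extent map tree; `chunk_map_add()` and `load_chunks()` are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHUNKS 16

struct chunk_range {
    uint64_t start;
    uint64_t len;
};

struct chunk_map {
    struct chunk_range r[MAX_CHUNKS];
    size_t n;
};

/* Insert [start, start + len); report -EEXIST if it overlaps an entry. */
int chunk_map_add(struct chunk_map *m, uint64_t start, uint64_t len)
{
    for (size_t i = 0; i < m->n; i++) {
        uint64_t s = m->r[i].start;
        uint64_t e = s + m->r[i].len;

        if (start < e && s < start + len)
            return -EEXIST;
    }
    if (m->n == MAX_CHUNKS)
        return -ENOSPC;
    m->r[m->n].start = start;
    m->r[m->n].len = len;
    m->n++;
    return 0;
}

/* Stop at the first error and hand it to the caller, which fails the mount. */
int load_chunks(struct chunk_map *m, const struct chunk_range *in, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int ret = chunk_map_add(m, in[i].start, in[i].len);

        if (ret)
            return ret;
    }
    return 0;
}
```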
     

21 Jun, 2018

2 commits

  • [ Upstream commit 8810f7517a3bc4ca2d41d022446d3f5fd6b77c09 ]

    There is a scenario that can end up with rebuild process failing to
    return good content, i.e.
    suppose that all disks can be read without problems and if the content
    that was read out doesn't match its checksum, currently for raid6
    btrfs at most retries twice,

    - the 1st retry is to rebuild with all other stripes, it'll eventually
    be a raid5 xor rebuild,
    - if the 1st fails, the 2nd retry will deliberately fail parity p so
    that it will do raid6 style rebuild,

    however, the chances are that another non-parity stripe content also
    has something corrupted, so that the above retries are not able to
    return correct content, and users will think of this as data loss.
    More seriously, if the loss happens on some important internal btree
    roots, it could refuse to mount.

    This extends btrfs to do more retries and each retry fails only one
    stripe. Since raid6 can tolerate 2 disk failures, if there is one
    more failure besides the failure on which we're recovering, this can
    always work.

    The worst case is to retry as many times as the number of raid6 disks,
    but given the fact that such a scenario is really rare in practice,
    it's still acceptable.

    Signed-off-by: Liu Bo
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Liu Bo
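
The retry strategy described above (fail one additional stripe per attempt until the checksum matches) can be sketched as a simple loop. The rebuild itself is abstracted behind a callback; `raid6_rebuild_with_retries()`, `rebuild_fn` and `toy_rebuild()` are hypothetical names, and the toy callback only simulates a rebuild.

```c
#include <stdbool.h>

/*
 * Hypothetical callback: rebuild stripe `target` while also treating
 * stripe `also_failed` as bad, and report whether the result matches
 * its checksum.
 */
typedef bool (*rebuild_fn)(int target, int also_failed, void *ctx);

/*
 * Try every other stripe as the second failure, one per retry.
 * Returns the stripe whose exclusion gave good data, or -1 if none did.
 * The worst case is nr_stripes - 1 attempts.
 */
int raid6_rebuild_with_retries(int nr_stripes, int target,
                               rebuild_fn rebuild, void *ctx)
{
    for (int also = 0; also < nr_stripes; also++) {
        if (also == target)
            continue;
        if (rebuild(target, also, ctx))
            return also;
    }
    return -1;
}

/* Toy stand-in: the rebuild succeeds only when the silently corrupted
 * stripe (passed via ctx) is the one being excluded. */
bool toy_rebuild(int target, int also_failed, void *ctx)
{
    int corrupted = *(int *)ctx;

    (void)target;
    return also_failed == corrupted;
}
```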
     
  • This reverts commit d91bb7c6988bd6450284c762b33f2e1ea3fe7c97.

    This commit used an incorrect log message.

    Signed-off-by: Sasha Levin
    Reported-by: Ben Hutchings
    Signed-off-by: Greg Kroah-Hartman

    Sasha Levin
     

23 May, 2018

1 commit

  • commit 02ee654d3a04563c67bfe658a05384548b9bb105 upstream.

    We set the BTRFS_BALANCE_RESUME flag in the btrfs_recover_balance()
    only, which isn't called during the remount. So when resuming from
    the paused balance we hit the bug:

    kernel: kernel BUG at fs/btrfs/volumes.c:3890!
    ::
    kernel: balance_kthread+0x51/0x60 [btrfs]
    kernel: kthread+0x111/0x130
    ::
    kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP: ffffba7d0090bde8

    Reproducer:
    On a mounted filesystem:

    btrfs balance start --full-balance /btrfs
    btrfs balance pause /btrfs
    mount -o remount,ro /dev/sdb /btrfs
    mount -o remount,rw /dev/sdb /btrfs

    To fix this set the BTRFS_BALANCE_RESUME flag in
    btrfs_resume_balance_async().

    CC: stable@vger.kernel.org # 4.4+
    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Anand Jain
     

26 Apr, 2018

1 commit

  • [ Upstream commit 762221f095e3932669093466aaf4b85ed9ad2ac1 ]

    The raid6 corruption is that,
    suppose that all disks can be read without problems and if the content
    that was read out doesn't match its checksum, currently for raid6
    btrfs at most retries twice,

    - the 1st retry is to rebuild with all other stripes, it'll eventually
    be a raid5 xor rebuild,
    - if the 1st fails, the 2nd retry will deliberately fail parity p so
    that it will do raid6 style rebuild,

    however, the chances are that another non-parity stripe content also
    has something corrupted, so that the above retries are not able to
    return correct content.

    We've fixed normal reads to rebuild raid6 correctly with more retries
    in Patch "Btrfs: make raid6 rebuild retry more"[1], this is to fix
    scrub to do the exactly same rebuild process.

    [1]: https://patchwork.kernel.org/patch/10091755/

    Signed-off-by: Liu Bo
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Liu Bo
     

21 Mar, 2018

3 commits

  • commit 9deae9689231964972a94bb56a79b669f9d47ac1 upstream.

    Commit addc3fa74e5b ("Btrfs: Fix the problem that the dirty flag of dev
    stats is cleared") reworked the way device stats changes are tracked. A
    new atomic dev_stats_ccnt counter was introduced which is incremented
    every time any of the device stats counters are changed. This serves as
    a flag whether there are any pending stats changes. However, this patch
    only partially implemented the correct memory barriers necessary:

    - It only ordered the stores to the counters but not the reads e.g.
    btrfs_run_dev_stats
    - It completely omitted any comments documenting the intended design and
    how the memory barriers pair with each-other

    This patch provides the necessary comments as well as adds a missing
    smp_rmb in btrfs_run_dev_stats. Furthermore, since dev_stats_ccnt is only
    a snapshot at best, there was no point in reading the counter twice -
    once in btrfs_dev_stats_dirty and then again when assigning stats_cnt.
    Just collapse both reads into one.

    Fixes: addc3fa74e5b ("Btrfs: Fix the problem that the dirty flag of dev stats is cleared")
    Signed-off-by: Nikolay Borisov
    Reviewed-by: Mathieu Desnoyers
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Borisov
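
The pairing between the writer and reader sides can be illustrated with C11 atomics. This is a userspace analogue only: release/acquire ordering stands in for the kernel's explicit barriers, and `struct dev_stats`, `stats_inc()` and `stats_flush()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct dev_stats {
    atomic_int write_errs;  /* one example stats counter */
    atomic_int ccnt;        /* "there are pending changes" counter */
};

void stats_init(struct dev_stats *s)
{
    atomic_init(&s->write_errs, 0);
    atomic_init(&s->ccnt, 0);
}

/* Writer: update the stat first, then publish the change. */
void stats_inc(struct dev_stats *s)
{
    atomic_fetch_add_explicit(&s->write_errs, 1, memory_order_relaxed);
    /* Release pairs with the acquire load in stats_flush(). */
    atomic_fetch_add_explicit(&s->ccnt, 1, memory_order_release);
}

/* Reader: returns true and fills *out if there were pending changes. */
bool stats_flush(struct dev_stats *s, int *out)
{
    /* Read the change counter exactly once; it is only a snapshot. */
    int seen = atomic_load_explicit(&s->ccnt, memory_order_acquire);

    if (seen == 0)
        return false;
    *out = atomic_load_explicit(&s->write_errs, memory_order_relaxed);
    /* Subtract only what was seen, so concurrent updates stay pending. */
    atomic_fetch_sub_explicit(&s->ccnt, seen, memory_order_relaxed);
    return true;
}
```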
     
  • commit fd649f10c3d21ee9d7542c609f29978bdf73ab94 upstream.

    Commit 4fde46f0cc71 ("Btrfs: free the stale device") introduced
    btrfs_free_stale_device which iterates the device lists for all
    registered btrfs filesystems and deletes those devices which aren't
    mounted. If a btrfs_devices structure has only 1 device attached to it
    and it is unused, then btrfs_free_stale_devices will proceed to also free
    the btrfs_fs_devices struct itself. Currently this leads to a use after
    free since list_for_each_entry will try to perform a check on the
    already freed memory to see if it has to terminate the loop.

    The fix is to use 'break' when we know we are freeing the current
    fs_devs.

    Fixes: 4fde46f0cc71 ("Btrfs: free the stale device")
    Signed-off-by: Nikolay Borisov
    Reviewed-by: Anand Jain
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Borisov
     
  • commit 92e222df7b8f05c565009c7383321b593eca488b upstream.

    In case of using DUP, we search for enough unallocated disk space on a
    device to hold two stripes.

    The devices_info[ndevs-1].max_avail that holds the amount of unallocated
    space found is directly assigned to stripe_size, while it's actually
    twice the stripe size.

    Later on in the code, an unconditional division of stripe_size by
    dev_stripes corrects the value, but in the meantime there's a check to
    see if the stripe_size does not exceed max_chunk_size. Since during this
    check stripe_size is twice the amount as intended, the check will reduce
    the stripe_size to max_chunk_size if the actual correct to be used
    stripe_size is more than half the amount of max_chunk_size.

    The unconditional division later tries to correct stripe_size, but will
    actually make sure we can't allocate more than half the max_chunk_size.

    Fix this by moving the division by dev_stripes before the max chunk size
    check, so it always contains the right value, instead of putting a duct
    tape division in further on to get it fixed again.

    Since in all other cases than DUP, dev_stripes is 1, this change only
    affects DUP.

    Other attempts in the past were made to fix this:
    * 37db63a400 "Btrfs: fix max chunk size check in chunk allocator" tried
    to fix the same problem, but still resulted in part of the code acting
    on a wrongly doubled stripe_size value.
    * 86db25785a "Btrfs: fix max chunk size on raid5/6" unintentionally
    broke this fix again.

    The real problem was already introduced with the rest of the code in
    73c5de0051.

    The user visible result however will be that the max chunk size for DUP
    will suddenly double, while it's actually acting according to the limits
    in the code again like it was 5 years ago.

    Reported-by: Naohiro Aota
    Link: https://www.spinics.net/lists/linux-btrfs/msg69752.html
    Fixes: 73c5de0051 ("btrfs: quasi-round-robin for chunk allocation")
    Fixes: 86db25785a ("Btrfs: fix max chunk size on raid5/6")
    Signed-off-by: Hans van Kranenburg
    Reviewed-by: David Sterba
    [ update comment ]
    Signed-off-by: David Sterba
    Signed-off-by: Greg Kroah-Hartman

    Hans van Kranenburg
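
The effect of the ordering change can be shown with a small arithmetic sketch. It keeps only the clamp and the division and leaves out everything else the allocator does; `stripe_size_old()` and `stripe_size_new()` are hypothetical names, and the 4 GiB / 1 GiB figures in the note below are illustrative values, not taken from the commit.

```c
#include <stdint.h>

/* Before the fix: clamp against max_chunk_size first, divide afterwards. */
uint64_t stripe_size_old(uint64_t max_avail, uint64_t max_chunk_size,
                         uint64_t dev_stripes, uint64_t data_stripes)
{
    uint64_t stripe_size = max_avail;   /* for DUP this is really 2 stripes */

    if (stripe_size * data_stripes > max_chunk_size)
        stripe_size = max_chunk_size / data_stripes;
    return stripe_size / dev_stripes;   /* late "duct tape" division */
}

/* After the fix: divide by dev_stripes first, then clamp. */
uint64_t stripe_size_new(uint64_t max_avail, uint64_t max_chunk_size,
                         uint64_t dev_stripes, uint64_t data_stripes)
{
    uint64_t stripe_size = max_avail / dev_stripes;

    if (stripe_size * data_stripes > max_chunk_size)
        stripe_size = max_chunk_size / data_stripes;
    return stripe_size;
}
```

With 4 GiB available, a 1 GiB max chunk size and DUP (dev_stripes = 2, data_stripes = 1), the old ordering yields 512 MiB while the new one yields the full 1 GiB. With dev_stripes = 1 both orderings agree, matching the statement that only DUP is affected.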
     

03 Mar, 2018

1 commit

  • [ Upstream commit beed9263f4000c48a5c48912f26576f6fa091181 ]

    Commit e0ae99941423 ("btrfs: preallocate device flush bio") reworked
    the way the flush bio is allocated and used. Concretely it allocates
    the bio in __alloc_device and then re-uses it multiple times with a
    very simple endio routine that just calls complete() without consuming
    a reference. Allocated bios by default come with a ref count of 1,
    which is then consumed by the endio routine (or not, in which case they
    should be bio_put by the caller). The way the implementation works now
    is that the flush bio has a refcount of 2 and we only ever bio_put it
    once, leaving it to hang indefinitely. Fix this by removing the extra
    bio_get in __alloc_device.

    Fixes: e0ae99941423 ("btrfs: preallocate device flush bio")
    Signed-off-by: Nikolay Borisov
    Reviewed-by: Liu Bo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Borisov
     

04 Feb, 2018

1 commit

  • [ Upstream commit 5e9f2ad5b2904a7e81df6d9a3dbef29478952eac ]

    btrfs_rm_dev_item calls several functions under an active transaction,
    however it fails to abort it if an error happens. Fix this by adding
    explicit btrfs_abort_transaction/btrfs_end_transaction calls.

    Signed-off-by: Nikolay Borisov
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Borisov
     

20 Dec, 2017

2 commits

  • [ Upstream commit 0af2c4bf5a012a40a2f9230458087d7f068339d0 ]

    When a new device is being added to a seed FS, the seed FS is marked
    writable, but when we fail to bring in the new device, we fail to undo
    the writable part. This patch fixes it.

    Signed-off-by: Anand Jain
    Reviewed-by: Nikolay Borisov
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Anand Jain
     
  • [ Upstream commit 102ed2c5ff932439bbbe74c7bd63e6d5baa9f732 ]

    When one of the devices is missing, bbio_error() takes care of setting
    the error status. However, if it is the only IO still pending in that
    stripe, it fails to check the status of the other IO at %bbio_error
    before setting the error %bi_status for the %orig_bio. Fix this by
    checking if %bbio->error has exceeded the %bbio->max_errors.

    With the reproducer below, the fdatasync error is seen intermittently.

    mount -o degraded /dev/sdc /btrfs
    dd status=none if=/dev/zero of=$(mktemp /btrfs/XXX) bs=4096 count=1 conv=fdatasync

    dd: fdatasync failed for ‘/btrfs/LSe’: Input/output error

    The problem is intermittent because the following conditions, which
    depend on timing, must all be met:
    In btrfs_map_bio()
    - in the RAID1, the missing device has to be at %dev_nr = 1
    In bbio_error()
    - before bbio_error() is called, the bio of the not-missing
    device at %dev_nr = 0 must have completed so that the below
    condition is true
    if (atomic_dec_and_test(&bbio->stripes_pending)) {

    Signed-off-by: Anand Jain
    Reviewed-by: Liu Bo
    Signed-off-by: David Sterba
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Anand Jain
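
The fix described above amounts to comparing the error count with the tolerated maximum before reporting a failure for the original bio. A single-threaded sketch, using plain integers instead of atomics and the hypothetical names `struct toy_bbio` and `toy_bbio_error()`:

```c
#include <errno.h>

struct toy_bbio {
    int stripes_pending;  /* IOs still in flight */
    int error;            /* failed stripes so far */
    int max_errors;       /* failures the profile tolerates, 1 for RAID1 */
};

/*
 * Record one failed stripe. Returns 1 when this was the last pending
 * stripe and the final status has been stored in *status, else 0.
 */
int toy_bbio_error(struct toy_bbio *b, int *status)
{
    b->error++;
    if (--b->stripes_pending == 0) {
        /* Only report an error if more stripes failed than tolerated. */
        *status = (b->error > b->max_errors) ? -EIO : 0;
        return 1;
    }
    return 0;
}
```

In the RAID1 case from the reproducer, the write to the present device has already completed when the missing device's stripe fails, so the final status is success rather than an I/O error.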
     

30 Sep, 2017

1 commit

  • Pull btrfs fixes from David Sterba:
    "We've collected a bunch of isolated fixes, for crashes, user-visible
    behaviour or missing bits from other subsystem cleanups from the past.

    The overall number is not small but I was not able to make it
    significantly smaller. Most of the patches are supposed to go to
    stable"

    * 'for-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
    btrfs: log csums for all modified extents
    Btrfs: fix unexpected result when dio reading corrupted blocks
    btrfs: Report error on removing qgroup if del_qgroup_item fails
    Btrfs: skip checksum when reading compressed data if some IO have failed
    Btrfs: fix kernel oops while reading compressed data
    Btrfs: use btrfs_op instead of bio_op in __btrfs_map_block
    Btrfs: do not backup tree roots when fsync
    btrfs: remove BTRFS_FS_QUOTA_DISABLING flag
    btrfs: propagate error to btrfs_cmp_data_prepare caller
    btrfs: prevent to set invalid default subvolid
    Btrfs: send: fix error number for unknown inode types
    btrfs: fix NULL pointer dereference from free_reloc_roots()
    btrfs: finish ordered extent cleaning if no progress is found
    btrfs: clear ordered flag on cleaning up ordered extents
    Btrfs: fix incorrect {node,sector}size endianness from BTRFS_IOC_FS_INFO
    Btrfs: do not reset bio->bi_ops while writing bio
    Btrfs: use the new helper wbc_to_write_flags

    Linus Torvalds
     

26 Sep, 2017

1 commit

  • This seems to be a leftover of commit cf8cddd38bab ("btrfs: don't
    abuse REQ_OP_* flags for btrfs_map_block").

    It should use btrfs_op() helper to provide one of 'enum btrfs_map_op'
    types.

    Fixes: cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block")
    Signed-off-by: Liu Bo
    Reviewed-by: Satoru Takeuchi
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Liu Bo
     

15 Sep, 2017

1 commit

  • Pull mount flag updates from Al Viro:
    "Another chunk of fmount preparations from dhowells; only trivial
    conflicts for that part. It separates MS_... bits (very grotty
    mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
    only a small subset of MS_... stuff).

    This does *not* convert the filesystems to new constants; only the
    infrastructure is done here. The next step in that series is where the
    conflicts would be; that's the conversion of filesystems. It's purely
    mechanical and it's better done after the merge, so if you could run
    something like

    list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

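    sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
    -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
    -e 's/\<MS_NODEV\>/SB_NODEV/g' \
    -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
    -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
    -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
    -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
    -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
    -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
    -e 's/\<MS_SILENT\>/SB_SILENT/g' \
    -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
    -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
    -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
    -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \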
    $list

    and commit it with something along the lines of 'convert filesystems
    away from use of MS_... constants' as commit message, it would save
    quite a bit of headache next cycle"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    VFS: Differentiate mount flags (MS_*) from internal superblock flags
    VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
    vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags

    Linus Torvalds
     

10 Sep, 2017

1 commit

  • Pull btrfs updates from David Sterba:
    "The changes range through all types: cleanups, core changes, sanity
    checks, fixes, other user visible changes, detailed list below:

    - deprecated: user transaction ioctl

    - mount option ssd does not change allocation alignments

    - degraded read-write mount is allowed if all the raid profile
    constraints are met, now based on more accurate check

    - defrag: do not reset compression afterwards; the NOCOMPRESS flag
    can now be overridden by defrag

    - prep work for better extent reference tracking (related to the
    qgroup slowness with balance)

    - prep work for compression heuristics

    - memory allocation reductions (may help latencies on a loaded
    system)

    - better accounting for io waiting states

    - error handling improvements (removed BUGs)

    - added more sanity checks for shared refs

    - fix readdir vs pagefault deadlock under some circumstances

    - fix for 'no-hole' mode, certain combination of compressed and
    inline extents

    - send: fix emission of invalid clone operations

    - fixup file mode if setting acls fails

    - more fixes from fuzzing

    - other cleanups"

    * 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (104 commits)
    btrfs: submit superblock io with REQ_META and REQ_PRIO
    btrfs: remove unnecessary memory barrier in btrfs_direct_IO
    btrfs: remove superfluous chunk_tree argument from btrfs_alloc_dev_extent
    btrfs: Remove chunk_objectid parameter of btrfs_alloc_dev_extent
    btrfs: pass fs_info to btrfs_del_root instead of tree_root
    Btrfs: add one more sanity check for shared ref type
    Btrfs: remove BUG_ON in __add_tree_block
    Btrfs: remove BUG() in add_data_reference
    Btrfs: remove BUG() in print_extent_item
    Btrfs: remove BUG() in btrfs_extent_inline_ref_size
    Btrfs: convert to use btrfs_get_extent_inline_ref_type
    Btrfs: add a helper to retrive extent inline ref type
    btrfs: scrub: simplify scrub worker initialization
    btrfs: scrub: clean up division in scrub_find_csum
    btrfs: scrub: clean up division in __scrub_mark_bitmap
    btrfs: scrub: use bool for flush_all_writes
    btrfs: preserve i_mode if __btrfs_set_acl() fails
    btrfs: Remove extraneous chunk_objectid variable
    btrfs: Remove chunk_objectid argument from btrfs_make_block_group
    btrfs: Remove extra parentheses from condition in copy_items()
    ...

    Linus Torvalds
     

08 Sep, 2017

1 commit

  • Pull block layer updates from Jens Axboe:
    "This is the first pull request for 4.14, containing most of the code
    changes. It's a quiet series this round, which I think we needed after
    the churn of the last few series. This contains:

    - Fix for a registration race in loop, from Anton Volkov.

    - Overflow complaint fix from Arnd for DAC960.

    - Series of drbd changes from the usual suspects.

    - Conversion of the stec/skd driver to blk-mq. From Bart.

    - A few BFQ improvements/fixes from Paolo.

    - CFQ improvement from Ritesh, allowing idling for group idle.

    - A few fixes found by Dan's smatch, courtesy of Dan.

    - A warning fixup for a race between changing the IO scheduler and
    device removal. From David Jeffery.

    - A few nbd fixes from Josef.

    - Support for cgroup info in blktrace, from Shaohua.

    - Also from Shaohua, new features in the null_blk driver to allow it
    to actually hold data, among other things.

    - Various corner cases and error handling fixes from Weiping Zhang.

    - Improvements to the IO stats tracking for blk-mq from me. Can
    drastically improve performance for fast devices and/or big
    machines.

    - Series from Christoph removing bi_bdev as being needed for IO
    submission, in preparation for nvme multipathing code.

    - Series from Bart, including various cleanups and fixes for switch
    fall through case complaints"

    * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
    kernfs: checking for IS_ERR() instead of NULL
    drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
    drbd: Fix allyesconfig build, fix recent commit
    drbd: switch from kmalloc() to kmalloc_array()
    drbd: abort drbd_start_resync if there is no connection
    drbd: move global variables to drbd namespace and make some static
    drbd: rename "usermode_helper" to "drbd_usermode_helper"
    drbd: fix race between handshake and admin disconnect/down
    drbd: fix potential deadlock when trying to detach during handshake
    drbd: A single dot should be put into a sequence.
    drbd: fix rmmod cleanup, remove _all_ debugfs entries
    drbd: Use setup_timer() instead of init_timer() to simplify the code.
    drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
    drbd: new disk-option disable-write-same
    drbd: Fix resource role for newly created resources in events2
    drbd: mark symbols static where possible
    drbd: Send P_NEG_ACK upon write error in protocol != C
    drbd: add explicit plugging when submitting batches
    drbd: change list_for_each_safe to while(list_first_entry_or_null)
    drbd: introduce drbd_recv_header_maybe_unplug
    ...

    Linus Torvalds
     

24 Aug, 2017

2 commits

  • This fixes several instances of blk_status_t and bare errno ints being
    mixed up, some of which are real bugs.

    In the normal case, 0 matches BLK_STS_OK, so we don't observe any
    effects of the missing conversion, but in case of errors or passes
    through the repair/retry paths, the errors get mixed up.

    The changes were identified using 'sparse', we don't have reports of the
    buggy behaviour.

    Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t")
    Signed-off-by: Omar Sandoval
    Reviewed-by: Liu Bo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Omar Sandoval
     
  • This way we don't need a block_device structure to submit I/O. The
    block_device has different life time rules from the gendisk and
    request_queue and is usually only available when the block device node
    is open. Other callers need to explicitly create one (e.g. the lightnvm
    passthrough code, or the new nvme multipathing code).

    For the actual I/O path all that we need is the gendisk, which exists
    once per block device. But given that the block layer also does
    partition remapping we additionally need a partition index, which is
    used for said remapping in generic_make_request.

    Note that all the block drivers generally want request_queue or
    sometimes the gendisk, so this removes a layer of indirection all
    over the stack.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

22 Aug, 2017

2 commits


21 Aug, 2017

3 commits


18 Aug, 2017

2 commits


16 Aug, 2017

11 commits

  • Superblock is read and written using buffer heads, we need to set the
    bdev blocksize. The magic constant has been hardcoded in several places,
    so replace it with a named constant.

    Signed-off-by: David Sterba

    David Sterba
     
  • Polish the helper:
    * drop underscores, no special meaning here
    * pass fs_devices, as this is what the API implements
    * drop noinline, no apparent reason for such simple helper
    * constify uuid
    * add comment

    Signed-off-by: David Sterba

    David Sterba
     
  • There are two helpers called in chain from one location, we can merge the
    functionality.

    Originally, alloc_fs_devices could fill the device uuid randomly if
    we didn't give the uuid buffer. This happens for seed devices but the
    fsid is generated in btrfs_prepare_sprout, so we can remove it.

    Signed-off-by: David Sterba

    David Sterba
     
  • This also adjusts the respective callers in other files. Those were
    found with -Wunused-parameter.

    btrfs_full_stripe_len's mapping_tree - introduced by 53b381b3abeb
    ("Btrfs: RAID5 and RAID6") but it was never really used even in that
    commit

    btrfs_is_parity_mirror's mirror_num - same as above

    chunk_drange_filter's chunk_offset - introduced by 94e60d5a5c4b ("Btrfs:
    devid subset filter") and never used.

    Signed-off-by: Nikolay Borisov
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Nikolay Borisov
     
  • clear_super - usage was removed in commit cea67ab92d3d ("btrfs: clean
    the old superblocks before freeing the device") but that change forgot
    to remove the actual variable.

    max_key - commit 6174d3cb43aa ("Btrfs: remove unused max_key arg from
    btrfs_search_forward") removed the max_key parameter but it forgot to
    remove references from callers.

    stripe_len - this one was added by e06cd3dd7cea ("Btrfs: add validadtion
    checks for chunk loading") but even then it wasn't used.

    Signed-off-by: Nikolay Borisov
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Nikolay Borisov
     
  • find_raid56_stripe_len statically returns SZ_64K which equals BTRFS_STRIPE_LEN.
    Its sole caller is __btrfs_alloc_chunk and it assigns the return value to a
    variable which is already set to BTRFS_STRIPE_LEN. So remove the function
    invocation altogether and remove the function itself. Also remove the variable
    since it's only aliasing BTRFS_STRIPE_LEN and use the define directly. Use
    the occasion to simplify the rounding down of stripe_size now that the value
    we want it to align to is a power of 2.

    Signed-off-by: Nikolay Borisov
    Reviewed-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Nikolay Borisov
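
Rounding down to a power-of-2 alignment can be done with a mask instead of a divide and multiply. The sketch below shows that the two forms agree for a 64K stripe length; `round_down_div()` and `round_down_mask()` are hypothetical helper names, not the kernel's.

```c
#include <stdint.h>

#define STRIPE_LEN (64u * 1024u)    /* SZ_64K, a power of 2 */

/* Generic form: works for any non-zero alignment. */
uint64_t round_down_div(uint64_t x)
{
    return x / STRIPE_LEN * STRIPE_LEN;
}

/* Power-of-2 form: clear the low bits with a mask. */
uint64_t round_down_mask(uint64_t x)
{
    return x & ~((uint64_t)STRIPE_LEN - 1);
}
```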
     
  • For a missing device, btrfs will just refuse to mount with an almost
    meaningless kernel message like:

    BTRFS info (device vdb6): disk space caching is enabled
    BTRFS info (device vdb6): has skinny extents
    BTRFS error (device vdb6): failed to read the system array: -5
    BTRFS error (device vdb6): open_ctree failed

    This patch will print a new message about the missing device:

    BTRFS info (device vdb6): disk space caching is enabled
    BTRFS info (device vdb6): has skinny extents
    BTRFS warning (device vdb6): devid 2 uuid 80470722-cad2-4b90-b7c3-fee294552f1b is missing
    BTRFS error (device vdb6): failed to read the system array: -5
    BTRFS error (device vdb6): open_ctree failed

    Signed-off-by: Qu Wenruo
    Reviewed-by: Anand Jain
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • As we use per-chunk degradable check, the global
    num_tolerated_disk_barrier_failures is of no use.

    We can now remove it.

    Signed-off-by: Qu Wenruo
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Qu Wenruo
     
  • Introduce a new function, btrfs_check_rw_degradable(), to check if all
    chunks in btrfs are OK for degraded rw mount.

    It provides the new basis for accurate btrfs mount/remount and even
    runtime degraded mount checks, replacing the old one-size-fits-all method.

    Btrfs currently uses num_tolerated_disk_barrier_failures to do global
    check for tolerated missing device.

    Although the one-size-fits-all solution is quite safe, it's too strict
    if data and metadata have different duplication levels.

    For example, if one uses Single data and RAID1 metadata for 2 disks, it
    means any missing device will make the fs unable to be degraded
    mounted.

    But in fact, sometimes all single chunks may be on the existing
    device and in that case, we should allow it to be rw degraded mounted.

    Such case can be easily reproduced using the following script:
    # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
    # wipefs -f /dev/sdc
    # mount /dev/sdb -o degraded,rw

    If using btrfs-debug-tree to check /dev/sdb, one should find that the
    data chunk is only in sdb, so in fact it should allow degraded mount.

    This patchset will introduce a new per-chunk degradable check for
    btrfs, allow above case to succeed, and it's quite small anyway.

    Signed-off-by: Qu Wenruo
    Signed-off-by: Anand Jain
    Reviewed-by: David Sterba
    [ copied text from cover letter with more details about the problem being
    solved ]
    Signed-off-by: David Sterba

    Qu Wenruo
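
A per-chunk check of this kind can be sketched as follows: for every chunk, count the stripes that sit on missing devices and compare that with what the chunk's profile tolerates. `struct toy_chunk` and `check_rw_degradable()` are hypothetical names for illustration, not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_chunk {
    int tolerated_failures;  /* e.g. 0 for single, 1 for RAID1 */
    int num_stripes;
    const int *stripe_devs;  /* device index backing each stripe */
};

/* True if every chunk can still be written with the given devices missing. */
bool check_rw_degradable(const struct toy_chunk *chunks, size_t nr_chunks,
                         const bool *dev_missing)
{
    for (size_t i = 0; i < nr_chunks; i++) {
        int missing = 0;

        for (int j = 0; j < chunks[i].num_stripes; j++) {
            if (dev_missing[chunks[i].stripe_devs[j]])
                missing++;
        }
        /* One chunk past its tolerance is enough to refuse rw. */
        if (missing > chunks[i].tolerated_failures)
            return false;
    }
    return true;
}
```

In the two-disk scenario from the message, a single data chunk on the surviving device plus a RAID1 metadata chunk passes the check, while a single data chunk on the missing device fails it.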
     
  • In btrfs_full_stripe_len/btrfs_is_parity_mirror we have similar code which
    gets the chunk map for a particular range via get_chunk_map. However,
    get_chunk_map can return an ERR_PTR value and while the 2 callers do catch
    this with a WARN_ON they then proceed to indiscriminately dereference the
    extent map. This of course leads to a crash. Fix the offenders by making the
    dereference conditional on IS_ERR.

    Signed-off-by: Nikolay Borisov
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Nikolay Borisov
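
The pattern of making the dereference conditional on IS_ERR can be shown in userspace with small stand-ins for the kernel's error-pointer helpers. The helper definitions below are simplified re-implementations, and `toy_get_chunk_map()` and `toy_full_stripe_len()` are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(intptr_t err)
{
    return (void *)err;
}

static inline int IS_ERR(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct toy_map {
    unsigned long stripe_len;
};

/* Hypothetical lookup that can fail with an encoded errno. */
struct toy_map *toy_get_chunk_map(struct toy_map *table, size_t n, size_t idx)
{
    if (idx >= n)
        return ERR_PTR(-EINVAL);
    return &table[idx];
}

unsigned long toy_full_stripe_len(struct toy_map *table, size_t n, size_t idx)
{
    struct toy_map *em = toy_get_chunk_map(table, n, idx);
    unsigned long len = 0;

    /* Touch the map only when the lookup actually succeeded. */
    if (!IS_ERR(em))
        len = em->stripe_len;
    return len;
}
```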
     
  • __btrfs_alloc_chunk contains code which boils down to:

    ndevs = min(ndevs, devs_max)

    It's conditional upon devs_max not being 0. However, it cannot really be 0
    since it's always set to either BTRFS_MAX_DEVS_SYS_CHUNK or
    BTRFS_MAX_DEVS(fs_info->chunk_root). So eliminate the condition check and use
    min explicitly. This has no functional changes.

    Signed-off-by: Nikolay Borisov
    Reviewed-by: David Sterba
    Signed-off-by: David Sterba

    Nikolay Borisov