29 Jun, 2020

2 commits


24 Jun, 2020

1 commit


03 Jun, 2020

1 commit

  • Pull btrfs updates from David Sterba:
    "Highlights:

    - speed up dead root detection during orphan cleanup, e.g. when there
    are many deleted subvolumes waiting to be cleaned, the trees are
    now looked up in a radix tree instead of an O(N^2) search

    - snapshot creation with an inherited qgroup will mark the qgroup
    inconsistent, which requires a rescan

    - send will emit file capabilities after chown, this produces a
    stream that does not need postprocessing to set the capabilities
    again

    - direct I/O ported to the iomap infrastructure, with cleaned up and
    simplified code, notably removing the last use of struct buffer_head
    in btrfs code

    Core changes:

    - factor out backreference iteration, to be used by ordinary
    backreferences and relocation code

    - improved global block reserve utilization
      * better logic to serialize requests
      * increased maximum available for unlink
      * improved handling of large pages (64K)

    - direct I/O cleanups and fixes
      * simplify layering, where cloned bios were unnecessarily created
        for some cases
      * error handling fixes (submit, endio)
      * remove repair worker thread, used to avoid deadlocks during
        repair

    - refactored block group reading code, preparatory work for new type
    of block group storage that should improve mount time on large
    filesystems

    Cleanups:

    - cleaned up (and slightly sped up) set/get helpers for metadata data
    structure members

    - root bit REF_COWS got renamed to SHAREABLE to reflect that the
    blocks of the tree get shared either among subvolumes or with the
    relocation trees

    Fixes:

    - when subvolume deletion fails due to ENOSPC, the filesystem is not
    turned read-only

    - device scan deals with devices from other filesystems that changed
    ownership due to overwrite (mkfs)

    - fix a race between scrub and block group removal/allocation

    - fix a long-standing bug of a runaway balance operation printing the
    same line to the syslog, caused by a stale status bit on a reloc
    tree that prevented progress

    - fix corrupt log due to concurrent fsync of inodes with shared
    extents

    - fix space underflow for NODATACOW and buffered writes when, for
    some reason, it needs to fall back to COW mode"

    * tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (133 commits)
    btrfs: fix space_info bytes_may_use underflow during space cache writeout
    btrfs: fix space_info bytes_may_use underflow after nocow buffered write
    btrfs: fix wrong file range cleanup after an error filling dealloc range
    btrfs: remove redundant local variable in read_block_for_search
    btrfs: open code key_search
    btrfs: split btrfs_direct_IO to read and write part
    btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK
    fs: remove dio_end_io()
    btrfs: switch to iomap_dio_rw() for dio
    iomap: remove lockdep_assert_held()
    iomap: add a filesystem hook for direct I/O bio submission
    fs: export generic_file_buffered_read()
    btrfs: turn space cache writeout failure messages into debug messages
    btrfs: include error on messages about failure to write space/inode caches
    btrfs: remove useless 'fail_unlock' label from btrfs_csum_file_blocks()
    btrfs: do not ignore error from btrfs_next_leaf() when inserting checksums
    btrfs: make checksum item extension more efficient
    btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents
    btrfs: unexport btrfs_compress_set_level()
    btrfs: simplify iget helpers
    ...

    Linus Torvalds
     

02 Jun, 2020

1 commit

  • We really don't care about triggering buffer errors for this condition.
    This avoids a spew of:

    Buffer I/O error on dev sdc, logical block 785929, async page read
    Buffer I/O error on dev sdc, logical block 759095, async page read
    Buffer I/O error on dev sdc, logical block 766922, async page read
    Buffer I/O error on dev sdc, logical block 17659, async page read
    Buffer I/O error on dev sdc, logical block 637571, async page read
    Buffer I/O error on dev sdc, logical block 39241, async page read
    Buffer I/O error on dev sdc, logical block 397241, async page read
    Buffer I/O error on dev sdc, logical block 763992, async page read

    from -EAGAIN conditions on request allocation for async reads.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

27 May, 2020

1 commit


25 May, 2020

1 commit

  • An upcoming Btrfs fix needs to know the original size of a non-cloned
    bio. Rather than accessing the bvec table directly, let's add a
    bio_for_each_bvec_all() accessor.

    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Omar Sandoval
    Signed-off-by: David Sterba

    Omar Sandoval
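A minimal user-space sketch of why walking the whole bvec table recovers the original size: the byte count in the bio's iterator may already have been advanced, while the table itself still lists every vector. The structs below are simplified stand-ins, not the kernel's struct bio / struct bio_vec, and sketch_bio_original_size() is a hypothetical helper.

```c
#include <assert.h>

/* Simplified stand-ins for struct bio_vec / struct bio (assumption). */
struct sketch_bvec {
    void *page;            /* hypothetical page pointer */
    unsigned int len;      /* bytes covered by this vector */
    unsigned int offset;   /* offset into the page */
};

struct sketch_bio {
    struct sketch_bvec *io_vec;  /* bvec table owned by this bio */
    unsigned short vcnt;         /* entries in io_vec */
    unsigned int size;           /* iterator byte count; may be advanced */
};

/*
 * Hypothetical helper: visit every entry of the bvec table, as a
 * bio_for_each_bvec_all()-style accessor would, and sum the lengths.
 * The table is not consumed as the bio is processed, so this yields
 * the size the bio was built with.
 */
static unsigned int sketch_bio_original_size(const struct sketch_bio *bio)
{
    unsigned int total = 0;

    for (unsigned short i = 0; i < bio->vcnt; i++)
        total += bio->io_vec[i].len;
    return total;
}
```

The non-cloned restriction matters because a fast clone shares its parent's bvec table, so summing the table would report the parent's size rather than the clone's.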
     

19 May, 2020

1 commit


19 Apr, 2020

1 commit

  • The current codebase makes use of the zero-length array language
    extension to the C90 standard, but the preferred mechanism to declare
    variable-length types such as these ones is a flexible array member[1][2],
    introduced in C99:

    struct foo {
            int stuff;
            struct boo array[];
    };

    By making use of the mechanism above, we will get a compiler warning
    in case the flexible array does not occur last in the structure, which
    will help us prevent some kinds of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.

    Also, notice that dynamic memory allocations won't be affected by
    this change:

    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]

    This issue was found with the help of Coccinelle.

    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] https://github.com/KSPP/linux/issues/21
    [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
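A small user-space sketch of allocating a structure that ends in a flexible array member. struct foo matches the layout above; the member of struct boo and the helper foo_alloc() are hypothetical. Because sizeof(struct foo) covers only the fixed members, the storage for the trailing array is added explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct boo {
    int value;                 /* hypothetical payload */
};

struct foo {
    int stuff;
    struct boo array[];        /* flexible array member: must be last */
};

/* Hypothetical helper: allocate a foo with n zeroed trailing elements. */
static struct foo *foo_alloc(size_t n)
{
    /* sizeof(*f) excludes array[], so add its storage by hand */
    struct foo *f = malloc(sizeof(*f) + n * sizeof(f->array[0]));

    if (!f)
        return NULL;
    f->stuff = 0;
    memset(f->array, 0, n * sizeof(f->array[0]));
    return f;
}
```

In kernel code the same size is typically computed with struct_size(f, array, n) from <linux/overflow.h>, which also guards the multiplication against overflow.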
     

28 Mar, 2020

1 commit

  • The bio_map_* helpers are just the low-level helpers for the
    blk_rq_map_* APIs. Move them together for better logical grouping,
    as there isn't much overlap with other code in bio.c.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

25 Mar, 2020

1 commit


29 Dec, 2019

1 commit

  • Some filesystems, such as vfat, may send a bio that crosses the device
    boundary, and worse, an I/O request that starts within the device
    boundary can contain more than one segment past EOD (end of device).

    Commit dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors")
    tries to fix this issue by returning -EIO for this situation. However,
    this approach denies filesystem code the chance to handle -EIO, and
    sync_inodes_sb() may then hang forever.

    Also, the current truncation of the last segment is dangerous because it
    updates the last bvec: the bvec table is then no longer immutable, and
    filesystem bio users may not be able to retrieve the truncated pages via
    bio_for_each_segment_all() in their .end_io callbacks.

    Fix this issue by supporting multi-segment truncation. The approach is
    also simpler:

    - just update the bio size, since the block layer can build correct
    bvecs from the updated bio size. The bvec table then becomes truly
    immutable.

    - zero all truncated segments for a read bio

    Cc: Carlos Maiolino
    Cc: linux-fsdevel@vger.kernel.org
    Fixed-by: dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors")
    Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
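A user-space sketch of the truncation approach described above, under simplifying assumptions: segments are plain byte buffers, positions are in 512-byte sectors, and the structs and eod_bio_truncate() are hypothetical stand-ins rather than the kernel's own code. Only the byte count shrinks; the segment table is never edited, and for reads every byte past the new end is zeroed, however many segments that spans.

```c
#include <assert.h>
#include <string.h>

#define SK_SECTOR_SHIFT 9          /* 512-byte sectors */

struct eod_seg {
    unsigned char *buf;
    unsigned int len;
};

struct eod_bio {
    struct eod_seg *segs;          /* never modified: table stays immutable */
    unsigned int nr_segs;
    unsigned long long sector;     /* first sector of the I/O */
    unsigned int size;             /* bytes to transfer */
    int is_read;
};

/* Hypothetical helper: clamp the bio at end-of-device (dev_sectors). */
static void eod_bio_truncate(struct eod_bio *bio, unsigned long long dev_sectors)
{
    unsigned long long max_bytes;
    unsigned int done = 0;

    if (bio->sector >= dev_sectors)
        return;                    /* starts past EOD: left to fail */
    max_bytes = (dev_sectors - bio->sector) << SK_SECTOR_SHIFT;
    if (bio->size <= max_bytes)
        return;                    /* fits entirely: nothing to do */

    if (bio->is_read) {
        for (unsigned int i = 0; i < bio->nr_segs; i++) {
            struct eod_seg *s = &bio->segs[i];

            if (done + s->len > max_bytes) {
                /* bytes of this segment that are still inside the device */
                unsigned int keep = max_bytes > done ?
                        (unsigned int)(max_bytes - done) : 0;
                memset(s->buf + keep, 0, s->len - keep);
            }
            done += s->len;
        }
    }
    bio->size = (unsigned int)max_bytes;
}
```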
     

01 Jul, 2019

2 commits

  • 'bio->bi_iter.bi_size' is 'unsigned int', which can hold at most 4G - 1
    bytes.

    Before 07173c3ec276 ("block: enable multipage bvecs"), a bio could
    include only a limited number of pages, usually at most 256, so a
    filesystem bio was rarely bigger than 1M bytes.

    Since we support multi-page bvecs, in theory more than 1M pages can be
    added to one filesystem bio, especially with huge pages or big writeback
    with many dirty pages, and .bi_size can then overflow.

    Fix this issue by using bio_full() to check whether the added segment
    may overflow .bi_size.

    Cc: Liu Yiding
    Cc: kernel test robot
    Cc: "Darrick J. Wong"
    Cc: linux-xfs@vger.kernel.org
    Cc: linux-fsdevel@vger.kernel.org
    Cc: stable@vger.kernel.org
    Fixes: 07173c3ec276 ("block: enable multipage bvecs")
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
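A sketch of the two conditions a bio_full()-style check can combine, assuming a simplified bio with a byte count, a used-vector count and a vector capacity; sk_bio and sk_bio_full() are hypothetical. The size test is written as size > UINT_MAX - len so that the check itself cannot wrap.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

struct sk_bio {
    unsigned int size;         /* like bi_iter.bi_size: at most 4G - 1 */
    unsigned short vcnt;       /* bvecs in use */
    unsigned short max_vecs;   /* bvec capacity */
};

/* Hypothetical check: can len more bytes be added to this bio? */
static bool sk_bio_full(const struct sk_bio *bio, unsigned int len)
{
    if (bio->vcnt >= bio->max_vecs)
        return true;                    /* no free bvec slot */
    if (bio->size > UINT_MAX - len)
        return true;                    /* size + len would wrap */
    return false;
}
```

With 4 KiB pages, 1M (2^20) pages is 2^32 bytes, one more than the 4G - 1 an unsigned int can hold, which is why a bio grown to that many pages overflows without such a check.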
     
  • Merge 5.2-rc6 into for-5.3/block, so we get the same page merge leak
    fix. Otherwise we end up having conflicts with future patches between
    for-5.3/block and master that touch this area. In particular, it makes
    the bio_full() fix hard to backport to stable.

    * tag 'v5.2-rc6': (482 commits)
    Linux 5.2-rc6
    Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
    Bluetooth: Fix regression with minimum encryption key size alignment
    tcp: refine memory limit test in tcp_fragment()
    x86/vdso: Prevent segfaults due to hoisted vclock reads
    SUNRPC: Fix a credential refcount leak
    Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
    net :sunrpc :clnt :Fix xps refcount imbalance on the error path
    NFS4: Only set creation opendata if O_CREAT
    ARM: 8867/1: vdso: pass --be8 to linker if necessary
    KVM: nVMX: reorganize initial steps of vmx_set_nested_state
    KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
    habanalabs: use u64_to_user_ptr() for reading user pointers
    nfsd: replace Jeff by Chuck as nfsd co-maintainer
    inet: clear num_timeout reqsk_alloc()
    PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present
    net: mvpp2: debugfs: Add pmap to fs dump
    ipv6: Default fib6_type to RTN_UNICAST when not set
    net: hns3: Fix inconsistent indenting
    net/af_iucv: always register net_device notifier
    ...

    Jens Axboe
     

29 Jun, 2019

2 commits


27 Jun, 2019

1 commit


21 Jun, 2019

1 commit

  • We only need the number of segments in the blk-mq submission path.
    Remove the field from struct bio, and return it from a variant of
    blk_queue_split instead, so that it can be passed as an argument to
    those functions that need the value.

    This also means we stop recounting segments except for cloning
    and partial segments.

    To keep the number of arguments in this hot path down, remove the
    pointless struct request_queue arguments from any of the functions
    that had one and grew an nr_segs argument.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

17 Jun, 2019

1 commit

  • We currently have an input same_page parameter to __bio_try_merge_page
    to prohibit merging in the same page. The rationale for that is that
    some callers need to account for every page added to a bio. Instead of
    letting these callers call twice into the merge code to account for the
    new vs existing page cases, just turn the parameter into an output one
    that returns whether a merge in the same page occurred and let them act
    accordingly.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Christoph Hellwig
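A user-space sketch of a merge helper that reports same-page merges through an output parameter. Pages are identified by a page frame number and a 4 KiB page size is assumed; sk_bvec and sk_try_merge() are hypothetical and only model the contiguity test, not the kernel's full set of merge conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_PAGE_SIZE 4096u       /* assumed page size */

/* Hypothetical bvec: page identified by its page frame number. */
struct sk_bvec {
    size_t pfn;
    unsigned int offset;
    unsigned int len;            /* assumed > 0 */
};

/*
 * Try to append (pfn, off, len) to the last bvec. Merging requires the new
 * data to start exactly where the bvec ends. On success, *same_page reports
 * whether the bvec's last byte was already in that page, so the caller can
 * do its own accounting instead of calling into the merge code twice.
 */
static bool sk_try_merge(struct sk_bvec *last, size_t pfn, unsigned int off,
                         unsigned int len, bool *same_page)
{
    size_t end = last->pfn * SK_PAGE_SIZE + last->offset + last->len;
    size_t start = pfn * SK_PAGE_SIZE + off;

    if (end != start)
        return false;                        /* not contiguous */
    *same_page = (end - 1) / SK_PAGE_SIZE == pfn;
    last->len += len;
    return true;
}
```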
     

24 May, 2019

1 commit

  • This barrier only applies to the read-modify-write operations; in
    particular, it does not apply to the atomic_set() primitive.

    Replace the barrier with an smp_mb().

    Fixes: dac56212e8127 ("bio: skip atomic inc/dec of ->bi_cnt for most use cases")
    Cc: stable@vger.kernel.org
    Reported-by: "Paul E. McKenney"
    Reported-by: Peter Zijlstra
    Signed-off-by: Andrea Parri
    Reviewed-by: Ming Lei
    Cc: Jens Axboe
    Cc: Ming Lei
    Cc: linux-block@vger.kernel.org
    Cc: "Paul E. McKenney"
    Cc: Peter Zijlstra
    Signed-off-by: Jens Axboe

    Andrea Parri
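The ordering issue can be sketched with C11 atomics in user space. Assumptions: the replaced barrier is one whose guarantee covers only read-modify-write atomics (as the commit text states), a plain atomic store is not such an operation, and atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb(); the C11 and kernel memory models are not identical. sk_obj, SK_REFFED and sk_cnt_set() are hypothetical, and the flag word is atomic only so the sketch is free of C11 data races.

```c
#include <assert.h>
#include <stdatomic.h>

#define SK_REFFED (1u << 0)      /* hypothetical flag bit */

struct sk_obj {
    atomic_uint flags;
    atomic_int cnt;
};

/*
 * Publish the flag, then set the count. The count update is a plain store,
 * not a read-modify-write, so a barrier limited to RMW atomics would not
 * order the two; a full fence does.
 */
static void sk_cnt_set(struct sk_obj *o, int count)
{
    if (count != 1) {
        unsigned int f = atomic_load_explicit(&o->flags, memory_order_relaxed);

        atomic_store_explicit(&o->flags, f | SK_REFFED, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);   /* analogue of smp_mb() */
    }
    atomic_store_explicit(&o->cnt, count, memory_order_relaxed);
}
```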
     

01 May, 2019

1 commit


30 Apr, 2019

2 commits


22 Apr, 2019

1 commit

  • Pull in v5.1-rc6 to resolve two conflicts. One is in BFQ, in just a
    comment, and is trivial. The other one is a conflict due to a later fix
    in the bio multi-page work, and needs a bit more care.

    * tag 'v5.1-rc6': (770 commits)
    Linux 5.1-rc6
    block: make sure that bvec length can't be overflow
    block: kill all_q_node in request_queue
    x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority
    coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
    mm/kmemleak.c: fix unused-function warning
    init: initialize jump labels before command line option parsing
    kernel/watchdog_hld.c: hard lockup message should end with a newline
    kcov: improve CONFIG_ARCH_HAS_KCOV help text
    mm: fix inactive list balancing between NUMA nodes and cgroups
    mm/hotplug: treat CMA pages as unmovable
    proc: fixup proc-pid-vm test
    proc: fix map_files test on F29
    mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
    mm/memory_hotplug: do not unlock after failing to take the device_hotplug_lock
    mm: swapoff: shmem_unuse() stop eviction without igrab()
    mm: swapoff: take notice of completion sooner
    mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES
    mm: swapoff: shmem_find_swap_entries() filter out other types
    slab: store tagged freelist for off-slab slabmgmt
    ...

    Signed-off-by: Jens Axboe

    Jens Axboe
     

08 Apr, 2019

1 commit

  • Commit 6dc4f100c175 ("block: allow bio_for_each_segment_all() to
    iterate over multi-page bvec") changes bio_for_each_segment_all()
    to use for-inside-for.

    This breaks every bio_for_each_segment_all() caller that bails out of
    the loop via 'break', since 'break' now only exits the inner loop.

    Fix this issue by implementing bio_for_each_segment_all() with a
    single 'for' loop; the logic is now very similar to the normal bvec
    iterator.

    Cc: Qu Wenruo
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Omar Sandoval
    Reviewed-by: Johannes Thumshirn
    Reported-and-Tested-by: Qu Wenruo
    Fixes: 6dc4f100c175 ("block: allow bio_for_each_segment_all() to iterate over multi-page bvec")
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
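A user-space sketch of the difference, with the two loop shapes written out as functions instead of macros. A bvec is reduced to a length that is assumed to be a non-zero multiple of a 4 KiB page; sk_walk_nested() and sk_walk_single() are hypothetical and simply count how many page-sized steps the loop body ran.

```c
#include <assert.h>

#define SK_PAGE 4096u   /* assumed page size */

struct sk_bvec {
    unsigned int len;   /* bytes; assumed a non-zero multiple of SK_PAGE */
};

/* for-inside-for: 'break' at the fail_at-th page leaves only the inner
 * loop, so the remaining bvecs are still visited. */
static int sk_walk_nested(const struct sk_bvec *v, int n, int fail_at)
{
    int seen = 0;

    for (int i = 0; i < n; i++)
        for (unsigned int off = 0; off < v[i].len; off += SK_PAGE) {
            seen++;
            if (seen == fail_at)
                break;
        }
    return seen;
}

/* Single loop with the (bvec index, offset) position advanced by hand,
 * like a normal bvec iterator: 'break' now ends the whole walk. */
static int sk_walk_single(const struct sk_bvec *v, int n, int fail_at)
{
    int seen = 0;
    unsigned int off = 0;

    for (int i = 0; i < n;) {
        seen++;
        if (seen == fail_at)
            break;
        off += SK_PAGE;
        if (off >= v[i].len) {      /* finished this bvec: move to next */
            i++;
            off = 0;
        }
    }
    return seen;
}
```

With bvecs of 8 KiB, 8 KiB and 4 KiB and an error on the first page, the nested form still runs the body four times, while the single loop stops after one.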
     

02 Apr, 2019

1 commit

  • When the added page is merged into the same page as the last bvec in
    bio_add_pc_page(), the caller may need to put this page to avoid a
    page leak.

    bio_map_user_iov() needs this kind of handling, and currently deals
    with it by itself in a hacky way.

    Move the put-page handling into __bio_add_pc_page(), so that
    bio_map_user_iov() can be simplified a bit and more users may benefit
    from this change.

    Cc: Omar Sandoval
    Cc: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
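A user-space sketch of dropping the extra reference inside the add helper rather than in each caller. The page refcount, the structs, sk_add_page() and its drop_ref flag are all hypothetical; the sketch only shows the accounting, not the kernel's actual interface.

```c
#include <assert.h>
#include <stdbool.h>

struct sk_page {
    int refcount;
};

struct sk_bvec {
    struct sk_page *page;
    unsigned int offset;
    unsigned int len;
};

static void sk_put_page(struct sk_page *p)
{
    p->refcount--;
}

/*
 * The caller passes a page it holds a reference on. If the new range
 * continues the last bvec inside that same page, the bio already owns a
 * reference to it, so when drop_ref is set the caller's extra reference
 * is put here. Returns true if merged; false means a new bvec is needed.
 */
static bool sk_add_page(struct sk_bvec *last, struct sk_page *page,
                        unsigned int off, unsigned int len, bool drop_ref)
{
    if (last->page != page || last->offset + last->len != off)
        return false;
    last->len += len;
    if (drop_ref)
        sk_put_page(page);
    return true;
}
```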
     

24 Feb, 2019

1 commit

  • For the upcoming async polled IO, we can't sleep allocating requests.
    If we do, then we introduce a deadlock where the submitter already
    has async polled IO in-flight, but can't wait for them to complete
    since polled requests must be actively found and reaped.

    Utilize the helper in the blockdev DIRECT_IO code.

    Reviewed-by: Hannes Reinecke
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jens Axboe
     

15 Feb, 2019

5 commits

  • Now that multi-page bvecs can cover CONFIG_THP_SWAP, we don't need to
    increase BIO_MAX_PAGES for it.

    CONFIG_THP_SWAP needs to split one THP into normal pages and add
    them all to one bio. With multi-page bvecs, it takes just one bvec to
    hold them all.

    Reviewed-by: Omar Sandoval
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
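A user-space sketch of why one bvec suffices: the pages of a THP are physically contiguous, so with multi-page bvecs each page simply extends the previous vector. The struct and sk_add_pages() are hypothetical, pages are identified by page frame number, and 4 KiB pages are assumed, so a 2 MiB THP is 512 pages.

```c
#include <assert.h>
#include <stddef.h>

#define SK_PAGE 4096u   /* assumed page size */

struct sk_bvec {
    size_t pfn;          /* first page frame; offset 0 assumed */
    unsigned int len;
};

/*
 * Add n pages, given by page frame number, to vec (which must have room
 * for n entries). With multipage set, a page that physically follows the
 * last bvec extends it; otherwise every page takes its own slot.
 * Returns the number of bvecs used.
 */
static size_t sk_add_pages(struct sk_bvec *vec, const size_t *pfns,
                           size_t n, int multipage)
{
    size_t cnt = 0;

    for (size_t i = 0; i < n; i++) {
        if (multipage && cnt > 0 &&
            vec[cnt - 1].pfn + vec[cnt - 1].len / SK_PAGE == pfns[i]) {
            vec[cnt - 1].len += SK_PAGE;   /* contiguous: grow last bvec */
            continue;
        }
        vec[cnt].pfn = pfns[i];
        vec[cnt].len = SK_PAGE;
        cnt++;
    }
    return cnt;
}
```

Without merging, the same THP needs 512 separate vectors.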
     
  • This patch pulls the trigger for multi-page bvecs.

    Reviewed-by: Omar Sandoval
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • This patch introduces one extra iterator variable to bio_for_each_segment_all(),
    so that bio_for_each_segment_all() can iterate over multi-page bvecs.

    Given that it is just one mechanical and simple change on all
    bio_for_each_segment_all() users, this patch does the tree-wide change in
    one single patch, so that we can avoid using a temporary helper for this
    conversion.

    Reviewed-by: Omar Sandoval
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • bio_for_each_bvec() is used for iterating over multi-page bvecs in the
    bio split & merge code.

    rq_for_each_bvec() can be used by drivers which may handle multi-page
    bvecs directly; so far, the loop driver is one perfect use case.

    Reviewed-by: Christoph Hellwig
    Reviewed-by: Omar Sandoval
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
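A user-space sketch of the two views: a per-bvec iterator hands back a multi-page bvec whole, while a per-page iterator has to split it at page boundaries. The struct and sk_split_pages() are hypothetical stand-ins (pages identified by page frame number, 4 KiB pages assumed); the kernel's own iterator helpers differ.

```c
#include <assert.h>
#include <stddef.h>

#define SK_PAGE 4096u   /* assumed page size */

struct sk_bvec {
    size_t pfn;
    unsigned int offset;   /* may exceed SK_PAGE in a multi-page bvec */
    unsigned int len;
};

/*
 * Split one multi-page bvec into single-page segments, writing at most
 * max of them to out. Returns the number written.
 */
static size_t sk_split_pages(struct sk_bvec bv, struct sk_bvec *out, size_t max)
{
    size_t n = 0;

    while (bv.len > 0 && n < max) {
        /* normalize the position to (page, offset within that page) */
        size_t pfn = bv.pfn + bv.offset / SK_PAGE;
        unsigned int off = bv.offset % SK_PAGE;
        unsigned int chunk = SK_PAGE - off;

        if (chunk > bv.len)
            chunk = bv.len;
        out[n].pfn = pfn;
        out[n].offset = off;
        out[n].len = chunk;
        n++;
        bv.offset += chunk;
        bv.len -= chunk;
    }
    return n;
}
```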
     
  • bio_readpage_error currently uses bi_vcnt to decide if it is worth
    retrying an I/O. But the vector count is mostly an implementation
    artifact - it really should figure out if there is more than a
    single sector worth retrying. Use bi_size for that and shift by
    PAGE_SHIFT. This really should be blocks/sectors, but given that
    btrfs doesn't support a sector size different from PAGE_SIZE,
    using the page size keeps the changes to a minimum.

    Reviewed-by: Omar Sandoval
    Reviewed-by: David Sterba
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
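With multi-page bvecs, a two-page read may be described by a single bvec, so bi_vcnt == 1 no longer implies a single page. A sketch of the byte-count-based test, assuming 4 KiB pages; sk_failed_units() and sk_worth_retrying() are hypothetical names, not btrfs functions.

```c
#include <assert.h>

#define SK_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Number of page-sized units a failed read covered, from its byte count. */
static unsigned int sk_failed_units(unsigned int bi_size)
{
    return bi_size >> SK_PAGE_SHIFT;
}

/* Retrying piecewise only makes sense if more than one unit was read. */
static int sk_worth_retrying(unsigned int bi_size)
{
    return sk_failed_units(bi_size) > 1;
}
```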
     

08 Dec, 2018

6 commits

  • Now that a bio only holds a blkg reference, cleanup is simply
    putting back that reference. Remove bio_disassociate_task() as it just
    calls bio_disassociate_blkg(), and call the latter directly.

    Signed-off-by: Dennis Zhou
    Acked-by: Tejun Heo
    Reviewed-by: Josef Bacik
    Signed-off-by: Jens Axboe

    Dennis Zhou
     
  • Prior patches ensured that any bio that interacts with a request_queue
    is properly associated with a blkg. This makes bio->bi_css unnecessary
    as blkg maintains a reference to blkcg already.

    This removes the bio field bi_css and transfers corresponding uses to
    access via bi_blkg.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou
     
  • One of the goals of this series is to remove a separate reference to
    the css of the bio. This can and should be accessed via bio_blkcg(). In
    this patch, wbc_init_bio() now requires a bio to have a device
    associated with it.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou
     
  • A prior patch in this series added blkg association to bios issued by
    cgroups. There are two other paths that we want to attribute work back
    to the appropriate cgroup: swap and writeback. Here we modify the way
    swap tags bios to include the blkg. Writeback will be tackled in the next
    patch.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou
     
  • Previously, blkg association was handled by controller specific code in
    blk-throttle and blk-iolatency. However, because a blkg represents a
    relationship between a blkcg and a request_queue, it makes sense to keep
    the blkg->q and bio->bi_disk->queue consistent.

    This patch moves association into the bio_set_dev() macro. This should
    cover the majority of cases where the device is set/changed, keeping the
    two pointers consistent. Fallback code is added to
    blkcg_bio_issue_check() to catch any missing paths.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Signed-off-by: Jens Axboe

    Dennis Zhou
     
  • There are 3 ways blkg association can happen: association with the
    current css, with the page css (swap), or from the wbc css (writeback).

    This patch handles how association is done for the first case, where we
    are associating based on the current css. If there is already a blkg
    associated, the css will be reused and association will be redone, as the
    request_queue may have changed.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Dennis Zhou
     

02 Nov, 2018

1 commit

  • This reverts a series committed earlier due to a null pointer exception
    bug report in [1]. It seems there are edge-case interactions that I did
    not consider, and I will need some time to understand what causes the
    adverse interactions.

    The original series can be found in [2] with a follow up series in [3].

    [1] https://www.spinics.net/lists/cgroups/msg20719.html
    [2] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/
    [3] https://lore.kernel.org/lkml/20181020185612.51587-1-dennis@kernel.org/

    This reverts the following commits:
    d459d853c2ed, b2c3fa546705, 101246ec02b5, b3b9f24f5fcc, e2b0989954ae,
    f0fcb3ec89f3, c839e7a03f92, bdc2491708c4, 74b7c02a9bc1, 5bf9a1f3b4ef,
    a7b39b4e961c, 07b05bcc3213, 49f4c2dc2b50, 27e6fa996c53

    Signed-off-by: Dennis Zhou
    Signed-off-by: Jens Axboe

    Dennis Zhou
     

21 Oct, 2018

1 commit

  • When submitting a bio, multiple recursive calls to make_request() may
    occur. This causes the initial association done in blkcg_bio_issue_check()
    to be incorrect and to reference the prior request_queue. This introduces
    a helper to redo the association when make_request() is recursively called.

    Fixes: a7b39b4e961c ("blkcg: always associate a bio with a blkg")
    Reported-by: Valdis Kletnieks
    Signed-off-by: Dennis Zhou
    Tested-by: Valdis Kletnieks
    Signed-off-by: Jens Axboe

    Dennis Zhou