26 Jan, 2020

1 commit

  • [ Upstream commit ece841abbed2da71fa10710c687c9ce9efb6bf69 ]

    Commit 7c20f11680a4 ("bio-integrity: stop abusing bi_end_io") moved
    bio_integrity_free() from bio_uninit() to bio_integrity_verify_fn()
    and bio_endio(). This is wrong because a bio may be freed without
    bio_endio() ever being called; for example, blk_rq_unprep_clone() is
    called from dm_mq_queue_rq() when the underlying queue of dm-mpath
    is busy.

    So commit 7c20f11680a4 leaks the bio integrity data.

    Fix this issue by re-adding bio_integrity_free() to bio_uninit().
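
    The lifetime rule can be sketched in plain C. This is a userspace
    model, not kernel code; the struct and function names are
    illustrative:

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model: the integrity payload must be released on every
 * free path of the bio, i.e. from bio_uninit(), not only from the
 * bio_endio() completion path. */
struct model_bio {
    void *integrity;                /* stands in for the integrity data */
};

static void model_bio_uninit(struct model_bio *bio)
{
    free(bio->integrity);           /* the re-added free, as in the fix */
    bio->integrity = NULL;
}

/* A bio freed via blk_rq_unprep_clone() never reaches bio_endio(),
 * yet uninit still releases the payload. */
static int demo_no_endio_free(void)
{
    struct model_bio b = { .integrity = malloc(16) };

    model_bio_uninit(&b);           /* free path without completion */
    return b.integrity == NULL;
}
```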

    Fixes: 7c20f11680a4 ("bio-integrity: stop abusing bi_end_io")
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Justin Tee

    Add commit log, and simplify/fix the original patch written by Justin.

    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Justin Tee
     

18 Jan, 2020

1 commit

  • commit 83c9c547168e8b914ea6398430473a4de68c52cc upstream.

    Commit 85a8ce62c2ea ("block: add bio_truncate to fix guard_bio_eod")
    adds bio_truncate() for handling bio EOD. However, bio_truncate()
    doesn't use the passed 'op' parameter from guard_bio_eod's callers.

    So bio_truncate() may retrieve the wrong 'op', and zeroing pages may
    not be done for READ bios.

    Fix this issue by moving guard_bio_eod() after bio_set_op_attrs()
    in submit_bh_wbc() so that bio_truncate() can always retrieve the
    correct op info.

    Meanwhile, remove the 'op' parameter from guard_bio_eod() because it
    is no longer used.

    Cc: Carlos Maiolino
    Cc: linux-fsdevel@vger.kernel.org
    Fixes: 85a8ce62c2ea ("block: add bio_truncate to fix guard_bio_eod")
    Signed-off-by: Ming Lei
    Signed-off-by: Greg Kroah-Hartman

    Fold in kerneldoc and bio_op() change.

    Signed-off-by: Jens Axboe

    Ming Lei
     

09 Jan, 2020

1 commit

  • [ Upstream commit 85a8ce62c2eabe28b9d76ca4eecf37922402df93 ]

    Some filesystems, such as vfat, may send a bio that crosses the
    device boundary; worse, an IO request starting within the device
    boundary can contain more than one segment past EOD.

    Commit dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors")
    tries to fix this issue by returning -EIO in this situation. However,
    this denies fs code the chance to handle -EIO itself, so
    sync_inodes_sb() may hang forever.

    Also, the current approach of truncating the last segment by updating
    the last bvec is dangerous: the bvec table is no longer immutable,
    and fs bio users may fail to retrieve the truncated pages via
    bio_for_each_segment_all() in their .end_io callbacks.

    Fix this issue by supporting multi-segment truncation. The approach
    is simpler:

    - just update the bio size, since the block layer can build correct
    bvecs from the updated size. The bvec table then becomes truly
    immutable.

    - zero all truncated segments for read bios
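
    The two steps above can be modeled in plain C. This is a userspace
    sketch, not kernel code; a flat buffer stands in for the bvec pages
    and the names are illustrative:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace model of the multi-segment truncate: shrink only the
 * logical size and, for read bios, zero everything past the new EOD.
 * The bvec table (here a flat buffer) is never edited. */
struct model_bio {
    uint8_t  *buf;
    uint32_t  bi_size;
    int       is_read;
};

static void model_bio_truncate(struct model_bio *bio, uint32_t new_size)
{
    if (new_size >= bio->bi_size)
        return;
    if (bio->is_read)                         /* hide stale data */
        memset(bio->buf + new_size, 0, bio->bi_size - new_size);
    bio->bi_size = new_size;                  /* size-only update */
}

static int demo_truncate(void)
{
    uint8_t data[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    struct model_bio b = { data, sizeof(data), 1 };

    model_bio_truncate(&b, 4);
    return b.bi_size == 4 && data[3] == 4 && data[4] == 0 && data[7] == 0;
}
```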

    Cc: Carlos Maiolino
    Cc: linux-fsdevel@vger.kernel.org
    Fixes: dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors")
    Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Ming Lei
     

21 Dec, 2019

1 commit

  • commit cc90bc68422318eb8e75b15cd74bc8d538a7df29 upstream.

    This partially reverts commit e3a5d8e386c3fb973fa75f2403622a8f3640ec06.

    Commit e3a5d8e386c3 ("check bi_size overflow before merge") adds a bio_full
    check to __bio_try_merge_page. This will cause __bio_try_merge_page to fail
    when the last bi_io_vec has been reached. Instead, what we want here is only
    the bi_size overflow check.
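
    The distinction can be sketched in plain C (a hypothetical helper,
    not the kernel function): a merge into an existing bvec consumes no
    new table slot, so only the 32-bit bi_size overflow matters.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: the only condition that should reject a merge is that
 * appending 'len' bytes would wrap the 32-bit bi_size counter;
 * a full bvec table must not reject it, since no bvec is added. */
static int bi_size_would_overflow(uint32_t bi_size, unsigned int len)
{
    return bi_size > UINT32_MAX - len;
}
```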

    Fixes: e3a5d8e386c3 ("block: check bi_size overflow before merge")
    Cc: stable@vger.kernel.org # v5.4+
    Reviewed-by: Ming Lei
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Andreas Gruenbacher
     

12 Nov, 2019

1 commit

  • __bio_try_merge_page() may merge a page into a bio without the
    bio_full() check and thereby overflow bi_size.

    The overflow typically ends up with sd_init_command() warning about
    a zero-segment request, with a call trace like this:

    ------------[ cut here ]------------
    WARNING: CPU: 2 PID: 1986 at drivers/scsi/scsi_lib.c:1025 scsi_init_io+0x156/0x180
    CPU: 2 PID: 1986 Comm: kworker/2:1H Kdump: loaded Not tainted 5.4.0-rc7 #1
    Workqueue: kblockd blk_mq_run_work_fn
    RIP: 0010:scsi_init_io+0x156/0x180
    RSP: 0018:ffffa11487663bf0 EFLAGS: 00010246
    RAX: 00000000002be0a0 RBX: ffff8e6e9ff30118 RCX: 0000000000000000
    RDX: 00000000ffffffe1 RSI: 0000000000000000 RDI: ffff8e6e9ff30118
    RBP: ffffa11487663c18 R08: ffffa11487663d28 R09: ffff8e6e9ff30150
    R10: 0000000000000001 R11: 0000000000000000 R12: ffff8e6e9ff30000
    R13: 0000000000000001 R14: ffff8e74a1cf1800 R15: ffff8e6e9ff30000
    FS: 0000000000000000(0000) GS:ffff8e6ea7680000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fff18cf0fe8 CR3: 0000000659f0a001 CR4: 00000000001606e0
    Call Trace:
    sd_init_command+0x326/0xb40 [sd_mod]
    scsi_queue_rq+0x502/0xaa0
    ? blk_mq_get_driver_tag+0xe7/0x120
    blk_mq_dispatch_rq_list+0x256/0x5a0
    ? elv_rb_del+0x24/0x30
    ? deadline_remove_request+0x7b/0xc0
    blk_mq_do_dispatch_sched+0xa3/0x140
    blk_mq_sched_dispatch_requests+0xfb/0x170
    __blk_mq_run_hw_queue+0x81/0x130
    blk_mq_run_work_fn+0x1b/0x20
    process_one_work+0x179/0x390
    worker_thread+0x4f/0x3e0
    kthread+0x105/0x140
    ? max_active_store+0x80/0x80
    ? kthread_bind+0x20/0x20
    ret_from_fork+0x35/0x40
    ---[ end trace f9036abf5af4a4d3 ]---
    blk_update_request: I/O error, dev sdd, sector 2875552 op 0x1:(WRITE) flags 0x0 phys_seg 0 prio class 0
    XFS (sdd1): writeback error on sector 2875552

    __bio_try_merge_page() should check for the overflow before actually
    doing the merge.

    Fixes: 07173c3ec276c ("block: enable multipage bvecs")
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Jens Axboe

    Junichi Nomura
     

22 Aug, 2019

3 commits


14 Aug, 2019

1 commit

  • psi tracks the time tasks wait for refaulting pages to become
    uptodate, but it does not track the time spent submitting the IO. The
    submission part can be significant if backing storage is contended or
    when cgroup throttling (io.latency) is in effect - a lot of time is
    spent in submit_bio(). In that case, we underreport memory pressure.

    Annotate submit_bio() to account submission time as memory stall when
    the bio is reading userspace workingset pages.
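
    The shape of the annotation can be sketched in plain C. This is a
    userspace model with the psi calls stubbed out (in the kernel they
    charge the elapsed time as a memory stall); all names besides
    psi_memstall_enter()/leave() are illustrative:

```c
#include <assert.h>

/* Stubbed psi API: count entries and track nesting depth. */
static int stall_depth, stall_entries;

static void psi_memstall_enter(unsigned long *pflags)
{
    (void)pflags;
    stall_depth++;
    stall_entries++;
}

static void psi_memstall_leave(unsigned long *pflags)
{
    (void)pflags;
    stall_depth--;
}

/* Bracket the submission with the stall markers only when the bio
 * reads workingset pages back from storage. */
static void model_submit_bio(int workingset_read)
{
    unsigned long pflags = 0;

    if (workingset_read)
        psi_memstall_enter(&pflags);   /* submission counts as stall */
    /* ... queue the bio ... */
    if (workingset_read)
        psi_memstall_leave(&pflags);
}

static int demo_submit(void)
{
    model_submit_bio(0);               /* ordinary IO: no stall */
    model_submit_bio(1);               /* workingset read: one stall */
    return stall_depth == 0 && stall_entries == 1;
}
```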

    Tested-by: Suren Baghdasaryan
    Signed-off-by: Johannes Weiner
    Signed-off-by: Jens Axboe

    Johannes Weiner
     

06 Aug, 2019

1 commit


05 Aug, 2019

1 commit


12 Jul, 2019

1 commit

  • To allow the SCSI subsystem scsi_execute_req() function to issue
    requests using large buffers that are better allocated with vmalloc()
    rather than kmalloc(), modify bio_map_kern() to allow passing a buffer
    allocated with vmalloc().

    To do so, detect vmalloc-ed buffers using is_vmalloc_addr(). For
    vmalloc-ed buffers, flush the buffer using flush_kernel_vmap_range(),
    use vmalloc_to_page() instead of virt_to_page() to obtain the pages of
    the buffer, and invalidate the buffer addresses with
    invalidate_kernel_vmap_range() on completion of read BIOs. This last
    point is executed using the function bio_invalidate_vmalloc_pages()
    which is defined only if the architecture defines
    ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE, that is, if the architecture
    actually needs the invalidation done.

    Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation")
    Fixes: e76239a3748c ("block: add a report_zones method")
    Cc: stable@vger.kernel.org
    Reviewed-by: Martin K. Petersen
    Signed-off-by: Damien Le Moal
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Chaitanya Kulkarni
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Damien Le Moal
     

01 Jul, 2019

2 commits

  • 'bio->bi_iter.bi_size' is an 'unsigned int', which can hold at most
    4G - 1 bytes.

    Before 07173c3ec276 ("block: enable multipage bvecs"), one bio could
    include only a limited number of pages, usually at most 256, so an
    fs bio was rarely bigger than 1M bytes.

    Since multi-page bvecs are supported, in theory more than 1M pages
    can be added to one fs bio, especially with hugepages or big
    writeback involving many dirty pages. Then there is a chance that
    .bi_size overflows.

    Fix this issue by using bio_full() to check whether the added segment
    may overflow .bi_size.
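
    A bio_full()-style guard can be sketched in plain C. This is a
    userspace model whose field names only loosely mirror struct bio:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: adding a segment must be refused when the bvec table is
 * exhausted OR when appending 'len' bytes would overflow the 32-bit
 * bi_size counter. */
struct model_bio {
    unsigned short bi_vcnt;          /* bvecs in use */
    unsigned short bi_max_vecs;      /* bvec table capacity */
    uint32_t       bi_size;          /* total bytes in the bio */
};

static int model_bio_full(const struct model_bio *bio, unsigned int len)
{
    return bio->bi_vcnt >= bio->bi_max_vecs ||
           bio->bi_size > UINT32_MAX - len;
}

static int demo_bio_full(void)
{
    struct model_bio ok   = { 0,   256, 0 };
    struct model_bio vecs = { 256, 256, 0 };                /* table full */
    struct model_bio size = { 0,   256, UINT32_MAX - 100 }; /* near wrap */

    return !model_bio_full(&ok, 4096) &&
            model_bio_full(&vecs, 4096) &&
            model_bio_full(&size, 4096);
}
```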

    Cc: Liu Yiding
    Cc: kernel test robot
    Cc: "Darrick J. Wong"
    Cc: linux-xfs@vger.kernel.org
    Cc: linux-fsdevel@vger.kernel.org
    Cc: stable@vger.kernel.org
    Fixes: 07173c3ec276 ("block: enable multipage bvecs")
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • Merge 5.2-rc6 into for-5.3/block, so we get the same page merge leak
    fix. Otherwise we end up having conflicts with future patches between
    for-5.3/block and master that touch this area. In particular, it makes
    the bio_full() fix hard to backport to stable.

    * tag 'v5.2-rc6': (482 commits)
    Linux 5.2-rc6
    Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
    Bluetooth: Fix regression with minimum encryption key size alignment
    tcp: refine memory limit test in tcp_fragment()
    x86/vdso: Prevent segfaults due to hoisted vclock reads
    SUNRPC: Fix a credential refcount leak
    Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
    net :sunrpc :clnt :Fix xps refcount imbalance on the error path
    NFS4: Only set creation opendata if O_CREAT
    ARM: 8867/1: vdso: pass --be8 to linker if necessary
    KVM: nVMX: reorganize initial steps of vmx_set_nested_state
    KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
    habanalabs: use u64_to_user_ptr() for reading user pointers
    nfsd: replace Jeff by Chuck as nfsd co-maintainer
    inet: clear num_timeout reqsk_alloc()
    PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present
    net: mvpp2: debugfs: Add pmap to fs dump
    ipv6: Default fib6_type to RTN_UNICAST when not set
    net: hns3: Fix inconsistent indenting
    net/af_iucv: always register net_device notifier
    ...

    Jens Axboe
     

29 Jun, 2019

5 commits


27 Jun, 2019

1 commit


21 Jun, 2019

1 commit

  • We only need the number of segments in the blk-mq submission path.
    Remove the field from struct bio, and return it from a variant of
    blk_queue_split() instead, so that it can be passed as an argument
    to those functions that need the value.

    This also means we stop recounting segments except for cloning
    and partial segments.

    To keep the number of arguments in this hot path down, remove the
    pointless struct request_queue argument from all of the functions
    that had it and grew a nr_segs argument.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

17 Jun, 2019

2 commits

  • When multiple iovecs reference the same page, each get_user_page call
    will add a reference to the page. But once we've created the bio that
    information gets lost and only a single reference will be dropped after
    I/O completion. Use the same_page information returned from
    __bio_try_merge_page to drop additional references to pages that were
    already present in the bio.
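
    The reference accounting can be modeled in plain C. This is a
    userspace sketch with the page refcount reduced to a counter; all
    names are illustrative:

```c
#include <assert.h>

/* Userspace model: when two iovecs pin the same page, each pin takes
 * a reference, but the merged bvec records the page only once. The
 * duplicate reference must be dropped when same_page is reported. */
static int page_refs;

static void model_get_page(void) { page_refs++; }
static void model_put_page(void) { page_refs--; }

static void model_add_pinned_page(int *same_page_out, int is_same)
{
    model_get_page();            /* pin taken by get_user_pages() */
    *same_page_out = is_same;    /* __bio_try_merge_page() reports it */
    if (*same_page_out)
        model_put_page();        /* drop the duplicate reference */
}

static int demo_same_page_refs(void)
{
    int same = 0;

    page_refs = 0;
    model_add_pinned_page(&same, 0);  /* first occurrence of the page */
    model_add_pinned_page(&same, 1);  /* second iovec, same page */
    return page_refs;                 /* one ref left for completion */
}
```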

    Based on a patch from Ming Lei.

    Link: https://lkml.org/lkml/2019/4/23/64
    Fixes: 576ed913 ("block: use bio_add_page in bio_iov_iter_get_pages")
    Reported-by: David Gibson
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • We currently have an input same_page parameter to __bio_try_merge_page
    to prohibit merging in the same page. The rationale for that is that
    some callers need to account for every page added to a bio. Instead of
    letting these callers call twice into the merge code to account for the
    new vs existing page cases, just turn the parameter into an output
    one that reports whether a merge in the same page occurred, and let
    them act accordingly.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Ming Lei
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

15 Jun, 2019

1 commit

  • One of the more common cases of allocation size calculations is finding
    the size of a structure that has a zero-sized array at the end, along
    with memory for some number of elements for that array. For example:

    struct bio_map_data {
    ...
    struct iovec iov[];
    };

    instance = kmalloc(sizeof(struct bio_map_data) + sizeof(struct iovec) *
    count, GFP_KERNEL);

    Instead of leaving these open-coded and prone to type mistakes, we can
    now use the new struct_size() helper:

    instance = kmalloc(struct_size(instance, iov, count), GFP_KERNEL);

    This code was detected with the help of Coccinelle.
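
    The pattern can be demonstrated in plain C. struct_size() itself is
    a kernel macro with overflow checking; here it is modeled by the
    plain expression it replaces, and the struct names are illustrative:

```c
#include <assert.h>
#include <stddef.h>

struct iovec_model { void *base; size_t len; };

struct bio_map_data_model {
    int nr_vecs;
    struct iovec_model iov[];        /* flexible array member */
};

/* Simplified stand-in for the kernel's struct_size(): size of the
 * header plus 'count' trailing array elements. */
#define struct_size_model(p, member, count) \
    (sizeof(*(p)) + sizeof((p)->member[0]) * (size_t)(count))
```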

    Reviewed-by: Kees Cook
    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Jens Axboe

    Gustavo A. R. Silva
     

01 May, 2019

1 commit


30 Apr, 2019

4 commits


24 Apr, 2019

1 commit

  • The refcount has already been increased for pages retrieved from a
    non-bvec iov iter via __bio_iov_iter_get_pages(), so we don't need
    to do that again.

    Otherwise, IO pages are leaked easily.

    Cc: Christoph Hellwig
    Reviewed-by: Chaitanya Kulkarni
    Fixes: 7321ecbfc7cf ("block: change how we get page references in bio_iov_iter_get_pages")
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

23 Apr, 2019

1 commit

  • bio_add_page() and __bio_add_page() are capable of adding pages into
    a bio, and now we have at least two such usages already:

    - __bio_iov_bvec_add_pages()
    - nvmet_bdev_execute_rw().

    So update comments on these two helpers.

    __bio_try_merge_page() is a bit special: the caller needs to know
    whether the newly added page is the same as the last added page, and
    it isn't safe to pass a multi-page in case 'same_page' is true. So
    add a warning on potential misuse, and update the comment on
    __bio_try_merge_page().

    Cc: linux-xfs@vger.kernel.org
    Cc: linux-fsdevel@vger.kernel.org
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

22 Apr, 2019

1 commit

  • Pull in v5.1-rc6 to resolve two conflicts. One is in BFQ, in just a
    comment, and is trivial. The other one is a conflict due to a later fix
    in the bio multi-page work, and needs a bit more care.

    * tag 'v5.1-rc6': (770 commits)
    Linux 5.1-rc6
    block: make sure that bvec length can't be overflow
    block: kill all_q_node in request_queue
    x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority
    coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
    mm/kmemleak.c: fix unused-function warning
    init: initialize jump labels before command line option parsing
    kernel/watchdog_hld.c: hard lockup message should end with a newline
    kcov: improve CONFIG_ARCH_HAS_KCOV help text
    mm: fix inactive list balancing between NUMA nodes and cgroups
    mm/hotplug: treat CMA pages as unmovable
    proc: fixup proc-pid-vm test
    proc: fix map_files test on F29
    mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
    mm/memory_hotplug: do not unlock after failing to take the device_hotplug_lock
    mm: swapoff: shmem_unuse() stop eviction without igrab()
    mm: swapoff: take notice of completion sooner
    mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES
    mm: swapoff: shmem_find_swap_entries() filter out other types
    slab: store tagged freelist for off-slab slabmgmt
    ...

    Signed-off-by: Jens Axboe

    Jens Axboe
     

12 Apr, 2019

4 commits


11 Apr, 2019

1 commit

  • When bio_add_pc_page() fails in bio_copy_user_iov(), we should free
    the page we just allocated; otherwise we leak it.

    Cc: linux-block@vger.kernel.org
    Cc: Linus Torvalds
    Cc: stable@vger.kernel.org
    Reviewed-by: Chaitanya Kulkarni
    Signed-off-by: Jérôme Glisse
    Signed-off-by: Jens Axboe

    Jérôme Glisse
     

04 Apr, 2019

1 commit

  • With the introduction of BIO_NO_PAGE_REF we've used up all available bits
    in bio::bi_flags.

    Convert the flag defines to an enum, and add a BUILD_BUG_ON() call
    to make sure nobody adds a new flag that would override BVEC_POOL_IDX
    and cause crashes.
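
    The idea can be sketched in plain C with a compile-time assertion.
    This is a userspace model; the flag names and the bit offset are
    illustrative, not the kernel's actual values:

```c
#include <assert.h>

/* Flags as an enum: the compiler tracks the count, so a static
 * assertion can ensure the flags never grow into bits reserved for
 * something else (in the kernel, the BVEC_POOL_IDX). */
enum model_bio_flags {
    MODEL_BIO_SEG_VALID,
    MODEL_BIO_CLONED,
    MODEL_BIO_NO_PAGE_REF,
    MODEL_BIO_FLAG_LAST              /* keep this entry last */
};

#define MODEL_BVEC_POOL_OFFSET 13    /* illustrative reserved-bit offset */

_Static_assert(MODEL_BIO_FLAG_LAST <= MODEL_BVEC_POOL_OFFSET,
               "flag bits collide with the bvec pool index");
```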

    Reviewed-by: Ming Lei
    Reviewed-by: Hannes Reinecke
    Reviewed-by: Bart Van Assche
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe

    Johannes Thumshirn
     

02 Apr, 2019

2 commits

  • The block IO stack is now basically ready to support multi-page
    bvecs; however, they aren't enabled for passthrough IO.

    One reason is that passthrough IO is dispatched to the LLD directly
    and bio splitting is bypassed, so the bio has to be built correctly
    for the LLD from the beginning.

    Implement multi-page support for passthrough IO by limiting each
    bvec to the block device's segment size and applying all queue
    limits in blk_add_pc_page(). Then we no longer need to calculate
    segments for passthrough IO, and the code ends up much simpler.

    Cc: Omar Sandoval
    Cc: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • When the added page merges into the same last page in
    bio_add_pc_page(), the user may need to put this page to avoid a
    page leak.

    bio_map_user_iov() needs this kind of handling, and it currently
    deals with it by itself in a hacky way.

    Move the put-page handling into __bio_add_pc_page() so that
    bio_map_user_iov() can be simplified a bit, and more users can
    benefit from this change.

    Cc: Omar Sandoval
    Cc: Christoph Hellwig
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei