17 Jan, 2021

2 commits

  • commit 3a21777c6ee99749bac10727b3c17e5bcfebe5c1 upstream.

    We hit a kernel panic caused by a race between module unload and the
    last close confirmation.

    call trace:
    [1196029.743127] free_sess+0x15/0x50 [rtrs_client]
    [1196029.743128] rtrs_clt_close+0x4c/0x70 [rtrs_client]
    [1196029.743129] ? rnbd_clt_unmap_device+0x1b0/0x1b0 [rnbd_client]
    [1196029.743130] close_rtrs+0x25/0x50 [rnbd_client]
    [1196029.743131] rnbd_client_exit+0x93/0xb99 [rnbd_client]
    [1196029.743132] __x64_sys_delete_module+0x190/0x260

    The crash dump shows that the confirmation kworker was running too:
    PID: 6943 TASK: ffff9e2ac8098000 CPU: 4 COMMAND: "kworker/4:2"
    #0 [ffffb206cf337c30] __schedule at ffffffff9f93f891
    #1 [ffffb206cf337cc8] schedule at ffffffff9f93fe98
    #2 [ffffb206cf337cd0] schedule_timeout at ffffffff9f943938
    #3 [ffffb206cf337d50] wait_for_completion at ffffffff9f9410a7
    #4 [ffffb206cf337da0] __flush_work at ffffffff9f08ce0e
    #5 [ffffb206cf337e20] rtrs_clt_close_conns at ffffffffc0d5f668 [rtrs_client]
    #6 [ffffb206cf337e48] rtrs_clt_close at ffffffffc0d5f801 [rtrs_client]
    #7 [ffffb206cf337e68] close_rtrs at ffffffffc0d26255 [rnbd_client]
    #8 [ffffb206cf337e78] free_sess at ffffffffc0d262ad [rnbd_client]
    #9 [ffffb206cf337e88] rnbd_clt_put_dev at ffffffffc0d266a7 [rnbd_client]

    The problem is that both code paths try to close the same session,
    which leads to the panic.

    To fix it, skip the session if its refcount has already dropped to 0.

    Fixes: f7a7a5c228d4 ("block/rnbd: client: main functionality")
    Signed-off-by: Jack Wang
    Reviewed-by: Gioh Kim
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Jack Wang
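
    A minimal userspace sketch of the "skip the session if its refcount
    already dropped to 0" idea from the commit above, using C11 atomics.
    The names struct sess, sess_get_unless_zero(), sess_put() and
    close_all() are hypothetical and are not the rnbd code; in the kernel,
    refcount_inc_not_zero() provides the same "take a reference only if
    the object is still alive" primitive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical session object; only the reference count matters here. */
struct sess {
    atomic_int refcount;
    int close_calls;   /* how many times the teardown path ran */
};

/* Take a reference only if the count is still non-zero. */
static bool sess_get_unless_zero(struct sess *s)
{
    int old = atomic_load(&s->refcount);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&s->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; the caller that reaches zero performs the close. */
static void sess_put(struct sess *s)
{
    assert(atomic_load(&s->refcount) > 0);
    if (atomic_fetch_sub(&s->refcount, 1) == 1)
        s->close_calls++;
}

/* Module-exit style walk: sessions already at zero are skipped. */
static void close_all(struct sess *list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!sess_get_unless_zero(&list[i]))
            continue;          /* another path is already closing it */
        sess_put(&list[i]);    /* drop the temporary reference */
        sess_put(&list[i]);    /* drop the long-lived reference */
    }
}
```

    With this shape, a session whose last reference was already dropped
    by the confirmation worker is never closed a second time by the
    unload path.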
     
  • commit 36a106a4c1c100d55ba3d32a21ef748cfcd4fa99 upstream.

    Without crc32, the driver fails to link:

    arm-linux-gnueabi-ld: drivers/block/rsxx/config.o: in function `rsxx_load_config':
    config.c:(.text+0x124): undefined reference to `crc32_le'

    Fixes: 8722ff8cdbfa ("block: IBM RamSan 70/80 device driver")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
     

30 Dec, 2020

8 commits

  • commit 2e896d89510f23927ec393bee1e0570db3d5a6c6 upstream.

    Conventional zones do not have a write pointer and so cannot accept zone
    append writes. Make sure to fail any zone append write command issued to
    a conventional zone.

    Reported-by: Naohiro Aota
    Fixes: e0489ed5daeb ("null_blk: Support REQ_OP_ZONE_APPEND")
    Cc: stable@vger.kernel.org
    Signed-off-by: Damien Le Moal
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Damien Le Moal
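
    A small sketch of the check described in the commit above, assuming a
    simplified zone model. The types and the zone_append() helper are
    hypothetical, and -EIO stands in for whatever error status the driver
    actually returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum zone_type { ZONE_CONVENTIONAL, ZONE_SEQ_WRITE_REQUIRED };

struct zone {
    enum zone_type type;
    uint64_t start;  /* first sector of the zone */
    uint64_t len;    /* zone length in sectors */
    uint64_t wp;     /* write pointer; meaningful only for sequential zones */
};

/* Returns the sector where the data landed, or a negative errno. */
static int64_t zone_append(struct zone *z, uint64_t nr_sectors)
{
    int64_t written_at;

    assert(z != NULL);

    /* Conventional zones have no write pointer: always fail the append. */
    if (z->type == ZONE_CONVENTIONAL)
        return -EIO;

    /* The append must fit between the write pointer and the zone end. */
    if (z->wp + nr_sectors > z->start + z->len)
        return -EIO;

    written_at = (int64_t)z->wp;
    z->wp += nr_sectors;
    return written_at;
}
```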
     
  • commit 0ebcdd702f49aeb0ad2e2d894f8c124a0acc6e23 upstream.

    A null_blk device with zoned mode enabled is currently initialized
    with a number of zones equal to the device capacity divided by the zone
    size, without considering whether the device capacity is a multiple of
    the zone size. If the zone size is not a divisor of the capacity, the
    zones end up not covering the entire capacity, potentially resulting in
    out-of-bounds accesses to the zone array.

    Fix this by adding one last smaller zone with a size equal to the
    remainder of the disk capacity divided by the zone size if the capacity
    is not a multiple of the zone size. For such smaller last zone, the zone
    capacity is also checked so that it does not exceed the smaller zone
    size.

    Reported-by: Naohiro Aota
    Fixes: ca4b2a011948 ("null_blk: add zone support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Damien Le Moal
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

    Damien Le Moal
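
    The arithmetic described in the commit above can be sketched as
    follows. This is an illustration only: struct zone_layout and
    layout_zones() are hypothetical names, and all sizes are in the same
    arbitrary unit.

```c
#include <assert.h>
#include <stdint.h>

struct zone_layout {
    uint64_t nr_zones;        /* total zones, including a short last one */
    uint64_t last_zone_size;  /* size of the final zone */
    uint64_t last_zone_cap;   /* capacity of the final zone */
};

static struct zone_layout layout_zones(uint64_t dev_size, uint64_t zone_size,
                                       uint64_t zone_cap)
{
    struct zone_layout l;
    uint64_t rem;

    assert(zone_size > 0 && zone_cap > 0 && zone_cap <= zone_size);

    /* Remainder left over when the capacity is not a multiple of the zone size. */
    rem = dev_size % zone_size;

    /* Add one smaller zone to cover the remainder, if there is one. */
    l.nr_zones = dev_size / zone_size + (rem ? 1 : 0);
    l.last_zone_size = rem ? rem : zone_size;

    /* The zone capacity may not exceed the (possibly smaller) zone size. */
    l.last_zone_cap = zone_cap < l.last_zone_size ? zone_cap : l.last_zone_size;
    return l;
}
```

    For example, a capacity of 1000 with a zone size of 256 gives three
    full zones plus one last zone of size 232, rather than three zones
    that leave the final 232 units uncovered.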
     
  • commit 2e85d32b1c865bec703ce0c962221a5e955c52c2 upstream.

    Some code does not directly create a 'xenbus_watch' object and call
    'register_xenbus_watch()' but uses 'xenbus_watch_path()' instead. This
    commit adds support for the 'will_handle' callback to
    'xenbus_watch_path()' and its wrapper, 'xenbus_watch_pathfmt()'.

    This is part of XSA-349.

    Cc: stable@vger.kernel.org
    Signed-off-by: SeongJae Park
    Reported-by: Michael Kurth
    Reported-by: Pawel Wieczorkiewicz
    Reviewed-by: Juergen Gross
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    SeongJae Park
     
  • commit 1c728719a4da6e654afb9cc047164755072ed7c9 upstream.

    When xen_blkif_disconnect() is called, the kernel thread behind the
    block interface is stopped by calling kthread_stop(ring->xenblkd).
    The ring->xenblkd thread pointer being non-NULL determines if the
    thread has been already stopped.
    Normally, the thread's function xen_blkif_schedule() sets the
    ring->xenblkd to NULL, when the thread's main loop ends.

    However, when the thread has not been started yet (i.e.
    wake_up_process() has not been called on it), the xen_blkif_schedule()
    function has not been called yet.

    In such a case the kthread_stop() call returns -EINTR and the
    ring->xenblkd pointer remains dangling.
    When this happens, any subsequent call to xen_blkif_disconnect() (for
    example in the frontend_changed() callback) leads to a kernel crash in
    kthread_stop() (e.g. a NULL pointer dereference in exit_creds()).

    This is XSA-350.

    Cc: # 4.12
    Fixes: a24fa22ce22a ("xen/blkback: don't use xen_blkif_get() in xen-blkback kthread")
    Reported-by: Olivier Benjamin
    Reported-by: Pawel Wieczorkiewicz
    Signed-off-by: Pawel Wieczorkiewicz
    Reviewed-by: Julien Grall
    Reviewed-by: Juergen Gross
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

    Pawel Wieczorkiewicz
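
    A simplified userspace model of the dangling-pointer problem
    described in the commit above, and of one way to avoid it: clear the
    thread pointer in the disconnect path itself instead of relying on the
    thread function to do so. struct worker, struct ring, worker_stop()
    and ring_disconnect() are hypothetical stand-ins, not the blkback
    code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel thread. */
struct worker {
    bool started;    /* has the thread function ever run? */
    int stop_calls;  /* number of times a stop was requested */
};

struct ring {
    struct worker *thread;  /* non-NULL means "still needs stopping" */
};

/* Models a stop request: reports -EINTR if the thread never ran. */
static int worker_stop(struct worker *w)
{
    assert(w != NULL);
    w->stop_calls++;
    return w->started ? 0 : -EINTR;
}

static void ring_disconnect(struct ring *r)
{
    if (!r->thread)
        return;

    (void)worker_stop(r->thread);

    /*
     * Clear the pointer here, whether or not the thread function ever
     * ran, so that a second disconnect does not touch a stale pointer.
     */
    r->thread = NULL;
}
```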
     
  • [ Upstream commit 46067844efdb8275ade705923120fc5391543b53 ]

    In the error case, the memory for blk_symlink_name is not freed.

    Fix this by freeing the memory in the error path and setting the
    pointer to NULL afterwards.

    Also fix the condition in rnbd_clt_remove_dev_symlink.

    Fixes: 64e8a6ece1a5 ("block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name")
    Signed-off-by: Jack Wang
    Reviewed-by: Md Haris Iqbal
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Jack Wang
     
  • [ Upstream commit e7508d48565060af5d89f10cb83c9359c8ae1310 ]

    The kernel test robot triggered the following warning:

    >> drivers/block/rnbd/rnbd-clt.c:1397:42: warning: size argument in
    'strlcpy' call appears to be size of the source; expected the size of the
    destination [-Wstrlcpy-strlcat-size]
    strlcpy(dev->pathname, pathname, strlen(pathname) + 1);
    ~~~~~~~^~~~~~~~~~~~~

    To get rid of the above warning, use kstrdup as Bart suggested.

    Fixes: 64e8a6ece1a5 ("block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name")
    Reported-by: kernel test robot
    Signed-off-by: Md Haris Iqbal
    Signed-off-by: Jack Wang
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Md Haris Iqbal
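
    A userspace sketch of the duplicate-instead-of-copy approach from the
    commit above. dup_string() is a hypothetical stand-in for kstrdup():
    it sizes the destination from the source and allocates it in one step,
    so no size argument can be mismatched. struct dev and
    dev_set_pathname() are illustrative names as well.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct dev {
    char *pathname;  /* heap-allocated, owned by the device */
};

/* Userspace stand-in for kstrdup(): allocate exactly strlen(src) + 1 bytes. */
static char *dup_string(const char *src)
{
    size_t len;
    char *copy;

    assert(src != NULL);
    len = strlen(src) + 1;
    copy = malloc(len);
    if (copy)
        memcpy(copy, src, len);
    return copy;
}

/* Returns 0 on success, -1 if the allocation failed. */
static int dev_set_pathname(struct dev *d, const char *pathname)
{
    d->pathname = dup_string(pathname);
    return d->pathname ? 0 : -1;
}
```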
     
  • [ Upstream commit 733c15bd3a944b8eeaacdddf061759b6a83dd3f4 ]

    Currently, in the case where dev->blk_symlink_name fails to be
    allocated, the error return path attempts to write an end-of-string
    character to the unallocated dev->blk_symlink_name, causing a null
    pointer dereference. Fix this by returning with an explicit ENOMEM
    error (the return value was also left uninitialized in the original
    code).

    Fixes: 1eb54f8f5dd8 ("block/rnbd: client: sysfs interface functions")
    Signed-off-by: Colin Ian King
    Addresses-Coverity: ("Dereference after null check")
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Colin Ian King
     
  • [ Upstream commit 64e8a6ece1a5b1fa21316918053d068baeac84af ]

    For every rnbd_clt_dev, the pathname and blk_symlink_name buffers are
    statically sized to NAME_MAX, which is 255 bytes. In most cases fewer
    than 10 bytes are needed, so about 500 bytes per block device are wasted.

    This commit dynamically allocates memory buffer for pathname and
    blk_symlink_name.

    Signed-off-by: Md Haris Iqbal
    Signed-off-by: Jack Wang
    Reviewed-by: Lutz Pogrell
    Signed-off-by: Jens Axboe
    Signed-off-by: Sasha Levin

    Md Haris Iqbal
     

09 Dec, 2020

1 commit

  • Instead of having similar helpers in multiple backend drivers use
    common helpers for caching pages allocated via gnttab_alloc_pages().

    Make use of those helpers in blkback and scsiback.

    Cc: # 5.9
    Signed-off-by: Juergen Gross
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Juergen Gross

    Juergen Gross
     

13 Nov, 2020

1 commit

  • Commit 716ad0986cbd ("loop: Switch to set_capacity_revalidate_and_notify")
    causes loop device uevents to be dropped occasionally, as they are no
    longer triggered in loop_set_size() but in a different part of the code.

    The bug is reproducible with the LTP test uevent01 [1]:

    i=0; while true; do
    i=$((i+1)); echo "== $i =="
    lsmod |grep -q loop && rmmod -f loop
    ./uevent01 || break
    done

    Restore triggering through code called from loop_set_size().

    The fix requires adding yet another parameter to
    set_capacity_revalidate_and_notify().

    [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/uevents/uevent01.c

    [hch: rebased on a different change to the prototype of
    set_capacity_revalidate_and_notify]

    Cc: stable@vger.kernel.org # v5.9
    Fixes: 716ad0986cbd ("loop: Switch to set_capacity_revalidate_and_notify")
    Reported-by:
    Signed-off-by: Petr Vorel
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Petr Vorel
     

10 Nov, 2020

1 commit


07 Nov, 2020

1 commit

  • Commit aa1c09cb65e2 ("null_blk: Fix locking in zoned mode") changed
    zone locking to use the potentially sleeping wait_on_bit_io()
    function. This is acceptable when memory backing is enabled as the
    device queue is in that case marked as blocking, but it triggers
    scheduling while in atomic context when memory backing is disabled.

    Fix this by relying solely on the device zone spinlock for zone
    information protection without temporarily releasing this lock around
    null_process_cmd() execution in null_zone_write(). This is OK to do
    since when memory backing is disabled, command processing does not
    block and the memory backing lock nullb->lock is unused. This solution
    avoids the overhead of having to mark a zoned null_blk device queue as
    blocking when memory backing is unused.

    This patch also adds comments to the zone locking code to explain the
    unusual locking scheme.

    Fixes: aa1c09cb65e2 ("null_blk: Fix locking in zoned mode")
    Reported-by: kernel test robot
    Signed-off-by: Damien Le Moal
    Reviewed-by: Christoph Hellwig
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Damien Le Moal
     

29 Oct, 2020

4 commits

  • Use platform_get_resource() to fetch the memory resource and
    platform_get_irq_optional() to get optional IRQ instead of
    open-coded variants.

    IRQ is not supposed to be changed at runtime, so there is
    no functional change in ace_fsm_yieldirq().

    On the other hand, we now use the first resources instead of the last
    ones. I can't imagine how broken firmware would have to be to have
    garbage in the first resource slots. But if that is the case, it needs
    to be documented.

    Signed-off-by: Andy Shevchenko
    Acked-by: Michal Simek
    Signed-off-by: Jens Axboe

    Andy Shevchenko
     
  • When zoned mode is enabled in null_blk, serializing read, write
    and zone management operations for each zone is necessary to protect
    device level information for managing zone resources (zone open and
    closed counters) as well as each zone condition and write pointer
    position. Commit 35bc10b2eafb ("null_blk: synchronization fix for
    zoned device") introduced a spinlock to implement this serialization.
    However, when memory backing is also enabled, GFP_NOIO memory
    allocations are executed under the spinlock, resulting in might_sleep()
    warnings. Furthermore, the zone_lock spinlock is locked/unlocked using
    spin_lock_irq/spin_unlock_irq, similarly to the memory backing code with
    the nullb->lock spinlock. This nested use of irq locks wrecks the irq
    enabled/disabled state.

    Fix all this by introducing a bitmap for per-zone lock, with locking
    implemented using wait_on_bit_lock_io() and clear_and_wake_up_bit().
    This locking mechanism allows keeping a zone locked while executing
    null_process_cmd(), serializing all operations to the zone while
    allowing to sleep during memory backing allocation with GFP_NOIO.
    Device level zone resource management information is protected using
    a spinlock which is not held while executing null_process_cmd().

    Fixes: 35bc10b2eafb ("null_blk: synchronization fix for zoned device")
    Signed-off-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Damien Le Moal
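
    A rough userspace sketch of a per-zone lock kept in a bitmap, as in
    the commit above. It deliberately differs from the described kernel
    scheme in one respect: this version is a non-blocking try-lock built
    on C11 atomics, whereas the commit uses the sleeping
    wait_on_bit_lock_io() and clear_and_wake_up_bit() helpers.
    zone_trylock() and zone_unlock() are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Try to take the lock bit for zone zno; returns true on success. */
static bool zone_trylock(atomic_ulong *bitmap, size_t zno)
{
    unsigned long mask = 1UL << (zno % BITS_PER_WORD);
    unsigned long old = atomic_fetch_or(&bitmap[zno / BITS_PER_WORD], mask);

    /* The lock was free only if the bit was previously clear. */
    return !(old & mask);
}

/* Release the lock bit for zone zno; the caller must hold it. */
static void zone_unlock(atomic_ulong *bitmap, size_t zno)
{
    unsigned long mask = 1UL << (zno % BITS_PER_WORD);
    unsigned long old = atomic_fetch_and(&bitmap[zno / BITS_PER_WORD], ~mask);

    assert(old & mask);
    (void)old;
}
```

    One bit per zone keeps the lock state compact, and each zone can stay
    locked across a long operation without holding a device-wide
    spinlock.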
     
  • In the case of the REQ_OP_ZONE_RESET_ALL operation, the command sector is
    ignored and the operation is applied to all sequential zones. For these
    commands, tracing the effect of the command using the command sector to
    determine the target zone is thus incorrect.

    Fix null_zone_mgmt() zone condition tracing in the case of
    REQ_OP_ZONE_RESET_ALL to apply tracing to all sequential zones that are
    not already empty.

    Fixes: 766c3297d7e1 ("null_blk: add trace in null_blk_zoned.c")
    Signed-off-by: Damien Le Moal
    Cc: stable@vger.kernel.org
    Signed-off-by: Jens Axboe

    Damien Le Moal
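
    The reset-all behaviour described in the commit above can be sketched
    as a loop over every zone rather than a lookup by command sector. The
    types and reset_all_zones() are hypothetical, and the traced flag
    merely stands in for emitting a trace event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ztype { ZT_CONVENTIONAL, ZT_SEQUENTIAL };
enum zcond { ZC_NOT_WP, ZC_EMPTY, ZC_OPEN, ZC_FULL };

struct zone_info {
    enum ztype type;
    enum zcond cond;
    bool traced;  /* stand-in for "a trace event was emitted" */
};

/* Reset all sequential zones; returns how many zones were traced. */
static size_t reset_all_zones(struct zone_info *zones, size_t n)
{
    size_t traced = 0;

    assert(zones != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        struct zone_info *z = &zones[i];

        /* Conventional zones are not reset; empty zones do not change. */
        if (z->type == ZT_CONVENTIONAL || z->cond == ZC_EMPTY)
            continue;

        z->cond = ZC_EMPTY;
        z->traced = true;
        traced++;
    }
    return traced;
}
```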
     
  • A mounted NBD device can be resized; one use case is rbd-nbd.

    Fix the issue by setting up a default block size and then not touching
    it in nbd_size_update() any more. This usage is aligned with loop,
    which has the same use case.

    Cc: stable@vger.kernel.org
    Fixes: c8a83a6b54d0 ("nbd: Use set_blocksize() to set device blocksize")
    Reported-by: lining
    Signed-off-by: Ming Lei
    Cc: Josef Bacik
    Cc: Jan Kara
    Tested-by: lining
    Signed-off-by: Jens Axboe

    Ming Lei
     

28 Oct, 2020

1 commit

  • Parallel write, read and zone management operations that access or
    alter zone state and the write pointer may race. Avoid this by using a
    new spinlock for the zoned device.
    The lock also prevents concurrent zone appends to the same zone from
    returning the same write pointer.

    Cc: stable@vger.kernel.org
    Fixes: e0489ed5daeb ("null_blk: Support REQ_OP_ZONE_APPEND")
    Signed-off-by: Kanchan Joshi
    Reviewed-by: Damien Le Moal
    Signed-off-by: Jens Axboe

    Kanchan Joshi
     

26 Oct, 2020

1 commit

  • Pull more xen updates from Juergen Gross:

    - a series for the Xen pv block drivers adding module parameters for
    better control of resource usage

    - a cleanup series for the Xen event driver

    * tag 'for-linus-5.10b-rc1c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    Documentation: add xen.fifo_events kernel parameter description
    xen/events: unmask a fifo event channel only if it was masked
    xen/events: only register debug interrupt for 2-level events
    xen/events: make struct irq_info private to events_base.c
    xen: remove no longer used functions
    xen-blkfront: Apply changed parameter name to the document
    xen-blkfront: add a parameter for disabling of persistent grants
    xen-blkback: add a parameter for disabling of persistent grants

    Linus Torvalds
     

25 Oct, 2020

1 commit

  • Pull block fixes from Jens Axboe:

    - NVMe pull request from Christoph
    - rdma error handling fixes (Chao Leng)
    - fc error handling and reconnect fixes (James Smart)
    - fix the qid displace when tracing ioctl command (Keith Busch)
    - don't use BLK_MQ_REQ_NOWAIT for passthru (Chaitanya Kulkarni)
    - fix MTDT for passthru (Logan Gunthorpe)
    - blacklist Write Same on more devices (Kai-Heng Feng)
    - fix an uninitialized work struct (zhenwei pi)"

    - lightnvm out-of-bounds fix (Colin)

    - SG allocation leak fix (Doug)

    - rnbd fixes (Gioh, Guoqing, Jack)

    - zone error translation fixes (Keith)

    - kerneldoc markup fix (Mauro)

    - zram lockdep fix (Peter)

    - Kill unused io_context members (Yufen)

    - NUMA memory allocation cleanup (Xianting)

    - NBD config wakeup fix (Xiubo)

    * tag 'block-5.10-2020-10-24' of git://git.kernel.dk/linux-block: (27 commits)
    block: blk-mq: fix a kernel-doc markup
    nvme-fc: shorten reconnect delay if possible for FC
    nvme-fc: wait for queues to freeze before calling update_hr_hw_queues
    nvme-fc: fix error loop in create_hw_io_queues
    nvme-fc: fix io timeout to abort I/O
    null_blk: use zone status for max active/open
    nvmet: don't use BLK_MQ_REQ_NOWAIT for passthru
    nvmet: cleanup nvmet_passthru_map_sg()
    nvmet: limit passthru MTDS by BIO_MAX_PAGES
    nvmet: fix uninitialized work for zero kato
    nvme-pci: disable Write Zeroes on Sandisk Skyhawk
    nvme: use queuedata for nvme_req_qid
    nvme-rdma: fix crash due to incorrect cqe
    nvme-rdma: fix crash when connect rejected
    block: remove unused members for io_context
    blk-mq: remove the calling of local_memory_node()
    zram: Fix __zram_bvec_{read,write}() locking order
    skd_main: remove unused including
    sgl_alloc_order: fix memory leak
    lightnvm: fix out-of-bounds write to array devices->info[]
    ...

    Linus Torvalds
     

23 Oct, 2020

1 commit

  • The block layer provides special status codes when requests go beyond
    the zone resource limits. Use these codes instead of the generic IOERR
    for requests that exceed the max active or open limits the null_blk
    device was configured with so that applications know how these special
    conditions should be handled.

    Signed-off-by: Keith Busch
    Reviewed-by: Niklas Cassel
    Cc: Damien Le Moal
    Cc: Niklas Cassel
    Signed-off-by: Jens Axboe

    Keith Busch
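
    A simplified sketch of returning a specific status when a zone
    resource limit is hit, as described in the commit above. The enum
    values are local stand-ins modelled on the block layer's
    BLK_STS_ZONE_OPEN_RESOURCE and BLK_STS_ZONE_ACTIVE_RESOURCE codes;
    struct zoned_dev and zone_open_check() are hypothetical, and the
    accounting is far simpler than in a real driver.

```c
#include <assert.h>
#include <stddef.h>

enum sts {
    STS_OK,
    STS_IOERR,                 /* generic I/O error */
    STS_ZONE_OPEN_RESOURCE,    /* max open zones limit reached */
    STS_ZONE_ACTIVE_RESOURCE   /* max active zones limit reached */
};

struct zoned_dev {
    unsigned int max_open;    /* 0 means no limit */
    unsigned int max_active;  /* 0 means no limit */
    unsigned int nr_open;
    unsigned int nr_active;
};

/* Check whether one more zone may become open and active. */
static enum sts zone_open_check(const struct zoned_dev *d)
{
    assert(d != NULL);

    if (d->max_active && d->nr_active >= d->max_active)
        return STS_ZONE_ACTIVE_RESOURCE;
    if (d->max_open && d->nr_open >= d->max_open)
        return STS_ZONE_OPEN_RESOURCE;
    return STS_OK;
}
```

    A caller that sees one of the two resource codes can close or finish
    another zone and retry, which a generic I/O error would not let it
    distinguish.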
     

22 Oct, 2020

1 commit

  • Pull ceph updates from Ilya Dryomov:

    - a patch that removes crush_workspace_mutex (myself). CRUSH
    computations are no longer serialized and can run in parallel.

    - a couple new filesystem client metrics for "ceph fs top" command
    (Xiubo Li)

    - a fix for a very old messenger bug that affected the filesystem,
    marked for stable (myself)

    - assorted fixups and cleanups throughout the codebase from Jeff and
    others.

    * tag 'ceph-for-5.10-rc1' of git://github.com/ceph/ceph-client: (27 commits)
    libceph: clear con->out_msg on Policy::stateful_server faults
    libceph: format ceph_entity_addr nonces as unsigned
    libceph: fix ENTITY_NAME format suggestion
    libceph: move a dout in queue_con_delay()
    ceph: comment cleanups and clarifications
    ceph: break up send_cap_msg
    ceph: drop separate mdsc argument from __send_cap
    ceph: promote to unsigned long long before shifting
    ceph: don't SetPageError on readpage errors
    ceph: mark ceph_fmt_xattr() as printf-like for better type checking
    ceph: fold ceph_update_writeable_page into ceph_write_begin
    ceph: fold ceph_sync_writepages into writepage_nounlock
    ceph: fold ceph_sync_readpages into ceph_readpage
    ceph: don't call ceph_update_writeable_page from page_mkwrite
    ceph: break out writeback of incompatible snap context to separate function
    ceph: add a note explaining session reject error string
    libceph: switch to the new "osd blocklist add" command
    libceph, rbd, ceph: "blacklist" -> "blocklist"
    ceph: have ceph_writepages_start call pagevec_lookup_range_tag
    ceph: use kill_anon_super helper
    ...

    Linus Torvalds
     

21 Oct, 2020

3 commits

  • The persistent grants feature provides high scalability. On some small
    systems, however, it could incur data copy overheads[1] and thus needs
    to be disabled. It can be disabled from the blkback side using a
    module parameter, 'feature_persistent', but this is not possible from
    the blkfront side. For this reason, this commit adds a blkfront module
    parameter for disabling the feature.

    [1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability

    Signed-off-by: SeongJae Park
    Reviewed-by: Juergen Gross
    Acked-by: Roger Pau Monné
    Link: https://lore.kernel.org/r/20200923061841.20531-3-sjpark@amazon.com
    Signed-off-by: Boris Ostrovsky

    SeongJae Park
     
  • The persistent grants feature provides high scalability. On some small
    systems, however, it could incur data copy overheads[1] and thus needs
    to be disabled. However, there is no option to disable it. For this
    reason, this commit adds a module parameter for disabling the
    feature.

    [1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability

    Signed-off-by: Anthony Liguori
    Signed-off-by: SeongJae Park
    Reviewed-by: Juergen Gross
    Acked-by: Roger Pau Monné
    Link: https://lore.kernel.org/r/20200923061841.20531-2-sjpark@amazon.com
    Signed-off-by: Boris Ostrovsky

    SeongJae Park
     
  • Pull more xen updates from Juergen Gross:

    - A single patch to fix the Xen security issue XSA-331 (malicious
    guests can DoS dom0 by triggering NULL-pointer dereferences or access
    to stale data).

    - A larger series to fix the Xen security issue XSA-332 (malicious
    guests can DoS dom0 by sending events at high frequency leading to
    dom0's vcpus being busy in IRQ handling for elongated times).

    * tag 'for-linus-5.10b-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    xen/events: block rogue events for some time
    xen/events: defer eoi in case of excessive number of events
    xen/events: use a common cpu hotplug hook for event channels
    xen/events: switch user event channels to lateeoi model
    xen/pciback: use lateeoi irq binding
    xen/pvcallsback: use lateeoi irq binding
    xen/scsiback: use lateeoi irq binding
    xen/netback: use lateeoi irq binding
    xen/blkback: use lateeoi irq binding
    xen/events: add a new "late EOI" evtchn framework
    xen/events: fix race in evtchn_fifo_unmask()
    xen/events: add a proper barrier to 2-level uevent unmasking
    xen/events: avoid removing an event channel while handling it

    Linus Torvalds
     

20 Oct, 2020

1 commit

  • In order to reduce the chance for the system becoming unresponsive due
    to event storms triggered by a misbehaving blkfront use the lateeoi
    irq binding for blkback and unmask the event channel only after
    processing all pending requests.

    As the thread processing requests is also used to do purging work at
    regular intervals, an EOI may be sent only after having received an
    event. If there was no pending I/O request, flag the EOI as spurious.

    This is part of XSA-332.

    Cc: stable@vger.kernel.org
    Reported-by: Julien Grall
    Signed-off-by: Juergen Gross
    Reviewed-by: Jan Beulich
    Reviewed-by: Wei Liu

    Juergen Gross
     

19 Oct, 2020

1 commit

  • Mikhail reported a lockdep splat detailing how __zram_bvec_read() and
    __zram_bvec_write() use zstrm->lock and zspage->lock in opposite order.

    Reported-by: Mikhail Gavrilov
    Signed-off-by: Peter Zijlstra (Intel)
    Tested-by: Mikhail Gavrilov
    Acked-by: Minchan Kim
    Acked-by: Sebastian Andrzej Siewior
    Signed-off-by: Jens Axboe

    Peter Zijlstra
     

17 Oct, 2020

3 commits

  • Remove an include that is not needed.

    Signed-off-by: Tian Tao
    Signed-off-by: Jens Axboe

    Tian Tao
     
  • Merge more updates from Andrew Morton:
    "155 patches.

    Subsystems affected by this patch series: mm (dax, debug, thp,
    readahead, page-poison, util, memory-hotplug, zram, cleanups), misc,
    core-kernel, get_maintainer, MAINTAINERS, lib, bitops, checkpatch,
    binfmt, ramfs, autofs, nilfs, rapidio, panic, relay, kgdb, ubsan,
    romfs, and fault-injection"

    * emailed patches from Andrew Morton: (155 commits)
    lib, uaccess: add failure injection to usercopy functions
    lib, include/linux: add usercopy failure capability
    ROMFS: support inode blocks calculation
    ubsan: introduce CONFIG_UBSAN_LOCAL_BOUNDS for Clang
    sched.h: drop in_ubsan field when UBSAN is in trap mode
    scripts/gdb/tasks: add headers and improve spacing format
    scripts/gdb/proc: add struct mount & struct super_block addr in lx-mounts command
    kernel/relay.c: drop unneeded initialization
    panic: dump registers on panic_on_warn
    rapidio: fix the missed put_device() for rio_mport_add_riodev
    rapidio: fix error handling path
    nilfs2: fix some kernel-doc warnings for nilfs2
    autofs: harden ioctl table
    ramfs: fix nommu mmap with gaps in the page cache
    mm: remove the now-unnecessary mmget_still_valid() hack
    mm/gup: take mmap_lock in get_dump_page()
    binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot
    coredump: rework elf/elf_fdpic vma_dump_size() into common helper
    coredump: refactor page range dumping into common helper
    coredump: let dump_emit() bail out on short writes
    ...

    Linus Torvalds
     
  • If we fail to decompress in zram it's a pretty serious problem. We were
    entrusted to be able to decompress the old data but we failed. Either
    we've got some crazy bug in the compression code or we've got memory
    corruption.

    At the moment, when this happens the log looks like this:

    ERR kernel: [ 1833.099861] zram: Decompression failed! err=-22, page=336112
    ERR kernel: [ 1833.099881] zram: Decompression failed! err=-22, page=336112
    ALERT kernel: [ 1833.099886] Read-error on swap-device (253:0:2688896)

    It is true that we have an "ALERT" level log in there, but (at least to
    me) it feels like even this isn't enough to impart the seriousness of this
    error. Let's convert to a WARN_ON. Note that WARN_ON is automatically
    "unlikely" so we can simply replace the old annotation with the new one.

    Signed-off-by: Douglas Anderson
    Signed-off-by: Andrew Morton
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Sonny Rao
    Cc: Jens Axboe
    Link: https://lkml.kernel.org/r/20200917174059.1.If09c882545dbe432268f7a67a4d4cfcb6caace4f@changeid
    Signed-off-by: Linus Torvalds

    Douglas Anderson
     

16 Oct, 2020

1 commit

  • Pull networking updates from Jakub Kicinski:

    - Add redirect_neigh() BPF packet redirect helper, allowing to limit
    stack traversal in common container configs and improving TCP
    back-pressure.

    Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

    - Expand netlink policy support and improve policy export to user
    space. (Ge)netlink core performs request validation according to
    declared policies. Expand the expressiveness of those policies
    (min/max length and bitmasks). Allow dumping policies for particular
    commands. This is used for feature discovery by user space (instead
    of kernel version parsing or trial and error).

    - Support IGMPv3/MLDv2 multicast listener discovery protocols in
    bridge.

    - Allow more than 255 IPv4 multicast interfaces.

    - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
    packets of TCPv6.

    - In Multi-path TCP (MPTCP) support concurrent transmission of data on
    multiple subflows in a load balancing scenario. Enhance advertising
    addresses via the RM_ADDR/ADD_ADDR options.

    - Support SMC-Dv2 version of SMC, which enables multi-subnet
    deployments.

    - Allow more calls to same peer in RxRPC.

    - Support two new Controller Area Network (CAN) protocols - CAN-FD and
    ISO 15765-2:2016.

    - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
    kernel problem.

    - Add TC actions for implementing MPLS L2 VPNs.

    - Improve nexthop code - e.g. handle various corner cases when nexthop
    objects are removed from groups better, skip unnecessary
    notifications and make it easier to offload nexthops into HW by
    converting to a blocking notifier.

    - Support adding and consuming TCP header options by BPF programs,
    opening the doors for easy experimental and deployment-specific TCP
    option use.

    - Reorganize TCP congestion control (CC) initialization to simplify
    life of TCP CC implemented in BPF.

    - Add support for shipping BPF programs with the kernel and loading
    them early on boot via the User Mode Driver mechanism, hence reusing
    all the user space infra we have.

    - Support sleepable BPF programs, initially targeting LSM and tracing.

    - Add bpf_d_path() helper for returning full path for given 'struct
    path'.

    - Make bpf_tail_call compatible with bpf-to-bpf calls.

    - Allow BPF programs to call map_update_elem on sockmaps.

    - Add BPF Type Format (BTF) support for type and enum discovery, as
    well as support for using BTF within the kernel itself (current use
    is for pretty printing structures).

    - Support listing and getting information about bpf_links via the bpf
    syscall.

    - Enhance kernel interfaces around NIC firmware update. Allow
    specifying overwrite mask to control if settings etc. are reset
    during update; report expected max time operation may take to users;
    support firmware activation without machine reboot incl. limits of
    how much impact reset may have (e.g. dropping link or not).

    - Extend ethtool configuration interface to report IEEE-standard
    counters, to limit the need for per-vendor logic in user space.

    - Adopt or extend devlink use for debug, monitoring, fw update in many
    drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
    dpaa2-eth).

    - In mlxsw expose critical and emergency SFP module temperature alarms.
    Refactor port buffer handling to make the defaults more suitable and
    support setting these values explicitly via the DCBNL interface.

    - Add XDP support for Intel's igb driver.

    - Support offloading TC flower classification and filtering rules to
    mscc_ocelot switches.

    - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
    fixed interval period pulse generator and one-step timestamping in
    dpaa-eth.

    - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
    offload.

    - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
    this HW to use it. Convert mvpp2 to split PCS.

    - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
    7-port Mediatek MT7531 IP.

    - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
    and wcn3680 support in wcn36xx.

    - Improve performance for packets which don't require much offloads on
    recent Mellanox NICs by 20% by making multiple packets share a
    descriptor entry.

    - Move chelsio inline crypto drivers (for TLS and IPsec) from the
    crypto subtree to drivers/net. Move MDIO drivers out of the phy
    directory.

    - Clean up a lot of W=1 warnings, reportedly the actively developed
    subsections of networking drivers should now build W=1 warning free.

    - Make sure drivers don't use in_interrupt() to dynamically adapt their
    code. Convert tasklets to use new tasklet_setup API (sadly this
    conversion is not yet complete).

    * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
    Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
    net, sockmap: Don't call bpf_prog_put() on NULL pointer
    bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
    bpf, sockmap: Add locking annotations to iterator
    netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
    net: fix pos incrementment in ipv6_route_seq_next
    net/smc: fix invalid return code in smcd_new_buf_create()
    net/smc: fix valid DMBE buffer sizes
    net/smc: fix use-after-free of delayed events
    bpfilter: Fix build error with CONFIG_BPFILTER_UMH
    cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
    net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
    bpf: Fix register equivalence tracking.
    rxrpc: Fix loss of final ack on shutdown
    rxrpc: Fix bundle counting for exclusive connections
    netfilter: restore NF_INET_NUMHOOKS
    ibmveth: Identify ingress large send packets.
    ibmveth: Switch order of ibmveth_helper calls.
    cxgb4: handle 4-tuple PEDIT to NAT mode translation
    selftests: Add VRF route leaking tests
    ...

    Linus Torvalds
     

15 Oct, 2020

1 commit

  • There is a race in ceph's rbd-nbd tool. When mapping a device,
    ioctl(nbd, NBD_DO_IT) may fail with EBUSY even though the nbd
    device has already been unmapped.

    This happens when recv_work() is scheduled out just after the
    wake_up() and defers calling nbd_config_put(): the map process has
    exited, but "nbd->recv_task" has not been cleared.

    Signed-off-by: Xiubo Li
    Reviewed-by: Josef Bacik
    Signed-off-by: Jens Axboe

    Xiubo Li
     

14 Oct, 2020

5 commits

  • If an error occurs after send_msg_open has completed,
    send_msg_close must be called to undo
    what has been done.
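
    The rollback described above can be sketched in plain C as follows.
    This is only an illustration of the open/close unwinding pattern, not
    the rnbd code: struct demo_sess and every demo_* function are
    hypothetical stand-ins, and the fail_setup field exists purely to
    force the error path.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a remote session; not the rnbd structure. */
struct demo_sess {
	bool remote_open;	/* true once the peer has processed "open" */
	bool fail_setup;	/* demo knob: make the later setup step fail */
};

/* Hypothetical "open" message: marks the device open on the remote side. */
static int demo_send_open(struct demo_sess *s)
{
	s->remote_open = true;
	return 0;
}

/* Hypothetical "close" message: undoes demo_send_open(). */
static void demo_send_close(struct demo_sess *s)
{
	s->remote_open = false;
}

/* Hypothetical local setup step that may fail after the open succeeded. */
static int demo_setup_dev(struct demo_sess *s)
{
	return s->fail_setup ? -ENOMEM : 0;
}

/*
 * Map a device: once the open message has gone out, every later failure
 * must send the matching close so the remote side is not left open.
 */
int demo_map_device(struct demo_sess *s)
{
	int err;

	err = demo_send_open(s);
	if (err)
		return err;	/* nothing to undo yet */

	err = demo_setup_dev(s);
	if (err)
		goto close_remote;

	return 0;

close_remote:
	demo_send_close(s);
	return err;
}
```

    With fail_setup set, demo_map_device() returns the setup error and
    leaves remote_open false again; without it, the session stays open.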

    Signed-off-by: Gioh Kim
    Signed-off-by: Jack Wang
    Signed-off-by: Jens Axboe

    Gioh Kim
     
  • The max_hw_sectors is limited only by the transport, not by the remote
    device; the block layer on the server side will split requests to the
    device limit if they are too big.

    The same applies to max_segments: the rtrs server submits a single
    buffer, so there is no need to cap it.

    Signed-off-by: Jack Wang
    Signed-off-by: Jens Axboe

    Jack Wang
     
  • The argument is not needed since all callers pass 1 for it.

    Signed-off-by: Guoqing Jiang
    Signed-off-by: Jack Wang
    Signed-off-by: Jens Axboe

    Guoqing Jiang
     
  • Pull block driver updates from Jens Axboe:
    "Here are the driver updates for 5.10.

    A few SCSI updates in here too, in coordination with Martin as they
    depend on core block changes for the shared tag bitmap.

    This contains:

    - NVMe pull requests via Christoph:
    - fix keep alive timer modification (Amit Engel)
    - order the PCI ID list more sensibly (Andy Shevchenko)
    - cleanup the open by controller helper (Chaitanya Kulkarni)
    - use an xarray for the CSE log lookup (Chaitanya Kulkarni)
    - support ZNS in nvmet passthrough mode (Chaitanya Kulkarni)
    - fix nvme_ns_report_zones (Christoph Hellwig)
    - add a sanity check to nvmet-fc (James Smart)
    - fix interrupt allocation when too many polled queues are
    specified (Jeffle Xu)
    - small nvmet-tcp optimization (Mark Wunderlich)
    - fix a controller refcount leak on init failure (Chaitanya
    Kulkarni)
    - misc cleanups (Chaitanya Kulkarni)
    - major refactoring of the scanning code (Christoph Hellwig)

    - MD updates via Song:
    - Bug fixes in bitmap code, from Zhao Heming
    - Fix a work queue check, from Guoqing Jiang
    - Fix raid5 oops with reshape, from Song Liu
    - Clean up unused code, from Jason Yan
    - Discard improvements, from Xiao Ni
    - raid5/6 page offset support, from Yufen Yu

    - Shared tag bitmap for SCSI/hisi_sas/null_blk (John, Kashyap,
    Hannes)

    - null_blk open/active zone limit support (Niklas)

    - Set of bcache updates (Coly, Dongsheng, Qinglang)"

    * tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (78 commits)
    md/raid5: fix oops during stripe resizing
    md/bitmap: fix memory leak of temporary bitmap
    md: fix the checking of wrong work queue
    md/bitmap: md_bitmap_get_counter returns wrong blocks
    md/bitmap: md_bitmap_read_sb uses wrong bitmap blocks
    md/raid0: remove unused function is_io_in_chunk_boundary()
    nvme-core: remove extra condition for vwc
    nvme-core: remove extra variable
    nvme: remove nvme_identify_ns_list
    nvme: refactor nvme_validate_ns
    nvme: move nvme_validate_ns
    nvme: query namespace identifiers before adding the namespace
    nvme: revalidate zone bitmaps in nvme_update_ns_info
    nvme: remove nvme_update_formats
    nvme: update the known admin effects
    nvme: set the queue limits in nvme_update_ns_info
    nvme: remove the 0 lba_shift check in nvme_update_ns_info
    nvme: clean up the check for too large logic block sizes
    nvme: freeze the queue over ->lba_shift updates
    nvme: factor out a nvme_configure_metadata helper
    ...

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:

    - Series of merge handling cleanups (Baolin, Christoph)

    - Series of blk-throttle fixes and cleanups (Baolin)

    - Series cleaning up BDI, separating the block device from the
    backing_dev_info (Christoph)

    - Removal of bdget() as a generic API (Christoph)

    - Removal of blkdev_get() as a generic API (Christoph)

    - Cleanup of is-partition checks (Christoph)

    - Series reworking disk revalidation (Christoph)

    - Series cleaning up bio flags (Christoph)

    - bio crypt fixes (Eric)

    - IO stats inflight tweak (Gabriel)

    - blk-mq tags fixes (Hannes)

    - Buffer invalidation fixes (Jan)

    - Allow soft limits for zone append (Johannes)

    - Shared tag set improvements (John, Kashyap)

    - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

    - DM no-wait support (Mike, Konstantin)

    - Request allocation improvements (Ming)

    - Allow md/dm/bcache to use IO stat helpers (Song)

    - Series improving blk-iocost (Tejun)

    - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
    Xianting, Yang, Yufen, yangerkun)

    * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
    block: fix uapi blkzoned.h comments
    blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
    blk-mq: get rid of the dead flush handle code path
    block: get rid of unnecessary local variable
    block: fix comment and add lockdep assert
    blk-mq: use helper function to test hw stopped
    block: use helper function to test queue register
    block: remove redundant mq check
    block: invoke blk_mq_exit_sched no matter whether have .exit_sched
    percpu_ref: don't refer to ref->data if it isn't allocated
    block: ratelimit handle_bad_sector() message
    blk-throttle: Re-use the throtl_set_slice_end()
    blk-throttle: Open code __throtl_de/enqueue_tg()
    blk-throttle: Move service tree validation out of the throtl_rb_first()
    blk-throttle: Move the list operation after list validation
    blk-throttle: Fix IO hang for a corner case
    blk-throttle: Avoid tracking latency if low limit is invalid
    blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
    blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
    block: Remove redundant 'return' statement
    ...

    Linus Torvalds
     

12 Oct, 2020

1 commit


06 Oct, 2020

1 commit