13 Feb, 2021

1 commit

  • The ptrace(PTRACE_PEEKMTETAGS) implementation checks whether the user
    page has valid tags (mapped with PROT_MTE) by testing the PG_mte_tagged
    page flag. If this bit is cleared, ptrace(PTRACE_PEEKMTETAGS) returns
    -EIO.

    A newly created (PROT_MTE) mapping points to the zero page which had its
    tags zeroed during cpu_enable_mte(). If there were no prior writes to
    this mapping, ptrace(PTRACE_PEEKMTETAGS) fails with -EIO since the zero
    page does not have the PG_mte_tagged flag set.

    Set PG_mte_tagged on the zero page when its tags are cleared during
    boot. In addition, to avoid ptrace(PTRACE_PEEKMTETAGS) succeeding on
    !PROT_MTE mappings pointing to the zero page, change the
    __access_remote_tags() check to (vm_flags & VM_MTE) instead of
    PG_mte_tagged.

    Signed-off-by: Catalin Marinas
    Fixes: 34bfeea4a9e9 ("arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE")
    Cc: # 5.10.x
    Cc: Will Deacon
    Reported-by: Luis Machado
    Tested-by: Luis Machado
    Reviewed-by: Vincenzo Frascino
    Link: https://lore.kernel.org/r/20210210180316.23654-1-catalin.marinas@arm.com

    Catalin Marinas
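
The change of check described in this commit can be modelled in plain C. This is a minimal sketch, not the kernel code: the struct layouts, the `VM_MTE_MODEL` value and the function names are hypothetical. Only the idea of testing the mapping's VM_MTE flag instead of the page's PG_mte_tagged flag comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's vma and page state. */
#define VM_MTE_MODEL 0x1u                   /* mapping created with PROT_MTE */

struct vma_model  { unsigned int vm_flags; };
struct page_model { bool mte_tagged; };     /* models PG_mte_tagged */

/* Old behaviour: decide from the page flag alone. */
static int peek_tags_old(const struct vma_model *vma,
                         const struct page_model *page)
{
    assert(vma && page);
    return page->mte_tagged ? 0 : -EIO;
}

/* New behaviour: decide from the mapping's VM_MTE flag. */
static int peek_tags_new(const struct vma_model *vma,
                         const struct page_model *page)
{
    assert(vma && page);
    return (vma->vm_flags & VM_MTE_MODEL) ? 0 : -EIO;
}
```

In this model, an untouched PROT_MTE mapping backed by an untagged zero page fails with the old check, while the new check succeeds for PROT_MTE mappings and still rejects !PROT_MTE mappings that point to a tagged zero page.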
     

03 Feb, 2021

2 commits

  • Because of the tagged addresses, the __is_lm_address() and
    __lm_to_phys() macros grew to some harder to understand bitwise
    operations using PAGE_OFFSET. Since these macros only accept untagged
    addresses, use a simple subtract operation.

    Signed-off-by: Catalin Marinas
    Acked-by: Ard Biesheuvel
    Cc: Will Deacon
    Cc: Mark Rutland
    Link: https://lore.kernel.org/r/20210201190634.22942-3-catalin.marinas@arm.com

    Catalin Marinas
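
A subtraction-based range check of the kind this commit describes can be sketched in user-space C. The constants below assume a 48-bit VA layout and a made-up PHYS_OFFSET, and the `_MODEL` names and function names are illustrative rather than the kernel's macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 48-bit VA, linear map in [PAGE_OFFSET, PAGE_END). */
#define PAGE_OFFSET_MODEL 0xffff000000000000ULL
#define PAGE_END_MODEL    0xffff800000000000ULL
#define PHYS_OFFSET_MODEL 0x0000000080000000ULL  /* hypothetical start of RAM */

/*
 * Untagged addresses only. Unsigned wrap-around turns any address below
 * PAGE_OFFSET into a huge difference, so a single comparison is enough.
 */
static bool is_lm_address(uint64_t addr)
{
    return (addr - PAGE_OFFSET_MODEL) < (PAGE_END_MODEL - PAGE_OFFSET_MODEL);
}

/* Translate a linear-map virtual address to a physical address. */
static uint64_t lm_to_phys(uint64_t addr)
{
    assert(is_lm_address(addr));
    return (addr - PAGE_OFFSET_MODEL) + PHYS_OFFSET_MODEL;
}
```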
     
  • Commit 519ea6f1c82f ("arm64: Fix kernel address detection of
    __is_lm_address()") fixed the incorrect validation of addresses below
    PAGE_OFFSET. However, it no longer allowed tagged addresses to be passed
    to virt_addr_valid().

    Fix this by explicitly resetting the pointer tag prior to invoking
    __is_lm_address(). This is consistent with the __lm_to_phys() macro.

    Fixes: 519ea6f1c82f ("arm64: Fix kernel address detection of __is_lm_address()")
    Signed-off-by: Catalin Marinas
    Acked-by: Ard Biesheuvel
    Cc: # 5.4.x
    Cc: Will Deacon
    Cc: Vincenzo Frascino
    Cc: Mark Rutland
    Link: https://lore.kernel.org/r/20210201190634.22942-2-catalin.marinas@arm.com

    Catalin Marinas
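
The effect of resetting the pointer tag before the range check can be illustrated with a small model. It assumes a 48-bit VA layout and models the tag reset as forcing the top byte to 0xff. The helper names are hypothetical, and the pfn validity half of virt_addr_valid() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 48-bit VA, linear map in [PAGE_OFFSET, PAGE_END). */
#define PAGE_OFFSET_MODEL 0xffff000000000000ULL
#define PAGE_END_MODEL    0xffff800000000000ULL

/* Models a kernel pointer tag reset: force the top byte back to 0xff. */
static uint64_t tag_reset(uint64_t addr)
{
    return addr | (0xffULL << 56);
}

/* Strict range check that only accepts untagged linear-map addresses. */
static bool is_lm_address(uint64_t addr)
{
    return addr >= PAGE_OFFSET_MODEL && addr < PAGE_END_MODEL;
}

/* Address-range half of a virt_addr_valid()-style check. */
static bool virt_addr_in_linear_map(uint64_t addr)
{
    return is_lm_address(tag_reset(addr));
}
```

A pointer carrying tag 0xf5 in its top byte is rejected by the strict check on its own but accepted once the tag has been reset first.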
     

27 Jan, 2021

2 commits

  • Address an issue observed on a real-world system with a suboptimal IORT
    table, where the DMA masks of PCI devices would get set to 0 as a result.

    iort_dma_setup() would query the root complex/named component IORT
    entry for a DMA mask, and use that over the one the device has been
    configured with earlier.

    Ideally we want to use the minimum mask of what the IORT contains for
    the root complex and what the device was configured with.

    Fixes: 5ac65e8c8941 ("ACPI/IORT: Support address size limit for root complexes")
    Signed-off-by: Moritz Fischer
    Reviewed-by: Robin Murphy
    Acked-by: Lorenzo Pieralisi
    Link: https://lore.kernel.org/r/20210122012419.95010-1-mdf@kernel.org
    Signed-off-by: Catalin Marinas

    Moritz Fischer
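
The "minimum of both masks" idea from this commit can be sketched as follows. The helper names are hypothetical and the code only models the arithmetic, not iort_dma_setup() itself.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low n bits set, for 1 <= n <= 64. */
static uint64_t dma_bit_mask(unsigned int n)
{
    assert(n >= 1 && n <= 64);
    return n == 64 ? ~0ULL : (1ULL << n) - 1;
}

/* Never widen the mask the device was already configured with. */
static uint64_t effective_dma_mask(uint64_t dev_mask, uint64_t iort_mask)
{
    return dev_mask < iort_mask ? dev_mask : iort_mask;
}
```

With a device limited to 32 bits and an IORT entry allowing 64 bits, the effective mask stays at 32 bits. With a 64-bit capable device behind a 40-bit root complex, it becomes 40 bits.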
     
  • Currently, the __is_lm_address() check just masks out the top 12 bits
    of the address, but if they are 0, it still yields a true result.
    This has as a side effect that virt_addr_valid() returns true even for
    invalid virtual addresses (e.g. 0x0).

    Fix the detection by checking that it's actually a kernel address
    starting at PAGE_OFFSET.

    Fixes: 68dd8ef32162 ("arm64: memory: Fix virt_addr_valid() using __is_lm_address()")
    Cc: # 5.4.x
    Cc: Will Deacon
    Suggested-by: Catalin Marinas
    Reviewed-by: Catalin Marinas
    Acked-by: Mark Rutland
    Signed-off-by: Vincenzo Frascino
    Link: https://lore.kernel.org/r/20210126134056.45747-1-vincenzo.frascino@arm.com
    Signed-off-by: Catalin Marinas

    Vincenzo Frascino
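
The problem with a masking-only check can be reproduced in a few lines of C. The sketch assumes a 52-bit VA layout, so that the top 12 bits are the ones masked out. The constants and function names are illustrative, not the kernel macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 52-bit VA, linear map in [PAGE_OFFSET, PAGE_END). */
#define PAGE_OFFSET_MODEL 0xfff0000000000000ULL
#define PAGE_END_MODEL    0xfff8000000000000ULL

/* Masking-only check: drops the top 12 bits, then compares the rest. */
static bool is_lm_address_masked(uint64_t addr)
{
    return (addr & ~PAGE_OFFSET_MODEL) < (PAGE_END_MODEL - PAGE_OFFSET_MODEL);
}

/* Strict check: the address must actually start at PAGE_OFFSET. */
static bool is_lm_address_strict(uint64_t addr)
{
    return addr >= PAGE_OFFSET_MODEL && addr < PAGE_END_MODEL;
}
```

Address 0x0 passes the masked variant because its top bits are simply discarded, while the strict variant rejects it.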
     

23 Jan, 2021

1 commit

  • I was hitting the below panic continuously when attaching kprobes to
    scheduler functions

    [ 159.045212] Unexpected kernel BRK exception at EL1
    [ 159.053753] Internal error: BRK handler: f2000006 [#1] PREEMPT SMP
    [ 159.059954] Modules linked in:
    [ 159.063025] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.11.0-rc4-00008-g1e2a199f6ccd #56
    [rt-app] [1] Exiting.[ 159.071166] Hardware name: ARM Juno development board (r2) (DT)
    [ 159.079689] pstate: 600003c5 (nZCv DAIF -PAN -UAO -TCO BTYPE=--)

    [ 159.085723] pc : 0xffff80001624501c
    [ 159.089377] lr : attach_entity_load_avg+0x2ac/0x350
    [ 159.094271] sp : ffff80001622b640
    [rt-app] [0] Exiting.[ 159.097591] x29: ffff80001622b640 x28: 0000000000000001
    [ 159.105515] x27: 0000000000000049 x26: ffff000800b79980

    [ 159.110847] x25: ffff00097ef37840 x24: 0000000000000000
    [ 159.116331] x23: 00000024eacec1ec x22: ffff00097ef12b90
    [ 159.121663] x21: ffff00097ef37700 x20: ffff800010119170
    [rt-app] [11] Exiting.[ 159.126995] x19: ffff00097ef37840 x18: 000000000000000e
    [ 159.135003] x17: 0000000000000001 x16: 0000000000000019
    [ 159.140335] x15: 0000000000000000 x14: 0000000000000000
    [ 159.145666] x13: 0000000000000002 x12: 0000000000000002
    [ 159.150996] x11: ffff80001592f9f0 x10: 0000000000000060
    [ 159.156327] x9 : ffff8000100f6f9c x8 : be618290de0999a1
    [ 159.161659] x7 : ffff80096a4b1000 x6 : 0000000000000000
    [ 159.166990] x5 : ffff00097ef37840 x4 : 0000000000000000
    [ 159.172321] x3 : ffff000800328948 x2 : 0000000000000000
    [ 159.177652] x1 : 0000002507d52fec x0 : ffff00097ef12b90
    [ 159.182983] Call trace:
    [ 159.185433] 0xffff80001624501c
    [ 159.188581] update_load_avg+0x2d0/0x778
    [ 159.192516] enqueue_task_fair+0x134/0xe20
    [ 159.196625] enqueue_task+0x4c/0x2c8
    [ 159.200211] ttwu_do_activate+0x70/0x138
    [ 159.204147] sched_ttwu_pending+0xbc/0x160
    [ 159.208253] flush_smp_call_function_queue+0x16c/0x320
    [ 159.213408] generic_smp_call_function_single_interrupt+0x1c/0x28
    [ 159.219521] ipi_handler+0x1e8/0x3c8
    [ 159.223106] handle_percpu_devid_irq+0xd8/0x460
    [ 159.227650] generic_handle_irq+0x38/0x50
    [ 159.231672] __handle_domain_irq+0x6c/0xc8
    [ 159.235781] gic_handle_irq+0xcc/0xf0
    [ 159.239452] el1_irq+0xb4/0x180
    [ 159.242600] rcu_is_watching+0x28/0x70
    [ 159.246359] rcu_read_lock_held_common+0x44/0x88
    [ 159.250991] rcu_read_lock_any_held+0x30/0xc0
    [ 159.255360] kretprobe_dispatcher+0xc4/0xf0
    [ 159.259555] __kretprobe_trampoline_handler+0xc0/0x150
    [ 159.264710] trampoline_probe_handler+0x38/0x58
    [ 159.269255] kretprobe_trampoline+0x70/0xc4
    [ 159.273450] run_rebalance_domains+0x54/0x80
    [ 159.277734] __do_softirq+0x164/0x684
    [ 159.281406] irq_exit+0x198/0x1b8
    [ 159.284731] __handle_domain_irq+0x70/0xc8
    [ 159.288840] gic_handle_irq+0xb0/0xf0
    [ 159.292510] el1_irq+0xb4/0x180
    [ 159.295658] arch_cpu_idle+0x18/0x28
    [ 159.299245] default_idle_call+0x9c/0x3e8
    [ 159.303265] do_idle+0x25c/0x2a8
    [ 159.306502] cpu_startup_entry+0x2c/0x78
    [ 159.310436] secondary_start_kernel+0x160/0x198
    [ 159.314984] Code: d42000c0 aa1e03e9 d42000c0 aa1e03e9 (d42000c0)

    After a bit of head scratching and debugging it turned out that it is
    due to kprobe handler being interrupted by a tick that causes us to go
    into (I think another) kprobe handler.

    The culprit was kprobe_breakpoint_ss_handler() returning DBG_HOOK_ERROR
    which leads to the Unexpected kernel BRK exception.

    Reverting commit ba090f9cafd5 ("arm64: kprobes: Remove redundant
    kprobe_step_ctx") seemed to fix the problem for me.

    Further analysis showed that kcb->kprobe_status is set to
    KPROBE_REENTER when the error occurs. By teaching
    kprobe_breakpoint_ss_handler() to handle this status I can no longer
    reproduce the problem.

    Fixes: ba090f9cafd5 ("arm64: kprobes: Remove redundant kprobe_step_ctx")
    Signed-off-by: Qais Yousef
    Acked-by: Will Deacon
    Acked-by: Masami Hiramatsu
    Link: https://lore.kernel.org/r/20210122110909.3324607-1-qais.yousef@arm.com
    Signed-off-by: Catalin Marinas

    Qais Yousef
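
The status handling described in this commit can be modelled roughly as below. The enum values, the result codes and the function names are assumptions made for the sketch. Only the idea that the single-step handler must also accept the re-entered status comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status bits; the kernel defines its own values. */
enum kprobe_status_model {
    STATUS_HIT_ACTIVE = 1u << 0,
    STATUS_HIT_SS     = 1u << 1,  /* single-stepping the probed instruction */
    STATUS_REENTER    = 1u << 2,  /* probe hit while handling another probe */
};

enum hook_result { HOOK_HANDLED, HOOK_ERROR };

/* Before: only HIT_SS is recognised, so a re-entered probe is an error. */
static enum hook_result ss_handler_old(unsigned int status, bool addr_matches)
{
    return ((status & STATUS_HIT_SS) && addr_matches) ? HOOK_HANDLED
                                                      : HOOK_ERROR;
}

/* After: REENTER is accepted as well. */
static enum hook_result ss_handler_new(unsigned int status, bool addr_matches)
{
    return ((status & (STATUS_HIT_SS | STATUS_REENTER)) && addr_matches)
               ? HOOK_HANDLED
               : HOOK_ERROR;
}
```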
     

19 Jan, 2021

1 commit

  • As of the "arm64: expose FAR_EL1 tag bits in siginfo" patch, the address
    that is passed to report_tag_fault has pointer tags in the format of 0x0X,
    while KASAN uses 0xFX format (note the difference in the top 4 bits).

    Fix up the pointer tag for kernel pointers in do_tag_check_fault by
    setting them to the same value as bit 55. Explicitly use __untagged_addr()
    instead of untagged_addr(), as the latter doesn't affect TTBR1 addresses.

    Fixes: dceec3ff7807 ("arm64: expose FAR_EL1 tag bits in siginfo")
    Fixes: 4291e9ee6189 ("kasan, arm64: print report from tag fault handler")
    Signed-off-by: Andrey Konovalov
    Reviewed-by: Catalin Marinas
    Reviewed-by: Vincenzo Frascino
    Link: https://linux-review.googlesource.com/id/I9ced973866036d8679e8f4ae325de547eb969649
    Link: https://lore.kernel.org/r/ff30b0afe6005fd046f9ac72bfb71822aedccd89.1610731872.git.andreyknvl@google.com
    Signed-off-by: Catalin Marinas

    Andrey Konovalov
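
The tag fix-up described in this commit amounts to a small bit manipulation, sketched here in user-space C. The mask name and function names are hypothetical. The sketch keeps bits 59:56 of the faulting address and sets bits 63:60 to the value of bit 55 for kernel pointers.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_NIBBLE_MASK (0xfULL << 56)  /* bits 59:56, reported as 0x0X */

/* Sign-extend from bit 55: the top byte becomes all ones or all zeroes. */
static uint64_t untag_by_bit55(uint64_t addr)
{
    uint64_t top = 0xffULL << 56;

    return (addr & (1ULL << 55)) ? (addr | top) : (addr & ~top);
}

/* For kernel pointers (bit 55 set), turn 0x0X into 0xFX; leave others. */
static uint64_t fixup_kernel_tag(uint64_t far)
{
    if (!(far & (1ULL << 55)))
        return far;
    return (untag_by_bit55(far) & ~TAG_NIBBLE_MASK) | (far & TAG_NIBBLE_MASK);
}
```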
     

15 Jan, 2021

3 commits

  • The SVE and FPSIMD stress tests have a spelling mistake in the output, fix
    it.

    Signed-off-by: Mark Brown
    Link: https://lore.kernel.org/r/20210108183144.673-1-broonie@kernel.org
    Signed-off-by: Catalin Marinas

    Mark Brown
     
  • The kbuild test robot reports that when building with W=1, GCC will warn
    for a couple of missing prototypes in syscall.c:

    | arch/arm64/kernel/syscall.c:157:6: warning: no previous prototype for 'do_el0_svc' [-Wmissing-prototypes]
    | 157 | void do_el0_svc(struct pt_regs *regs)
    | | ^~~~~~~~~~
    | arch/arm64/kernel/syscall.c:164:6: warning: no previous prototype for 'do_el0_svc_compat' [-Wmissing-prototypes]
    | 164 | void do_el0_svc_compat(struct pt_regs *regs)
    | | ^~~~~~~~~~~~~~~~~

    While this isn't a functional problem, as a general policy we should
    include the prototype for functions wherever possible to catch any
    accidental divergence between the prototype and implementation. Here we
    can easily include the header that declares them, so let's do so.

    While there are a number of warnings elsewhere and some warnings enabled
    under W=1 are of questionable benefit, this change helps to make the
    code more robust as it evolves and reduces the noise somewhat, so it
    seems worthwhile.

    Signed-off-by: Mark Rutland
    Reported-by: kernel test robot
    Cc: Will Deacon
    Link: https://lore.kernel.org/r/202101141046.n8iPO3mw-lkp@intel.com
    Link: https://lore.kernel.org/r/20210114124812.17754-1-mark.rutland@arm.com
    Signed-off-by: Catalin Marinas

    Mark Rutland
     
  • GCC versions >= 4.9 and < 5.1 have been shown to emit memory references
    beyond the stack pointer, resulting in memory corruption if an interrupt
    is taken after the stack pointer has been adjusted but before the
    reference has been executed. This leads to subtle, infrequent data
    corruption such as the EXT4 problems reported by Russell King at the
    link below.

    Life is too short for buggy compilers, so raise the minimum GCC version
    required by arm64 to 5.1.

    Reported-by: Russell King
    Suggested-by: Arnd Bergmann
    Signed-off-by: Will Deacon
    Tested-by: Nathan Chancellor
    Reviewed-by: Nick Desaulniers
    Reviewed-by: Nathan Chancellor
    Acked-by: Linus Torvalds
    Cc:
    Cc: Theodore Ts'o
    Cc: Florian Weimer
    Cc: Peter Zijlstra
    Cc: Nick Desaulniers
    Link: https://lore.kernel.org/r/20210105154726.GD1551@shell.armlinux.org.uk
    Link: https://lore.kernel.org/r/20210112224832.10980-1-will@kernel.org
    Signed-off-by: Catalin Marinas

    Will Deacon
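
A minimum-version policy like the one in this commit can be expressed as a version code comparison. The helper names below are hypothetical, and the preprocessor guard is only a sketch of how such a policy could be enforced at build time, not the kernel's actual check.

```c
#include <assert.h>
#include <stdbool.h>

/* Encode major.minor.patch as a single comparable integer. */
static int gcc_version_code(int major, int minor, int patch)
{
    return major * 10000 + minor * 100 + patch;
}

/* Hypothetical policy helper: arm64 needs at least GCC 5.1.0. */
static bool gcc_ok_for_arm64(int code)
{
    return code >= gcc_version_code(5, 1, 0);
}

/* The same policy as a build-time guard, active only for GCC on arm64. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__aarch64__)
# if (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) < 50100
#  error "GCC older than 5.1 is rejected for arm64 in this sketch"
# endif
#endif
```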
     

13 Jan, 2021

5 commits

  • With UBSAN enabled and building with clang, there are occasionally
    warnings like

    WARNING: modpost: vmlinux.o(.text+0xc533ec): Section mismatch in reference from the function arch_atomic64_or() to the variable .init.data:numa_nodes_parsed
    The function arch_atomic64_or() references
    the variable __initdata numa_nodes_parsed.
    This is often because arch_atomic64_or lacks a __initdata
    annotation or the annotation of numa_nodes_parsed is wrong.

    for functions that end up not being inlined as intended but operating
    on __initdata variables. Mark these as __always_inline, along with
    the corresponding asm-generic wrappers.

    Signed-off-by: Arnd Bergmann
    Acked-by: Will Deacon
    Link: https://lore.kernel.org/r/20210108092024.4034860-1-arnd@kernel.org
    Signed-off-by: Catalin Marinas

    Arnd Bergmann
     
  • S_FRAME_SIZE is the size of the pt_regs structure and no longer the size
    of the kernel stack frame, so the name is misleading. In keeping with
    arm32, rename S_FRAME_SIZE to PT_REGS_SIZE.

    Signed-off-by: Jianlin Lv
    Acked-by: Mark Rutland
    Link: https://lore.kernel.org/r/20210112015813.2340969-1-Jianlin.Lv@arm.com
    Signed-off-by: Catalin Marinas

    Jianlin Lv
     
  • This reverts commit 367c820ef08082e68df8a3bc12e62393af21e4b5.

    lockup_detector_init() makes heavy use of per-cpu variables and must be
    called with preemption disabled. Usually, it's handled early during boot
    in kernel_init_freeable(), before SMP has been initialised.

    Since we do not know whether or not our PMU interrupt can be signalled
    as an NMI until considerably later in the boot process, the Arm PMU
    driver attempts to re-initialise the lockup detector off the back of a
    device_initcall(). Unfortunately, this is called from preemptible
    context and results in the following splat:

    | BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
    | caller is debug_smp_processor_id+0x20/0x2c
    | CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #276
    | Hardware name: linux,dummy-virt (DT)
    | Call trace:
    | dump_backtrace+0x0/0x3c0
    | show_stack+0x20/0x6c
    | dump_stack+0x2f0/0x42c
    | check_preemption_disabled+0x1cc/0x1dc
    | debug_smp_processor_id+0x20/0x2c
    | hardlockup_detector_event_create+0x34/0x18c
    | hardlockup_detector_perf_init+0x2c/0x134
    | watchdog_nmi_probe+0x18/0x24
    | lockup_detector_init+0x44/0xa8
    | armv8_pmu_driver_init+0x54/0x78
    | do_one_initcall+0x184/0x43c
    | kernel_init_freeable+0x368/0x380
    | kernel_init+0x1c/0x1cc
    | ret_from_fork+0x10/0x30

    Rather than bodge this with raw_smp_processor_id() or randomly disabling
    preemption, simply revert the culprit for now until we figure out how to
    do this properly.

    Reported-by: Lecopzer Chen
    Signed-off-by: Will Deacon
    Acked-by: Mark Rutland
    Cc: Sumit Garg
    Cc: Alexandru Elisei
    Link: https://lore.kernel.org/r/20201221162249.3119-1-lecopzer.chen@mediatek.com
    Link: https://lore.kernel.org/r/20210112221855.10666-1-will@kernel.org
    Signed-off-by: Catalin Marinas

    Will Deacon
     
  • All EL0 returns go via ret_to_user(), which masks IRQs and notifies
    lockdep and tracing before calling into do_notify_resume(). Therefore,
    there's no need for do_notify_resume() to call trace_hardirqs_off(), and
    the comment is stale. The call is simply redundant.

    In ret_to_user() we call exit_to_user_mode(), which notifies lockdep and
    tracing that IRQs will be enabled in userspace, so there's no need for
    el0_svc_common() to call trace_hardirqs_on() before returning. Further,
    at the start of ret_to_user() we call trace_hardirqs_off(), so not only
    is this redundant, but it is immediately undone.

    In addition to being redundant, the trace_hardirqs_on() in
    el0_svc_common() leaves lockdep inconsistent with the hardware state,
    and is liable to cause issues for any C code or instrumentation
    between this and the call to trace_hardirqs_off() which undoes it in
    ret_to_user().

    This patch removes the redundant tracing calls and associated stale
    comments.

    Fixes: 23529049c684 ("arm64: entry: fix non-NMI user<->kernel transitions")
    Signed-off-by: Mark Rutland
    Acked-by: Will Deacon
    Cc: James Morse
    Cc: Will Deacon
    Link: https://lore.kernel.org/r/20210107145310.44616-1-mark.rutland@arm.com
    Signed-off-by: Catalin Marinas

    Mark Rutland
     
  • With the introduction of a dynamic ZONE_DMA range based on DT or IORT
    information, there's no need for CMA allocations from the wider
    ZONE_DMA32 since on most platforms ZONE_DMA will cover the 32-bit
    addressable range. Remove the arm64_dma32_phys_limit and set
    arm64_dma_phys_limit to cover the smallest DMA range required on the
    platform. CMA allocation and crashkernel reservation now go in the
    dynamically sized ZONE_DMA, allowing correct functionality on RPi4.

    Signed-off-by: Catalin Marinas
    Cc: Chen Zhou
    Reviewed-by: Nicolas Saenz Julienne
    Tested-by: Nicolas Saenz Julienne # On RPi4B

    Catalin Marinas
     

11 Jan, 2021

12 commits

  • Linus Torvalds
     
  • …masahiroy/linux-kbuild

    Pull Kbuild fixes from Masahiro Yamada:

    - Search for <ncurses.h> in the default header path of HOSTCC

    - Tweak the option order to be kind to old BSD awk

    - Remove 'kvmconfig' and 'xenconfig' shorthands

    - Fix documentation

    * tag 'kbuild-fixes-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    Documentation: kbuild: Fix section reference
    kconfig: remove 'kvmconfig' and 'xenconfig' shorthands
    lib/raid6: Let $(UNROLL) rules work with macOS userland
    kconfig: Support building mconf with vendor sysroot ncurses
    kconfig: config script: add a little user help
    MAINTAINERS: adjust GCC PLUGINS after gcc-plugin.sh removal

    Linus Torvalds
     
  • Pull SCSI fixes from James Bottomley:
    "This is two driver fixes (megaraid_sas and hisi_sas).

    The megaraid one is a revert of a previous revert of a cpu hotplug fix
    which exposed a bug in the block layer which has been fixed in this
    merge window.

    The hisi_sas performance enhancement comes from switching to interrupt
    managed completion queues, which depended on the addition of
    devm_platform_get_irqs_affinity() which is now upstream via the irq
    tree in the last merge window"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    scsi: hisi_sas: Expose HW queues for v2 hw
    Revert "Revert "scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug""

    Linus Torvalds
     
  • Pull block fixes from Jens Axboe:

    - Missing CRC32 selections (Arnd)

    - Fix for a merge window regression with bdev inode init (Christoph)

    - bcache fixes

    - rnbd fixes

    - NVMe pull request from Christoph:
    - fix a race in the nvme-tcp send code (Sagi Grimberg)
    - fix a list corruption in an nvme-rdma error path (Israel Rukshin)
    - avoid a possible double fetch in nvme-pci (Lalithambika Krishnakumar)
    - add the subsystem NQN quirk for a Samsung driver (Gopal Tiwari)
    - fix two compiler warnings in nvme-fcloop (James Smart)
    - don't call sleeping functions from irq context in nvme-fc (James Smart)
    - remove an unused argument (Max Gurtovoy)
    - remove unused exports (Minwoo Im)

    - Use-after-free fix for partition iteration (Ming)

    - Missing blk-mq debugfs flag annotation (John)

    - Bdev freeze regression fix (Satya)

    - blk-iocost NULL pointer deref fix (Tejun)

    * tag 'block-5.11-2021-01-10' of git://git.kernel.dk/linux-block: (26 commits)
    bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET
    bcache: introduce BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE for large bucket
    bcache: check unsupported feature sets for bcache register
    bcache: fix typo from SUUP to SUPP in features.h
    bcache: set pdev_set_uuid before scond loop iteration
    blk-mq-debugfs: Add decode for BLK_MQ_F_TAG_HCTX_SHARED
    block/rnbd-clt: avoid module unload race with close confirmation
    block/rnbd: Adding name to the Contributors List
    block/rnbd-clt: Fix sg table use after free
    block/rnbd-srv: Fix use after free in rnbd_srv_sess_dev_force_close
    block/rnbd: Select SG_POOL for RNBD_CLIENT
    block: pre-initialize struct block_device in bdev_alloc_inode
    fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb
    nvme: remove the unused status argument from nvme_trace_bio_complete
    nvmet-rdma: Fix list_del corruption on queue establishment failure
    nvme: unexport functions with no external caller
    nvme: avoid possible double fetch in handling CQE
    nvme-tcp: Fix possible race of io_work and direct send
    nvme-pci: mark Samsung PM1725a as IGNORE_DEV_SUBNQN
    nvme-fcloop: Fix sscanf type and list_first_entry_or_null warnings
    ...

    Linus Torvalds
     
  • Pull io_uring fixes from Jens Axboe:
    "A bit larger than I had hoped at this point, but it's all changes that
    will be directed towards stable anyway. In detail:

    - Fix a merge window regression on error return (Matthew)

    - Remove useless variable declaration/assignment (Ye Bin)

    - IOPOLL fixes (Pavel)

    - Exit and cancelation fixes (Pavel)

    - fasync lockdep complaint fix (Pavel)

    - Ensure SQPOLL is synchronized with creator life time (Pavel)"

    * tag 'io_uring-5.11-2021-01-10' of git://git.kernel.dk/linux-block:
    io_uring: stop SQPOLL submit on creator's death
    io_uring: add warn_once for io_uring_flush()
    io_uring: inline io_uring_attempt_task_drop()
    io_uring: io_rw_reissue lockdep annotations
    io_uring: synchronise ev_posted() with waitqueues
    io_uring: dont kill fasync under completion_lock
    io_uring: trigger eventfd for IOPOLL
    io_uring: Fix return value from alloc_fixed_file_ref_node
    io_uring: Delete useless variable ‘id’ in io_prep_async_work
    io_uring: cancel more aggressively in exit_work
    io_uring: drop file refs after task cancel
    io_uring: patch up IOPOLL overflow_flush sync
    io_uring: synchronise IOPOLL on task_submit fail

    Linus Torvalds
     
  • Pull USB fixes from Greg KH:
    "Here are a number of small USB driver fixes for 5.11-rc3.

    Included in here are:

    - USB gadget driver fixes for reported issues

    - new usb-serial driver ids

    - dma from stack bugfixes

    - typec bugfixes

    - dwc3 bugfixes

    - xhci driver bugfixes

    - other small misc usb driver bugfixes

    All of these have been in linux-next with no reported issues"

    * tag 'usb-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (35 commits)
    usb: dwc3: gadget: Clear wait flag on dequeue
    usb: typec: Send uevent for num_altmodes update
    usb: typec: Fix copy paste error for NVIDIA alt-mode description
    usb: gadget: enable super speed plus
    kcov, usb: hide in_serving_softirq checks in __usb_hcd_giveback_urb
    usb: uas: Add PNY USB Portable SSD to unusual_uas
    usb: gadget: configfs: Preserve function ordering after bind failure
    usb: gadget: select CONFIG_CRC32
    usb: gadget: core: change the comment for usb_gadget_connect
    usb: gadget: configfs: Fix use-after-free issue with udc_name
    usb: dwc3: gadget: Restart DWC3 gadget when enabling pullup
    usb: usbip: vhci_hcd: protect shift size
    USB: usblp: fix DMA to stack
    USB: serial: iuu_phoenix: fix DMA from stack
    USB: serial: option: add LongSung M5710 module support
    USB: serial: option: add Quectel EM160R-GL
    USB: Gadget: dummy-hcd: Fix shift-out-of-bounds bug
    usb: gadget: f_uac2: reset wMaxPacketSize
    usb: dwc3: ulpi: Fix USB2.0 HS/FS/LS PHY suspend regression
    usb: dwc3: ulpi: Replace CPU-based busyloop with Protocol-based one
    ...

    Linus Torvalds
     
  • Pull staging driver fixes from Greg KH:
    "Here are some small staging driver fixes for 5.11-rc3. Nothing major,
    just resolving some reported issues:

    - cleanup some remaining mentions of the ION drivers that were
    removed in 5.11-rc1

    - comedi driver bugfix

    - two error path memory leak fixes

    All have been in linux-next for a while with no reported issues"

    * tag 'staging-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
    staging: ION: remove some references to CONFIG_ION
    staging: mt7621-dma: Fix a resource leak in an error handling path
    Staging: comedi: Return -EFAULT if copy_to_user() fails
    staging: spmi: hisi-spmi-controller: Fix some error handling paths

    Linus Torvalds
     
  • Pull char/misc driver fixes from Greg KH:
    "Here are some small char and misc driver fixes for 5.11-rc3.

    The majority here are fixes for the habanalabs drivers, but also in
    here are:

    - crypto driver fix

    - pvpanic driver fix

    - updated font file

    - interconnect driver fixes

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'char-misc-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (26 commits)
    Fonts: font_ter16x32: Update font with new upstream Terminus release
    misc: pvpanic: Check devm_ioport_map() for NULL
    speakup: Add github repository URL and bug tracker
    MAINTAINERS: Update Georgi's email address
    crypto: asym_tpm: correct zero out potential secrets
    habanalabs: Fix memleak in hl_device_reset
    interconnect: imx8mq: Use icc_sync_state
    interconnect: imx: Remove a useless test
    interconnect: imx: Add a missing of_node_put after of_device_is_available
    interconnect: qcom: fix rpmh link failures
    habanalabs: fix order of status check
    habanalabs: register to pci shutdown callback
    habanalabs: add validation cs counter, fix misplaced counters
    habanalabs/gaudi: retry loading TPC f/w on -EINTR
    habanalabs: adjust pci controller init to new firmware
    habanalabs: update comment in hl_boot_if.h
    habanalabs/gaudi: enhance reset message
    habanalabs: full FW hard reset support
    habanalabs/gaudi: disable CGM at HW initialization
    habanalabs: Revise comment to align with mirror list name
    ...

    Linus Torvalds
     
  • Section 3.11 was incorrectly called 3.9, fix it.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Masahiro Yamada

    Viresh Kumar
     
  • Pull ARC fixes from Vineet Gupta:

    - Address the 2nd boot failure due to snafu in signal handling code
    (first was generic console ttynull issue)

    - misc other fixes

    * tag 'arc-5.11-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
    ARC: [hsdk]: Enable FPU_SAVE_RESTORE
    ARC: unbork 5.11 bootup: fix snafu in _TIF_NOTIFY_SIGNAL handling
    include/soc: remove headers for EZChip NPS
    arch/arc: add copy_user_page() to to fix build error on ARC

    Linus Torvalds
     
  • Pull powerpc fixes from Michael Ellerman:

    - A fix for machine check handling with VMAP stack on 32-bit.

    - A clang build fix.

    Thanks to Christophe Leroy and Nathan Chancellor.

    * tag 'powerpc-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc: Handle .text.{hot,unlikely}.* in linker script
    powerpc/32s: Fix RTAS machine check with VMAP stack

    Linus Torvalds
     
  • Pull x86 fixes from Borislav Petkov:
    "As expected, fixes started trickling in after the holidays so here is
    the accumulated pile of x86 fixes for 5.11:

    - A fix for fanotify_mark() missing the conversion of x86_32 native
    syscalls which take 64-bit arguments to the compat handlers due to
    former having a general compat handler. (Brian Gerst)

    - Add a forgotten pmd page destructor call to pud_free_pmd_page()
    where a pmd page is freed. (Dan Williams)

    - Make IN/OUT insns with an u8 immediate port operand handling for
    SEV-ES guests more precise by using only the single port byte and
    not the whole s32 value of the insn decoder. (Peter Gonda)

    - Correct a straddling end range check before returning the proper
    MTRR type, when the end address is the same as top of memory.
    (Ying-Tsun Huang)

    - Change PQR_ASSOC MSR update scheme when moving a task to a resctrl
    resource group to avoid significant performance overhead with some
    resctrl workloads. (Fenghua Yu)

    - Avoid the actual task move overhead when the task is already in the
    resource group. (Fenghua Yu)"

    * tag 'x86_urgent_for_v5.11_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/resctrl: Don't move a task to the same resource group
    x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR
    x86/mtrr: Correct the range check before performing MTRR type lookups
    x86/sev-es: Fix SEV-ES OUT/IN immediate opcode vc handling
    x86/mm: Fix leak of pmd ptlock
    fanotify: Fix sys_fanotify_mark() on native x86-32

    Linus Torvalds
     

10 Jan, 2021

13 commits

  • …/groeck/linux-staging

    Pull hwmon fixes from Guenter Roeck:

    - Fix possible KASAN issue in amd_energy driver

    - Avoid configuration problem in pwm-fan driver

    - Fix kernel-doc warning in sbtsi_temp documentation

    * tag 'hwmon-for-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
    hwmon: (amd_energy) fix allocation of hwmon_channel_info config
    hwmon: (pwm-fan) Ensure that calculation doesn't discard big period values
    hwmon: (sbtsi_temp) Fix Documenation kernel-doc warning

    Linus Torvalds
     
  • Pull dmaengine fixes from Vinod Koul:
    "A bunch of dmaengine driver fixes for:

    - coverity discovered issues for xilinx driver

    - qcom, gpi driver fix for undefined behaviour and one off cleanup

    - update Peter's email for TI DMA drivers

    - one-off for idxd driver

    - resource leak fix for mediatek and milbeaut drivers"

    * tag 'dmaengine-fix-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
    dmaengine: stm32-mdma: fix STM32_MDMA_VERY_HIGH_PRIORITY value
    dmaengine: xilinx_dma: fix mixed_enum_type coverity warning
    dmaengine: xilinx_dma: fix incompatible param warning in _child_probe()
    dmaengine: xilinx_dma: check dma_async_device_register return value
    dmaengine: qcom: fix gpi undefined behavior
    dt-bindings: dma: ti: Update maintainer and author information
    MAINTAINERS: Add entry for Texas Instruments DMA drivers
    qcom: bam_dma: Delete useless kfree code
    dmaengine: dw-edma: Fix use after free in dw_edma_alloc_chunk()
    dmaengine: milbeaut-xdmac: Fix a resource leak in the error handling path of the probe function
    dmaengine: mediatek: mtk-hsdma: Fix a resource leak in the error handling path of the probe function
    dmaengine: qcom: gpi: Fixes a format mismatch
    dmaengine: idxd: off by one in cleanup code
    dmaengine: ti: k3-udma: Fix pktdma rchan TPL level setup

    Linus Torvalds
     
  • Pull i2c fixes from Wolfram Sang:
    "Three driver bugfixes for I2C. Business as usual"

    * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
    i2c: mediatek: Fix apdma and i2c hand-shake timeout
    i2c: i801: Fix the i2c-mux gpiod_lookup_table not being properly terminated
    i2c: sprd: use a specific timeout to avoid system hang up issue

    Linus Torvalds
     
  • Change my email contact ahead of a likely painful eleven-month migration
    to a certain cobalt enterprisey groupware cloud product that will totally
    break my workflow. Some day I may get used to having my email
    sequestered behind both claret and cerulean oath2+sms 2fa layers, but
    for now I'll stick with keying in one password to receive an email vs.
    the required four.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
     
  • When the creator of SQPOLL io_uring dies (i.e. sqo_task), we don't want
    its internals like ->files and ->mm to be poked by the SQPOLL task. It
    has never been nice and recently got racy. That can happen when the
    owner undergoes destruction and the SQPOLL task tries to submit new
    requests in parallel, and so calls io_sq_thread_acquire*().

    This patch halts SQPOLL submissions when sqo_task dies by introducing
    the sqo_dead flag. Once it is set, the SQPOLL task must not do any
    submission; this is synchronised by uring_lock as well as the new flag.
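
    A minimal user-space sketch of the flag-under-lock idea described
    above. It is not the kernel code: struct ring_sketch, ring_init(),
    disable_sqo_submit() and sq_thread_submit() are hypothetical names, a
    pthread mutex stands in for uring_lock, and returning -EOWNERDEAD is
    an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ring context; not the kernel's struct. */
struct ring_sketch {
    pthread_mutex_t uring_lock; /* stands in for ctx->uring_lock */
    bool sqo_dead;              /* set once the creator task is dying */
    unsigned submitted;         /* number of requests accepted so far */
};

static void ring_init(struct ring_sketch *r)
{
    int rc = pthread_mutex_init(&r->uring_lock, NULL);

    assert(rc == 0);
    (void)rc; /* avoid an unused warning when NDEBUG is defined */
    r->sqo_dead = false;
    r->submitted = 0;
}

/* Creator's exit path: after this returns, no new submission is accepted. */
static void disable_sqo_submit(struct ring_sketch *r)
{
    pthread_mutex_lock(&r->uring_lock);
    r->sqo_dead = true;
    pthread_mutex_unlock(&r->uring_lock);
}

/* Poller side: the flag is checked under the same lock that guards submission. */
static int sq_thread_submit(struct ring_sketch *r, unsigned nr)
{
    int ret;

    pthread_mutex_lock(&r->uring_lock);
    if (r->sqo_dead) {
        ret = -EOWNERDEAD; /* assumed error code, for illustration only */
    } else {
        r->submitted += nr;
        ret = (int)nr;
    }
    pthread_mutex_unlock(&r->uring_lock);
    return ret;
}
```

    Because the flag is both written and read with the lock held, a
    submission that started before disable_sqo_submit() finishes first,
    and any later one sees the flag and bails out.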

    The tricky part is to make sure that disabling always happens. That
    means the ring is either discovered by the creator's do_exit() -> cancel,
    or disabled by the creator on the final close() if that happens first.
    The latter is guaranteed by the fact that for SQPOLL the creator task,
    and only it, holds exactly one file note, so the note either pins the
    ring up to do_exit() or is removed by the creator on the final put in
    flush (see comments in uring_flush() around file->f_count == 2).

    One more place that can trigger io_sq_thread_acquire_*() is
    __io_req_task_submit(). Shoot off requests on sqo_dead there, even
    though actually we don't need to. That's because cancellation of
    sqo_task should wait for the request before going any further.

    note 1: io_disable_sqo_submit() does io_ring_set_wakeup_flag() so the
    caller would enter the ring to get an error, but it still doesn't
    guarantee that the flag won't be cleared.

    note 2: if the final __userspace__ close does not come from the creator
    task, the file note will pin the ring until the task dies.

    Fixes: b1b6b5a30dce8 ("kernel/io_uring: cancel io_uring before task works")
    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
  • files_cancel() should cancel all relevant requests and drop file notes,
    so we should never have file notes after that, including on-exit fput
    and flush. Add a WARN_ONCE to be sure.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
  • A simple preparation change inlining io_uring_attempt_task_drop() into
    io_uring_flush().

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
  • We expect io_rw_reissue() to take place only during submission with
    uring_lock held. Add a lockdep annotation to check that invariant.

    Signed-off-by: Pavel Begunkov
    Signed-off-by: Jens Axboe

    Pavel Begunkov
     
  • If BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET is set in the incompat feature
    set, the cache device was created with the obsoleted layout using
    obso_bucket_size_hi. bcache no longer supports this feature bit; a new
    BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE incompat feature bit was added
    for a better layout to support large bucket sizes.

    For legacy compatibility, if a cache device was created with the
    obsoleted BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit, all bcache
    devices attached to this cache set should be set read-only. The dirty
    data can then be written back to the backing device before the cache
    device is re-created with the BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE
    feature bit by the latest bcache-tools.

    This patch checks the BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit
    when running a cache set and when attaching a bcache device to the cache
    set. If this bit is set,
    - When running a cache set, print an error kernel message to indicate
    that all subsequently attached bcache devices will be read-only.
    - When attaching a bcache device, print an error kernel message to
    indicate that the attached bcache device will be read-only, and ask
    users to update to the latest bcache-tools.

    This change only affects cache devices whose bucket size is >= 32MB.
    Such sizes are meant for zoned SSDs, and almost nobody uses such a large
    bucket size at the moment. If you don't explicitly set a large bucket
    size for a zoned SSD, this change is totally transparent to your bcache
    device.

    Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Coly Li
     
  • When the large bucket feature was added, BCH_FEATURE_INCOMPAT_LARGE_BUCKET
    was introduced into the incompat feature set. It used bucket_size_hi
    (which was added at the tail of struct cache_sb_disk) to extend the
    current 16bit bucket size to 32bit together with the existing
    bucket_size in struct cache_sb_disk.

    This is not a good idea; there are two obvious problems,
    - Bucket size is always a power of 2. If log2(bucket size) is stored in
    the existing bucket_size of struct cache_sb_disk, it is unnecessary to
    add bucket_size_hi.
    - Macro csum_set() assumes d[SB_JOURNAL_BUCKETS] is the last member in
    struct cache_sb_disk. bucket_size_hi was added after d[], which makes
    csum_set() calculate an unexpected super block checksum.

    To fix the above problems, this patch introduces a new incompat feature
    bit, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE. When this bit is set,
    bucket_size in struct cache_sb_disk stores the order of the power-of-2
    bucket size value. When the user specifies a bucket size larger than
    32768 sectors, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE is set in the
    incompat feature set, and bucket_size stores log2(bucket size) rather
    than the real bucket size value.
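
    The encoding rule in the paragraph above can be sketched in user-space
    C as follows. This is an illustration under stated assumptions, not
    the kernel or bcache-tools code: struct sb_sketch, the SKETCH_* macros
    (including the bit position chosen for the feature flag) and the
    helper names are hypothetical. Only the 32768-sector threshold and the
    16-bit bucket_size field come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit position; the real value is defined by bcache itself. */
#define SKETCH_INCOMPAT_LOG_LARGE_BUCKET_SIZE (1ULL << 1)
/* Threshold taken from the commit message: larger sizes are stored as log2. */
#define SKETCH_MAX_PLAIN_BUCKET_SECTORS 32768u

/* Hypothetical, stripped-down stand-in for struct cache_sb_disk. */
struct sb_sketch {
    uint64_t feature_incompat;
    uint16_t bucket_size; /* sectors, or log2(sectors) when the bit is set */
};

/* Integer log2 for a non-zero value. */
static unsigned sketch_ilog2(uint32_t v)
{
    unsigned n = 0;

    while (v >>= 1)
        n++;
    return n;
}

static void sketch_set_bucket_size(struct sb_sketch *sb, uint32_t sectors)
{
    /* Bucket sizes are always a power of 2. */
    assert(sectors != 0 && (sectors & (sectors - 1)) == 0);

    if (sectors > SKETCH_MAX_PLAIN_BUCKET_SECTORS) {
        sb->feature_incompat |= SKETCH_INCOMPAT_LOG_LARGE_BUCKET_SIZE;
        sb->bucket_size = (uint16_t)sketch_ilog2(sectors);
    } else {
        sb->feature_incompat &= ~SKETCH_INCOMPAT_LOG_LARGE_BUCKET_SIZE;
        sb->bucket_size = (uint16_t)sectors;
    }
}

static uint32_t sketch_get_bucket_size(const struct sb_sketch *sb)
{
    if (sb->feature_incompat & SKETCH_INCOMPAT_LOG_LARGE_BUCKET_SIZE)
        return UINT32_C(1) << sb->bucket_size;
    return sb->bucket_size;
}
```

    With this scheme a 65536-sector bucket is stored as 16 in the existing
    16-bit field, so no extra field after d[] is needed.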

    The obsoleted BCH_FEATURE_INCOMPAT_LARGE_BUCKET won't be used anymore.
    It is renamed to BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET and is still
    recognized by the kernel driver only for legacy compatibility. The
    previous bucket_size_hi is renamed to obso_bucket_size_hi in struct
    cache_sb_disk and is not used in bcache-tools anymore.

    For cache device created with BCH_FEATURE_INCOMPAT_LARGE_BUCKET feature,
    bcache-tools and kernel driver still recognize the feature string and
    display it as "obso_large_bucket".

    With this change, the unnecessary extension of the bcache on-disk super
    block can be avoided, and csum_set() generates the expected checksum as
    well.

    Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
    Signed-off-by: Coly Li
    Cc: stable@vger.kernel.org # 5.9+
    Signed-off-by: Jens Axboe

    Coly Li
     
  • This patch adds a check for features that are incompatible with the
    currently supported feature sets.

    Now if the bcache device created by bcache-tools has features that the
    current kernel doesn't support, read_super() will fail with an error
    message. E.g. if an unsupported incompatible feature is detected,
    bcache register will fail with dmesg "bcache: register_bcache() error :
    Unsupported incompatible feature found".
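
    A check of this kind usually amounts to masking each feature word
    against the set of supported bits. The sketch below is a user-space
    illustration, not the code in read_super(): the SKETCH_* masks, the
    bit positions and sketch_check_features() are hypothetical, and only
    the "Unsupported incompatible feature found" string is taken from the
    commit message (the other two strings are made up by analogy).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define SKETCH_INCOMPAT_OBSO_LARGE_BUCKET     (1ULL << 0)
#define SKETCH_INCOMPAT_LOG_LARGE_BUCKET_SIZE (1ULL << 1)

/* Hypothetical masks of the feature bits this "kernel" understands. */
#define SKETCH_COMPAT_SUPP    0ULL
#define SKETCH_RO_COMPAT_SUPP 0ULL
#define SKETCH_INCOMPAT_SUPP  (SKETCH_INCOMPAT_OBSO_LARGE_BUCKET | \
                               SKETCH_INCOMPAT_LOG_LARGE_BUCKET_SIZE)

/*
 * Return NULL when every set bit is supported, otherwise an error string
 * naming the first feature word that carries an unknown bit.
 */
static const char *sketch_check_features(uint64_t compat,
                                         uint64_t ro_compat,
                                         uint64_t incompat)
{
    if (compat & ~SKETCH_COMPAT_SUPP)
        return "Unsupported compatible feature found";
    if (ro_compat & ~SKETCH_RO_COMPAT_SUPP)
        return "Unsupported read-only compatible feature found";
    if (incompat & ~SKETCH_INCOMPAT_SUPP)
        return "Unsupported incompatible feature found";
    return NULL;
}
```

    Any bit outside the supported mask makes registration fail early,
    before the rest of the super block is interpreted.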

    Fixes: d721a43ff69c ("bcache: increase super block version for cache device and backing device")
    Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
    Signed-off-by: Coly Li
    Cc: stable@vger.kernel.org # 5.9+
    Signed-off-by: Jens Axboe

    Coly Li
     
  • This patch fixes the following typos,
    from BCH_FEATURE_COMPAT_SUUP to BCH_FEATURE_COMPAT_SUPP
    from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_INCOMPAT_SUPP
    from BCH_FEATURE_RO_COMPAT_SUUP to BCH_FEATURE_RO_COMPAT_SUPP

    Fixes: d721a43ff69c ("bcache: increase super block version for cache device and backing device")
    Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
    Signed-off-by: Coly Li
    Cc: stable@vger.kernel.org # 5.9+
    Signed-off-by: Jens Axboe

    Coly Li
     
  • There is no need to reassign pdev_set_uuid in the second loop iteration,
    so move it to before the second loop.

    Signed-off-by: Yi Li
    Signed-off-by: Coly Li
    Signed-off-by: Jens Axboe

    Yi Li