27 Sep, 2020

1 commit

  • Currently, to make sure that every page table entry is read just once,
    gup_fast walks perform READ_ONCE and pass the pXd value down to the next
    gup_pXd_range function by value, e.g.:

    static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
                             unsigned int flags, struct page **pages, int *nr)
    ...
            pudp = pud_offset(&p4d, addr);

    This function passes a pointer to that local copy to pXd_offset and
    might get the very same pointer back. This happens when the level is
    folded (on most architectures), and that pointer must not be
    iterated.

    On s390, each task may use 5-, 4-, or 3-level address translation and
    hence have different levels folded, so the logic is more complex, and a
    non-iterable pointer to a local copy leads to severe problems.

    Here is an example of what happens with gup_fast on s390, for a task
    with 3-level paging, crossing a 2 GB pud boundary:

    // addr = 0x1007ffff000, end = 0x10080001000
    static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
                             unsigned int flags, struct page **pages, int *nr)
    {
            unsigned long next;
            pud_t *pudp;

            // pud_offset returns &p4d itself (a pointer to a value on stack)
            pudp = pud_offset(&p4d, addr);
            do {
                    // on the second iteration, reads a "random" stack value
                    pud_t pud = READ_ONCE(*pudp);

                    // next = 0x10080000000, due to PUD_SIZE/MASK != PGDIR_SIZE/MASK on s390
                    next = pud_addr_end(addr, end);
                    ...
            } while (pudp++, addr = next, addr != end); // pudp++ iterating over stack

            return 1;
    }

    This has been happening since s390 moved to the common gup code with
    commit d1874a0c2805 ("s390/mm: make the pxd_offset functions more robust")
    and commit 1a42010cdc26 ("s390/mm: convert to the generic
    get_user_pages_fast code").

    s390 tried to mimic static level folding by changing the pXd_offset
    primitives to always calculate the top-level page table offset in
    pgd_offset and just return the value passed in when pXd_offset has to
    act as folded.

    What is crucial for gup_fast, and what has been overlooked, is that
    PxD_SIZE/MASK, and thus pXd_addr_end, should also change correspondingly.
    The latter is not possible with dynamic folding.
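The address arithmetic in the example above can be reproduced in user space. This sketch assumes a 2 GB PUD (a shift of 31, which is what the 0x10080000000 boundary in the example implies) and copies the boundary comparison of the generic pud_addr_end() macro; count_pud_iterations() is a hypothetical helper that only counts how many entries the do/while loop in gup_pud_range() would visit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2 GB per PUD entry, matching the boundary in the example. */
#define MODEL_PUD_SHIFT 31
#define MODEL_PUD_SIZE  (UINT64_C(1) << MODEL_PUD_SHIFT)
#define MODEL_PUD_MASK  (~(MODEL_PUD_SIZE - 1))

/*
 * Same comparison as the generic pXd_addr_end(): return the next PUD
 * boundary, or end if that comes first. The "- 1" on both sides keeps
 * the comparison correct when boundary or end wraps to 0.
 */
static uint64_t model_pud_addr_end(uint64_t addr, uint64_t end)
{
	uint64_t boundary = (addr + MODEL_PUD_SIZE) & MODEL_PUD_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/*
 * Hypothetical helper: count the entries the gup_pud_range() loop would
 * read. Any count above 1 requires pudp++ to reach a real next entry.
 */
static int count_pud_iterations(uint64_t addr, uint64_t end)
{
	int n = 0;

	assert(addr < end);
	do {
		n++;
		addr = model_pud_addr_end(addr, end);
	} while (addr != end);
	return n;
}
```

For the example range the loop runs twice, but with a folded level there is only one "entry" behind pudp (the stack copy of p4d), so the second read lands on unrelated stack memory.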

    To fix the issue, in addition to the pXd values, pass the original pXdp
    pointers down to the gup_pXd_range functions, and introduce
    pXd_offset_lockless helpers, which take an additional pXd entry value
    parameter. This has already been discussed in

    https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1
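The idea behind the fix can be modelled outside the kernel: the walker keeps both the value it read once and the pointer it read it from, and a folded level hands back the original pointer, so incrementing it walks the real table. The types and helpers below are hypothetical stand-ins loosely shaped after the pXd_offset_lockless(pXdp, pXd, addr) idea described above; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a page-table entry. */
typedef struct { uint64_t val; } model_entry_t;

/*
 * Folded-level offset in the style of the fix. A non-folded level would
 * use the value copy to locate the next-level table; a folded level
 * ignores it and returns the caller's original table pointer.
 * (Before the fix, the folded helper effectively returned the address of
 * a stack copy, so "p + 1" pointed at unrelated stack memory.)
 */
static model_entry_t *model_offset_lockless(model_entry_t *orig_p,
					    model_entry_t val)
{
	(void)val;
	return orig_p;
}

/* Visit n consecutive entries starting at p, reading each exactly once. */
static uint64_t model_walk(const model_entry_t *p, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++, p++)
		sum += p->val;	/* stands in for READ_ONCE(*p) */
	return sum;
}
```

Because the returned pointer refers to the real table, a loop that crosses into the next entry reads real data instead of the stack.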

    Fixes: 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast code")
    Signed-off-by: Vasily Gorbik
    Signed-off-by: Andrew Morton
    Reviewed-by: Gerald Schaefer
    Reviewed-by: Alexander Gordeev
    Reviewed-by: Jason Gunthorpe
    Reviewed-by: Mike Rapoport
    Reviewed-by: John Hubbard
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Dave Hansen
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Michael Ellerman
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Dave Hansen
    Cc: Andy Lutomirski
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: Arnd Bergmann
    Cc: Andrey Ryabinin
    Cc: Heiko Carstens
    Cc: Christian Borntraeger
    Cc: Claudio Imbrenda
    Cc: [5.2+]
    Link: https://lkml.kernel.org/r/patch.git-943f1e5dcff2.your-ad-here.call-01599856292-ext-8676@work.hours
    Signed-off-by: Linus Torvalds

    Vasily Gorbik
     

25 Sep, 2020

1 commit


23 Sep, 2020

3 commits

  • Pull vfs fixes from Al Viro:
    "No common topic, just assorted fixes"

    * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fuse: fix the ->direct_IO() treatment of iov_iter
    fs: fix cast in fsparam_u32hex() macro
    vboxsf: Fix the check for the old binary mount-arguments struct

    Linus Torvalds
     
  • Pull networking fixes from Jakub Kicinski:

    - fix failure to add bond interfaces to a bridge, the offload-handling
    code was too defensive there and recent refactoring unearthed that.
    Users complained (Ido)

    - fix unnecessarily reflecting ECN bits within TOS values / QoS marking
    in TCP ACK and reset packets (Wei)

    - fix a deadlock with bpf iterator. Hopefully we're in the clear on
    this front now... (Yonghong)

    - BPF fix for clobbering r2 in bpf_gen_ld_abs (Daniel)

    - fix AQL on mt76 devices with FW rate control and fix a couple of AQL
    issues in mac80211 code (Felix)

    - fix authentication issue with mwifiex (Maximilian)

    - WiFi connectivity fix: revert IGTK support in ti/wlcore (Mauro)

    - fix exception handling for multipath routes via same device (David
    Ahern)

    - revert to a BH spin lock flavor for nsid_lock: there are paths
    which do require the BH context protection (Taehee)

    - fix interrupt / queue / NAPI handling in the lantiq driver (Hauke)

    - fix ife module load deadlock (Cong)

    - make an adjustment to netlink reply message type for code added in
    this release (the sole change touching uAPI here) (Michal)

    - a number of fixes for small NXP and Microchip switches (Vladimir)

    [ Pull request acked by David: "you can expect more of this in the
    future as I try to delegate more things to Jakub" ]

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (167 commits)
    net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
    net: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
    net: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
    inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute
    net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU
    net: Update MAINTAINERS for MediaTek switch driver
    net/mlx5e: mlx5e_fec_in_caps() returns a boolean
    net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock
    net/mlx5e: kTLS, Fix leak on resync error flow
    net/mlx5e: kTLS, Add missing dma_unmap in RX resync
    net/mlx5e: kTLS, Fix napi sync and possible use-after-free
    net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported
    net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats()
    net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
    net/mlx5e: Fix endianness when calculating pedit mask first bit
    net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported
    net/mlx5e: CT: Fix freeing ct_label mapping
    net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready
    net/mlx5e: Use synchronize_rcu to sync with NAPI
    net/mlx5e: Use RCU to protect rq->xdp_prog
    ...

    Linus Torvalds
     
  • Pull tracing fixes from Steven Rostedt:

    - Check that a kprobe is enabled before unregistering it from ftrace,
    as it isn't registered when disabled.

    - Remove kprobes enabled via the command line that sit on init text
    when that text is freed.

    - Add missing RCU synchronization for ftrace trampoline symbols removed
    from kallsyms.

    - Free trampoline on error path if ftrace_startup() fails.

    - Give more space for the longer PID numbers in trace output.

    - Fix a possible double free in the histogram code.

    - A couple of fixes that were discovered by sparse.

    * tag 'trace-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    bootconfig: init: make xbc_namebuf static
    kprobes: tracing/kprobes: Fix to kill kprobes on initmem after boot
    tracing: fix double free
    ftrace: Let ftrace_enable_sysctl take a kernel pointer buffer
    tracing: Make the space reserved for the pid wider
    ftrace: Fix missing synchronize_rcu() removing trampoline from kallsyms
    ftrace: Free the trampoline when ftrace_startup() fails
    kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()

    Linus Torvalds
     

21 Sep, 2020

5 commits

  • dax_supported() is defined whenever CONFIG_DAX is enabled, so the dummy
    implementation should be defined only in the !CONFIG_DAX case, not in
    the !CONFIG_FS_DAX case.
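The rule behind this fix is that a stub's preprocessor guard must be the exact complement of the guard around the real definition. A minimal sketch with hypothetical names (CONFIG_MODEL_DAX and model_dax_supported() stand in for CONFIG_DAX and dax_supported(); the real function's signature differs):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical config switch; define it to build the "real" helper. */
/* #define CONFIG_MODEL_DAX 1 */

#ifdef CONFIG_MODEL_DAX
/* Real helper, built whenever the subsystem is enabled. */
bool model_dax_supported(int blocksize)
{
	return blocksize == 4096;	/* placeholder policy for the sketch */
}
#else
/*
 * Dummy, guarded by the negation of the same symbol, so exactly one
 * definition exists in every configuration.
 */
static inline bool model_dax_supported(int blocksize)
{
	(void)blocksize;
	return false;
}
#endif
```

Had the stub been guarded by a different symbol (as with !CONFIG_FS_DAX here), a configuration with CONFIG_DAX set and CONFIG_FS_DAX unset would see both definitions.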

    Fixes: e2ec51282545 ("dm: Call proper helper to determine dax support")
    Cc:
    Reported-by: Geert Uytterhoeven
    Reported-by: Naresh Kamboju
    Reported-by: kernel test robot
    Signed-off-by: Jan Kara
    Signed-off-by: Dan Williams

    Jan Kara
     
  • Pull locking fixes from Borislav Petkov:
    "Two fixes from the locking/urgent pile:

    - Fix lockdep's detection of "USED" inversions

    Linus Torvalds
     
  • Pull libnvdimm fixes from Dan Williams:
    "A handful of fixes to address a string of mistakes in the mechanism
    for device-mapper to determine if its component devices are dax
    capable.

    - Fix an original bug in device-mapper table reference counting when
    interrogating dax capability in the component device. This bug was
    hidden by the following bug.

    - Fix device-mapper to use the proper helper (dax_supported() instead
    of the leaf helper generic_fsdax_supported()) to determine dax
    operation of a stacked block device configuration. The original
    implementation is only valid for one level of dax-capable block
    device stacking. This bug was discovered while fixing the below
    regression.

    - Fix an infinite recursion regression introduced by broken attempts
    to quiet the generic_fsdax_supported() path and make it bail out
    before logging "dax capability not found" errors"

    * tag 'libnvdimm-fixes-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    dax: Fix stack overflow when mounting fsdax pmem device
    dm: Call proper helper to determine dax support
    dm/dax: Fix table reference counts

    Linus Torvalds
     
  • When calculating ancestor_size with IPv6 enabled, simply using
    sizeof(struct ipv6_pinfo) doesn't account for extra bytes needed for
    alignment in the struct sctp6_sock. On x86, there aren't any extra
    bytes, but on ARM the ipv6_pinfo structure is aligned on an 8-byte
    boundary so there were 4 pad bytes that were omitted from the
    ancestor_size calculation. This would lead to corruption of the
    pd_lobby pointers, causing an oops when trying to free the sctp
    structure on socket close.
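The padding problem can be reproduced with two stand-in structures: a 4-byte leading part followed by an 8-byte-aligned part, mimicking how ipv6_pinfo is aligned on ARM. The structures are hypothetical, and `_Alignas(8)` forces the ARM-like alignment on every target; subtracting the size of the leading part from the size of the whole struct is one way to count the hidden pad bytes, and we do not claim it is the exact expression used by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the real socket structures. */
struct model_base { uint32_t x; };			/* 4 bytes */
struct model_v6   { _Alignas(8) uint64_t y; };		/* 8-byte aligned */

struct model_sock6 {
	struct model_base base;
	struct model_v6   v6;	/* 4 pad bytes are inserted before this */
};

/* Undercounts: ignores the padding in front of v6. */
static size_t tail_size_naive(void)
{
	return sizeof(struct model_v6);
}

/* Counts everything after base, padding included. */
static size_t tail_size_padded(void)
{
	return sizeof(struct model_sock6) - sizeof(struct model_base);
}
```

Here the naive size is 8 while the real tail is 12 bytes; a copy sized with the naive value would leave the last 4 bytes stale, which is the kind of mismatch that corrupted pd_lobby.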

    Fixes: 636d25d557d1 ("sctp: not copy sctp_sock pd_lobby in sctp_copy_descendant")
    Signed-off-by: Henry Ptasinski
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Henry Ptasinski
     
  • Pull tty/serial/fbcon fixes from Greg KH:
    "Here are some small tty/serial and one more fbcon fix.

    They include:

    - serial core locking regression fixes

    - new device ids for 8250_pci driver

    - fbcon fix for syzbot found issue

    All have been in linux-next with no reported issues"

    * tag 'tty-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
    fbcon: Fix user font detection test at fbcon_resize().
    serial: 8250_pci: Add Realtek 816a and 816b
    serial: core: fix console port-lock regression
    serial: core: fix port-lock initialisation

    Linus Torvalds
     

20 Sep, 2020

3 commits

  • DM was calling generic_fsdax_supported() to determine whether a device
    referenced in the DM table supports DAX. However, this is a helper for
    "leaf" device drivers so that they don't have to duplicate common
    generic checks. High-level code should call the dax_supported() helper,
    which calls into the appropriate helper for the particular device. This
    problem manifested itself as kernel messages:

    dm-3: error: dax access failed (-95)

    when lvm2-testsuite was run in cases where a DM device was stacked on
    top of another DM device.
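The difference between the leaf helper and the high-level helper can be sketched with a hypothetical device model: the leaf check only answers for a device that provides DAX itself, while the dispatcher recurses through any number of stacked layers. All names and fields here are illustrative assumptions, not the kernel's structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a block device that may be stacked on others. */
struct model_dev {
	bool is_leaf;			/* e.g. a pmem device */
	bool leaf_has_dax;		/* leaf provides direct access */
	const struct model_dev **children;	/* NULL-terminated, stacked only */
};

/* Leaf helper: only meaningful for a device that itself provides DAX. */
static bool model_generic_fsdax_supported(const struct model_dev *d)
{
	return d->is_leaf && d->leaf_has_dax;
}

/*
 * High-level helper: pick the right check per device type and recurse
 * through stacked layers; every component must support DAX.
 */
static bool model_dax_supported(const struct model_dev *d)
{
	if (d->is_leaf)
		return model_generic_fsdax_supported(d);
	if (!d->children || !d->children[0])
		return false;
	for (const struct model_dev **c = d->children; *c; c++)
		if (!model_dax_supported(*c))
			return false;
	return true;
}
```

With a DM device stacked on another DM device on top of pmem, the dispatcher answers true, whereas calling the leaf helper on the top DM device answers false, matching the spurious failures described above.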

    Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices")
    Cc:
    Tested-by: Adrian Huang
    Signed-off-by: Jan Kara
    Acked-by: Mike Snitzer
    Reported-by: kernel test robot
    Link: https://lore.kernel.org/r/160061715195.13131.5503173247632041975.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams

    Jan Kara
     
  • Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    changed ctl_table.proc_handler to take a kernel pointer. Adjust the
    signature of stack_erasing_sysctl to match ctl_table.proc_handler,
    which fixes the following sparse warning:

    kernel/stackleak.c:31:50: warning: incorrect type in argument 3 (different address spaces)
    kernel/stackleak.c:31:50: expected void *
    kernel/stackleak.c:31:50: got void [noderef] __user *buffer

    Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    Signed-off-by: Tobias Klauser
    Signed-off-by: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Al Viro
    Link: https://lkml.kernel.org/r/20200907093253.13656-1-tklauser@distanz.ch
    Signed-off-by: Linus Torvalds

    Tobias Klauser
     
  • Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    changed ctl_table.proc_handler to take a kernel pointer. Adjust the
    signature of ftrace_enable_sysctl to match ctl_table.proc_handler,
    which fixes the following sparse warning:

    kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
    kernel/trace/ftrace.c:7544:43: expected void *
    kernel/trace/ftrace.c:7544:43: got void [noderef] __user *buffer

    Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    Signed-off-by: Tobias Klauser
    Signed-off-by: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Al Viro
    Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch
    Signed-off-by: Linus Torvalds

    Tobias Klauser
     

19 Sep, 2020

7 commits

  • Currently mscc_ocelot_init_ports() will skip initializing a port when it
    doesn't have a phy-handle, so the ocelot->ports[port] pointer will be
    NULL. Take this into consideration when tearing down the driver, and add
    a new function ocelot_deinit_port() to the switch library, the mirror of
    ocelot_init_port(), which needs to be called by the driver for all ports
    it has initialized.

    Signed-off-by: Vladimir Oltean
    Reviewed-by: Florian Fainelli
    Tested-by: Alexandre Belloni
    Reviewed-by: Alexandre Belloni
    Signed-off-by: David S. Miller

    Vladimir Oltean
     
  • The ocelot_port->ts_id is used to:
    (a) populate skb->cb[0] for matching the TX timestamp in the PTP IRQ
    with an skb.
    (b) populate the REW_OP from the injection header of the ongoing skb.
    Only then is ocelot_port->ts_id incremented.

    This is a problem because, at least theoretically, another timestampable
    skb might use the same ocelot_port->ts_id before that is incremented.
    Normally all transmit calls are serialized by the netdev transmit
    spinlock, but in this case, ocelot_port_add_txtstamp_skb() is also
    called by DSA, which has started declaring the NETIF_F_LLTX feature
    since commit 2b86cb829976 ("net: dsa: declare lockless TX feature for
    slave ports"). So the logic of using and incrementing the timestamp id
    should be atomic per port.

    The solution is to use the global ocelot_port->ts_id only while
    protected by the associated ocelot_port->ts_id_lock. That's where we
    populate skb->cb[0]. Note that for ocelot, ocelot_port_add_txtstamp_skb
    is called for the actual skb, but for felix, it is called for the skb's
    clone. That is something which will also be changed in the future.
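The fix makes "read the id, store it in the skb, increment it" a single step under the per-port lock. A user-space sketch, assuming a pthread mutex in place of the kernel spinlock and hypothetical struct names (the unsigned char width of the id is arbitrary here):

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-port state; field names echo the commit text. */
struct model_port {
	pthread_mutex_t ts_id_lock;
	unsigned char ts_id;
};

/* Hypothetical stand-in for the skb control block. */
struct model_skb {
	unsigned char cb0;
};

/*
 * Use and increment the id under the lock, so two concurrent
 * transmitters (possible once NETIF_F_LLTX is set) can never be
 * handed the same id.
 */
static void model_add_txtstamp_skb(struct model_port *port,
				   struct model_skb *skb)
{
	pthread_mutex_lock(&port->ts_id_lock);
	skb->cb0 = port->ts_id;
	port->ts_id++;
	pthread_mutex_unlock(&port->ts_id_lock);
}
```

Without the lock, two callers could both load the same ts_id before either stores the incremented value, and the PTP IRQ could then match a timestamp to the wrong skb.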

    Signed-off-by: Vladimir Oltean
    Reviewed-by: Horatiu Vultur
    Reviewed-by: Florian Fainelli
    Tested-by: Alexandre Belloni
    Reviewed-by: Alexandre Belloni
    Signed-off-by: David S. Miller

    Vladimir Oltean
     
  • Pull arm64 fixes from Catalin Marinas:

    - Allow CPUs affected by erratum 1418040 to come online late
    (previously we only fixed the other case - CPUs not affected by the
    erratum coming up late).

    - Fix branch offset in BPF JIT.

    - Defer the stolen time initialisation to the CPU online time from the
    CPU starting time to avoid a (sleep-able) memory allocation in an
    atomic context.

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
    arm64: paravirt: Initialize steal time when cpu is online
    arm64: bpf: Fix branch offset in JIT
    arm64: Allow CPUs unffected by ARM erratum 1418040 to come in late

    Linus Torvalds
     
  • Pull power management updates from Rafael Wysocki:
    "These add a new CPU ID to the RAPL power capping driver and prevent
    the ACPI processor idle driver from triggering RCU-lockdep complaints.

    Specifics:

    - Add support for the Lakefield chip to the RAPL power capping driver
    (Ricardo Neri).

    - Modify the ACPI processor idle driver to prevent it from triggering
    RCU-lockdep complaints, which have started to happen after recent
    changes in that area (Peter Zijlstra)"

    * tag 'pm-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    ACPI: processor: Take over RCU-idle for C3-BM idle
    cpuidle: Allow cpuidle drivers to take over RCU-idle
    ACPI: processor: Use CPUIDLE_FLAG_TLB_FLUSHED
    ACPI: processor: Use CPUIDLE_FLAG_TIMER_STOP
    powercap: RAPL: Add support for Lakefield

    Linus Torvalds
     
  • Pull sound fixes from Takashi Iwai:
    "Here is a collection of fixes for 5.9. All look small and are nothing
    scary.

    The majority of changes are about ASoC driver-specific fixes, while
    there are a couple of ASoC core fixes (DAI lookup and lockdep stuff)
    and usual HD-audio quirks"

    * tag 'sound-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (23 commits)
    ALSA: hda/realtek - The Mic on a RedmiBook doesn't work
    ASoC: tlv320adcx140: Wake up codec before accessing register
    ASoC: core: Do not cleanup uninitialized dais on soc_pcm_open failure
    ALSA: hda: fixup headset for ASUS GX502 laptop
    ASoC: Intel: bytcr_rt5640: Add quirk for MPMAN Converter9 2-in-1
    ASoC: Intel: haswell: Fix power transition refactor
    ASoC: tlv320adcx140: Fix accessing uninitialized adcx140->dev
    ASoC: wm8994: Ensure the device is resumed in wm89xx_mic_detect functions
    ASoC: wm8994: Skip setting of the WM8994_MICBIAS register for WM1811
    ASoC: meson: axg-toddr: fix channel order on g12 platforms
    ASoC: soc-core: add snd_soc_find_dai_with_mutex()
    ASoC: qcom: common: Fix refcount imbalance on error
    ASoC: rt700: Fix return check for devm_regmap_init_sdw()
    ASoC: rt715: Fix return check for devm_regmap_init_sdw()
    ASoC: rt711: Fix return check for devm_regmap_init_sdw()
    ASoC: rt1308-sdw: Fix return check for devm_regmap_init_sdw()
    ASoC: max98373: Fix return check for devm_regmap_init_sdw()
    ASoC: ti: fixup ams_delta_mute() function name
    ASoC: pcm3168a: ignore 0 Hz settings
    ASoC: Intel: tgl_max98373: fix a runtime pm issue in multi-thread case
    ...

    Linus Torvalds
     
  • Since the kprobe_event= cmdline option allows the user to put kprobes on
    functions in initmem, kprobes has to remove such probes after boot.
    Currently, probes on init functions in modules are handled by the
    module callback, but the kernel's own init text isn't handled.
    Without this, kprobes may access a nonexistent text area to disable or
    remove the probe.
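The cleanup amounts to a range check run once, just before init text is freed: every probe whose address falls inside the init text is marked gone, so later disable or unregister paths skip the freed instructions. A minimal sketch with hypothetical names and a half-open [start, end) range assumed for the init text:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe record. */
struct model_probe {
	uintptr_t addr;
	bool gone;	/* set once the probe must no longer touch its text */
};

/* Half-open range check: is addr inside [start, end)? */
static bool model_in_range(uintptr_t addr, uintptr_t start, uintptr_t end)
{
	return addr >= start && addr < end;
}

/*
 * Called when init text is about to be freed: mark every live probe in
 * that range as gone. Returns how many probes were newly marked.
 */
static size_t model_kill_init_probes(struct model_probe *p, size_t n,
				     uintptr_t init_start, uintptr_t init_end)
{
	size_t killed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!p[i].gone && model_in_range(p[i].addr, init_start, init_end)) {
			p[i].gone = true;
			killed++;
		}
	}
	return killed;
}
```

Running it a second time marks nothing new, so the cleanup is safe to call once per free.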

    Link: https://lkml.kernel.org/r/159972810544.428528.1839307531600646955.stgit@devnote2

    Fixes: 970988e19eb0 ("tracing/kprobe: Add kprobe_event= boot parameter")
    Cc: Jonathan Corbet
    Cc: Shuah Khan
    Cc: Randy Dunlap
    Cc: Ingo Molnar
    Cc: stable@vger.kernel.org
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    changed ctl_table.proc_handler to take a kernel pointer. Adjust the
    signature of ftrace_enable_sysctl to match ctl_table.proc_handler,
    which fixes the following sparse warning:

    kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
    kernel/trace/ftrace.c:7544:43: expected void *
    kernel/trace/ftrace.c:7544:43: got void [noderef] __user *buffer

    Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch

    Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Tobias Klauser
    Signed-off-by: Steven Rostedt (VMware)

    Tobias Klauser
     

18 Sep, 2020

3 commits

  • Tunnel offload info code uses the ETHTOOL_MSG_TUNNEL_INFO_GET message type
    (cmd field in the genetlink header) for replies to tunnel info netlink
    requests, i.e. the same value as the request has. This is a problem because we are using
    two separate enums for userspace to kernel and kernel to userspace message
    types so that this ETHTOOL_MSG_TUNNEL_INFO_GET (28) collides with
    ETHTOOL_MSG_CABLE_TEST_TDR_NTF which is what message type 28 means for
    kernel to userspace messages.

    As the tunnel info request reached mainline in the 5.9 merge window, we should
    still be able to fix the reply message type without breaking backward
    compatibility.
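The collision follows from the two directions having independent numbering. A sketch in which only the value 28 comes from the text above; the enum names, the dedicated reply entry, and the decoder are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* User-to-kernel numbering: 28 is the tunnel info request. */
enum model_user_to_kernel {
	MODEL_MSG_TUNNEL_INFO_GET = 28,
};

/* Kernel-to-user numbering: 28 already means something else. */
enum model_kernel_to_user {
	MODEL_MSG_CABLE_TEST_TDR_NTF = 28,
	MODEL_MSG_TUNNEL_INFO_GET_REPLY,	/* hypothetical dedicated reply type */
};

/* How userspace would name an incoming kernel-to-user message. */
static const char *model_decode_reply(int cmd)
{
	switch (cmd) {
	case MODEL_MSG_CABLE_TEST_TDR_NTF:
		return "cable-test-tdr-ntf";
	case MODEL_MSG_TUNNEL_INFO_GET_REPLY:
		return "tunnel-info-reply";
	default:
		return "unknown";
	}
}
```

A reply that reuses the request's value 28 is decoded as a cable test notification; giving replies their own kernel-to-user value removes the ambiguity.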

    Fixes: c7d759eb7b12 ("ethtool: add tunnel info interface")
    Signed-off-by: Michal Kubecek
    Reviewed-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Michal Kubecek
     
  • Commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic") made
    the page locking entirely fair, in that if a waiter came in while the
    lock was held, the lock would be transferred to the lockers strictly in
    order.

    That was intended to finally get rid of the long-reported watchdog
    failures that involved the page lock under extreme load, where a process
    could end up waiting essentially forever, as other page lockers stole
    the lock from under it.

    It also improved some benchmarks, but it ended up causing huge
    performance regressions on others, simply because fair lock behavior
    doesn't end up giving out the lock as aggressively, causing better
    worst-case latency, but potentially much worse average latencies and
    throughput.

    Instead of reverting that change entirely, this introduces a controlled
    amount of unfairness, with a sysctl knob to tune it if somebody needs
    to. But the default value should hopefully be good for any normal load,
    allowing a few rounds of lock stealing, but enforcing the strict
    ordering before the lock has been stolen too many times.

    There is also a hint from Matthieu Baerts that the fair page locking
    may end up exposing an ABBA deadlock that is hidden by the usual
    optimistic lock stealing, and while the unfairness doesn't fix the
    fundamental issue (and I'm still looking at that), it avoids it in
    practice.

    The amount of unfairness can be modified by writing a new value to the
    'sysctl_page_lock_unfairness' variable (default value of 5, exposed
    through /proc/sys/vm/page_lock_unfairness), but that is hopefully
    something we'd use mainly for debugging rather than being necessary for
    any deep system tuning.

    This whole issue has exposed just how critical the page lock can be, and
    how contended it gets under certain loads. And the main contention
    doesn't really seem to be anything related to IO (which was the origin
    of this lock), but for things like just verifying that the page file
    mapping is stable while faulting in the page into a page table.
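The controlled unfairness can be pictured as a single-threaded model: a queued waiter tolerates having the lock stolen by newcomers a bounded number of times, after which the next release hands the lock to it directly. This is only a sketch under our own assumptions: all names are hypothetical, the real page-wait code is more involved, and the unfairness parameter plays the role of page_lock_unfairness (default 5).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; names do not mirror mm/filemap.c. */
struct model_page_lock {
	bool locked;
	bool handoff;	/* waiter asked for a direct hand-off */
};

/* The waiter stops relying on luck after `unfairness` failed wakeups. */
static bool waiter_wants_handoff(int failed_wakeups, int unfairness)
{
	return failed_wakeups >= unfairness;
}

/* A newcomer may steal a just-released lock unless a hand-off is pending. */
static bool try_steal(struct model_page_lock *l)
{
	if (l->locked || l->handoff)
		return false;
	l->locked = true;
	return true;
}

/*
 * One queued waiter versus a newcomer on every release.
 * Returns how many times the lock was stolen before the waiter got it.
 */
static int model_steals_before_handoff(int unfairness)
{
	struct model_page_lock l = {
		.locked = true,
		.handoff = waiter_wants_handoff(0, unfairness),
	};
	int failed = 0;

	assert(unfairness >= 0);
	for (;;) {
		l.locked = false;	/* holder unlocks and wakes the waiter */
		if (!try_steal(&l))
			break;		/* nobody stole it: the waiter wins */
		failed++;		/* waiter woke up to a locked page again */
		if (waiter_wants_handoff(failed, unfairness))
			l.handoff = true;
	}
	l.locked = true;		/* waiter takes the lock */
	l.handoff = false;
	return failed;
}
```

With the default of 5 the waiter loses the race five times and then is guaranteed the lock; a value of 0 gives strictly fair behavior in this model.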

    Link: https://lore.kernel.org/linux-fsdevel/ed8442fd-6f54-dd84-cd4a-941e8b7ee603@MichaelLarabel.com/
    Link: https://www.phoronix.com/scan.php?page=article&item=linux-50-59&num=1
    Link: https://lore.kernel.org/linux-fsdevel/c560a38d-8313-51fb-b1ec-e904bd8836bc@tessares.net/
    Reported-and-tested-by: Michael Larabel
    Tested-by: Matthieu Baerts
    Cc: Dave Chinner
    Cc: Matthew Wilcox
    Cc: Chris Mason
    Cc: Jan Kara
    Cc: Amir Goldstein
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Steal time initialization requires mapping a memory region which
    invokes a memory allocation. Doing this at CPU starting time results
    in the following trace when CONFIG_DEBUG_ATOMIC_SLEEP is enabled:

    BUG: sleeping function called from invalid context at mm/slab.h:498
    in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.9.0-rc5+ #1
    Call trace:
    dump_backtrace+0x0/0x208
    show_stack+0x1c/0x28
    dump_stack+0xc4/0x11c
    ___might_sleep+0xf8/0x130
    __might_sleep+0x58/0x90
    slab_pre_alloc_hook.constprop.101+0xd0/0x118
    kmem_cache_alloc_node_trace+0x84/0x270
    __get_vm_area_node+0x88/0x210
    get_vm_area_caller+0x38/0x40
    __ioremap_caller+0x70/0xf8
    ioremap_cache+0x78/0xb0
    memremap+0x9c/0x1a8
    init_stolen_time_cpu+0x54/0xf0
    cpuhp_invoke_callback+0xa8/0x720
    notify_cpu_starting+0xc8/0xd8
    secondary_start_kernel+0x114/0x180
    CPU1: Booted secondary processor 0x0000000001 [0x431f0a11]

    However, we don't need to initialize steal time at CPU starting time.
    We can simply wait until CPU online time, just sacrificing a bit of
    accuracy by returning zero for steal time until we know better.

    While at it, add __init to the functions that are only called by
    pv_time_init() which is __init.

    Signed-off-by: Andrew Jones
    Fixes: e0685fa228fd ("arm64: Retrieve stolen time as paravirtualized guest")
    Cc: stable@vger.kernel.org
    Reviewed-by: Steven Price
    Link: https://lore.kernel.org/r/20200916154530.40809-1-drjones@redhat.com
    Signed-off-by: Catalin Marinas

    Andrew Jones
     

17 Sep, 2020

2 commits


16 Sep, 2020

2 commits

  • The __this_cpu*() accessors are (in general) IRQ-unsafe which, given
    that percpu-rwsem is a blocking primitive, should be just fine.

    However, file_end_write() is used from IRQ context and will cause
    load-store issues on architectures where the per-cpu accessors are not
    natively irq-safe.

    Fix it by using the IRQ-safe this_cpu_*() for operations on
    read_count. This will generate more expensive code on a number of
    platforms, which might cause a performance regression for some of the
    other percpu-rwsem users.

    If any such regression is reported, we can consider alternative solutions.
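Why an IRQ-unsafe per-CPU update breaks can be shown deterministically: if the increment is a plain load, modify, store sequence and an interrupt that updates the same counter lands between the load and the store, the interrupt's update is lost. This is a single-threaded model; the names are hypothetical, and whether __this_cpu_add() compiles to such a sequence depends on the architecture (assumption).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one CPU's read_count. */
static long model_read_count;

/* Stand-in for file_end_write() in IRQ context dropping a reference. */
static void model_irq_handler(void)
{
	model_read_count -= 1;
}

/* Non-IRQ-safe increment: the interrupt can land between load and store. */
static void model_unsafe_inc(bool irq_in_middle)
{
	long tmp = model_read_count;	/* load */

	if (irq_in_middle)
		model_irq_handler();	/* IRQ modifies the counter */
	model_read_count = tmp + 1;	/* store overwrites the IRQ's -1 */
}

/*
 * IRQ-safe increment: the read-modify-write cannot be split, so the
 * interrupt is modelled as running only after it completes.
 */
static void model_irqsafe_inc(bool irq_in_middle)
{
	model_read_count += 1;
	if (irq_in_middle)
		model_irq_handler();
}
```

Starting from zero, the unsafe path ends at 1 instead of the correct 0, leaving the reader count permanently off by one; the IRQ-safe path ends at 0.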

    Fixes: 70fe2f48152e ("aio: fix freeze protection of aio writes")
    Signed-off-by: Hou Tao
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Acked-by: Oleg Nesterov
    Link: https://lkml.kernel.org/r/20200915140750.137881-1-houtao1@huawei.com

    Hou Tao
     
  • Fix the port-lock initialisation regression introduced by commit
    a3cb39d258ef ("serial: core: Allow detach and attach serial device for
    console") by making sure that the lock is again initialised during
    console setup.

    The console may be registered before the serial controller has been
    probed in which case the port lock needs to be initialised during
    console setup by a call to uart_set_options(). The console-detach
    changes introduced a regression in several drivers by effectively
    removing that initialisation by not initialising the lock when the port
    is used as a console (which is always the case during console setup).

    Add back the early lock initialisation and instead use a new
    console-reinit flag to handle the case where a console is being
    re-attached through sysfs.

    The question whether the console-detach interface should have been added
    in the first place is left for another discussion.

    Note that the console-enabled check in uart_set_options() is not
    redundant because of kgdboc, which can end up reinitialising an already
    enabled console (see commit 42b6a1baa3ec ("serial_core: Don't
    re-initialize a previously initialized spinlock.")).

    Fixes: a3cb39d258ef ("serial: core: Allow detach and attach serial device for console")
    Cc: stable # 5.7
    Signed-off-by: Johan Hovold
    Reviewed-by: Andy Shevchenko
    Link: https://lore.kernel.org/r/20200909143101.15389-3-johan@kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Johan Hovold
     

15 Sep, 2020

2 commits

  • As we can see from vxlan_build/parse_gbp_hdr(), when processing metadata
    on the vxlan rx/tx path, only the dont_learn/policy_applied/policy_id
    fields can be set in or parsed from the packet for the vxlan gbp option.

    So we should apply the mask when setting it in act_tunnel_key and
    cls_flower. Otherwise, users who don't know about these bits may
    configure a value that can never be matched.
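Masking at configuration time means keeping only the bits the datapath can carry. The bit positions below reflect our understanding of include/net/vxlan.h (dont_learn at bit 22, policy_applied at bit 19, a 16-bit policy id) and should be treated as assumptions; the MODEL_ names and the helper are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed GBP bit layout; verify against include/net/vxlan.h. */
#define MODEL_GBP_DONT_LEARN     (UINT32_C(1) << 22)
#define MODEL_GBP_POLICY_APPLIED (UINT32_C(1) << 19)
#define MODEL_GBP_ID_MASK        UINT32_C(0xFFFF)
#define MODEL_GBP_MASK (MODEL_GBP_DONT_LEARN | MODEL_GBP_POLICY_APPLIED | \
			MODEL_GBP_ID_MASK)

/* Keep only the bits vxlan_build/parse_gbp_hdr() can carry. */
static uint32_t model_gbp_sanitize(uint32_t user_value)
{
	return user_value & MODEL_GBP_MASK;
}
```

For example, a user value of 0x00010005 sets bit 16, which no parsed packet can ever carry, so an unmasked rule would never match; after masking it becomes 0x5.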

    Reported-by: Shuang Li
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     
  • flowi4_multipath_hash was added by the commit referenced below for
    tunnels. Unfortunately, the patch did not initialize the new field
    for several fast path lookups that do not initialize the entire flow
    struct to 0. Fix those locations. Currently, flowi4_multipath_hash
    is random garbage and affects the hash value computed by
    fib_multipath_hash for multipath selection.
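How a stale field skews path selection can be shown with a toy model. The struct and hash below are hypothetical simplifications (the real fib_multipath_hash() differs); the only point is that the result depends on multipath_hash, so a lookup that fills fields one by one must also clear that field, which mirrors the explicit initialisation the commit describes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified flow key; not the kernel's struct flowi4. */
struct model_flowi4 {
	uint32_t saddr;
	uint32_t daddr;
	uint32_t multipath_hash;	/* optional user-defined hash input */
};

/* Toy hash that folds multipath_hash in, like the real one depends on it. */
static uint32_t model_mp_hash(const struct model_flowi4 *fl)
{
	return fl->saddr ^ fl->daddr ^ fl->multipath_hash;
}

/* Pick one of n_paths next hops from the hash. */
static unsigned model_select_path(const struct model_flowi4 *fl,
				  unsigned n_paths)
{
	return model_mp_hash(fl) % n_paths;
}

/* Field-by-field fill, as in the fast paths: the new field must be set too. */
static void model_fill_flow(struct model_flowi4 *fl, uint32_t saddr,
			    uint32_t daddr)
{
	fl->saddr = saddr;
	fl->daddr = daddr;
	fl->multipath_hash = 0;
}
```

With a leftover nonzero multipath_hash, the same source and destination can be steered to a different next hop than a correctly initialised lookup would pick.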

    Fixes: 24ba14406c5c ("route: Add multipath_hash in flowi_common to make user-define hash")
    Signed-off-by: David Ahern
    Cc: wenxu
    Signed-off-by: David S. Miller

    David Ahern
     

14 Sep, 2020

2 commits

  • The patch partially reverts some of the UAPI bits of the buffer
    cache management hints, namely the queue consistency (memory
    coherency) user-space hint, because, as it turned out, the kernel
    implementation of this feature was misusing DMA_ATTR_NON_CONSISTENT.

    The patch reverts both kernel and user space parts: removes the
    DMA consistency attr functions, rolls back changes to v4l2_requestbuffers,
    v4l2_create_buffers structures and corresponding UAPI functions
    (plus compat32 layer) and cleans up the documentation.

    [hverkuil: fixed a few typos in the commit log]
    [hverkuil: fixed vb2_core_reqbufs call in drivers/media/dvb-core/dvb_vb2.c]
    [mchehab: fixed a typo in the commit log: revers->reverts]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Hans Verkuil
    Signed-off-by: Mauro Carvalho Chehab

    Sergey Senozhatsky
     
  • Pull driver core fixes from Greg KH:
    "Here are some small driver core and debugfs fixes for 5.9-rc5

    Included in here are:

    - firmware loader memory leak fix

    - firmware loader testing fixes for non-EFI systems

    - device link locking fixes found by lockdep

    - kobject_del() bugfix that has been affecting some callers

    - debugfs minor fix

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'driver-core-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    test_firmware: Test platform fw loading on non-EFI systems
    PM: : fix @em_pd kernel-doc warning
    kobject: Drop unneeded conditional in __kobject_del()
    driver core: Fix device_pm_lock() locking for device links
    MAINTAINERS: Add the security document to SECURITY CONTACT
    driver code: print symbolic error code
    debugfs: Fix module state check condition
    kobject: Restore old behaviour of kobject_del(NULL)
    firmware_loader: fix memory leak for paged buffer

    Linus Torvalds
     

13 Sep, 2020

2 commits

  • Pull char / misc driver fixes from Greg KH:
    "Here are a number of small driver fixes for 5.9-rc5

    Included in here are:

    - habanalabs driver fixes

    - interconnect driver fixes

    - soundwire driver fixes

    - dyndbg fixes for reported issues, and then reverts to fix it all up
    to a sane state.

    - phy driver fixes

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'char-misc-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
    Revert "dyndbg: accept query terms like file=bar and module=foo"
    Revert "dyndbg: fix problem parsing format="foo bar""
    scripts/tags.sh: exclude tools directory from tags generation
    video: fbdev: fix OOB read in vga_8planes_imageblit()
    dyndbg: fix problem parsing format="foo bar"
    dyndbg: refine export, rename to dynamic_debug_exec_queries()
    dyndbg: give %3u width in pr-format, cosmetic only
    interconnect: qcom: Fix small BW votes being truncated to zero
    soundwire: fix double free of dangling pointer
    interconnect: Show bandwidth for disabled paths as zero in debugfs
    habanalabs: fix report of RAZWI initiator coordinates
    habanalabs: prevent user buff overflow
    phy: omap-usb2-phy: disable PHY charger detect
    phy: qcom-qmp: Use correct values for ipq8074 PCIe Gen2 PHY init
    soundwire: bus: fix typo in comment on INTSTAT registers
    phy: qualcomm: fix return value check in qcom_ipq806x_usb_phy_probe()
    phy: qualcomm: fix platform_no_drv_owner.cocci warnings

    Linus Torvalds
     
  • Pull kvm fixes from Paolo Bonzini:
    "A bit on the bigger side, mostly due to me being on vacation, then
    busy, then on parental leave, but there's nothing worrisome.

    ARM:
    - Multiple stolen time fixes, with a new capability to match x86
    - Fix for hugetlbfs mappings when PUD and PMD are the same level
    - Fix for hugetlbfs mappings when PTE mappings are enforced (dirty
    logging, for example)
    - Fix tracing output of 64bit values

    x86:
    - nSVM state restore fixes
    - Async page fault fixes
    - Lots of small fixes everywhere"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
    KVM: emulator: more strict rsm checks.
    KVM: nSVM: more strict SMM checks when returning to nested guest
    SVM: nSVM: setup nested msr permission bitmap on nested state load
    SVM: nSVM: correctly restore GIF on vmexit from nesting after migration
    x86/kvm: don't forget to ACK async PF IRQ
    x86/kvm: properly use DEFINE_IDTENTRY_SYSVEC() macro
    KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exit
    KVM: SVM: avoid emulation with stale next_rip
    KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_EN
    KVM: SVM: Periodically schedule when unregistering regions on destroy
    KVM: MIPS: Change the definition of kvm type
    kvm x86/mmu: use KVM_REQ_MMU_SYNC to sync when needed
    KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL control
    KVM: fix memory leak in kvm_io_bus_unregister_dev()
    KVM: Check the allocation of pv cpu mask
    KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected
    KVM: arm64: Update page shift if stage 2 block mapping not supported
    KVM: arm64: Fix address truncation in traces
    KVM: arm64: Do not try to map PUDs when they are folded into PMD
    arm64/x86: KVM: Introduce steal-time cap
    ...

    Linus Torvalds
     

12 Sep, 2020

2 commits

  • Pull i2c updates from Wolfram Sang:
    "Usual driver bugfixes for the I2C subsystem"

    * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
    i2c: algo: pca: Reapply i2c bus settings after reset
    i2c: npcm7xx: Fix timeout calculation
    misc: eeprom: at24: register nvmem only after eeprom is ready to use

    Linus Torvalds
     
  • MIPS defines two kvm types:

    #define KVM_VM_MIPS_TE 0
    #define KVM_VM_MIPS_VZ 1

    Documentation/virt/kvm/api.rst says "You probably want to use 0 as
    machine type", which implies that type 0 should be the "automatic" or
    "default" type. However, in user space libvirt uses the null-machine
    (with type 0) to detect the KVM capability, which returns "KVM not
    supported" on a VZ platform.

    I tried to fix it in QEMU, but the result is ugly:
    https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg05629.html

    Thomas Huth then suggested that I change the definition of the kvm type:
    https://lists.nongnu.org/archive/html/qemu-devel/2020-09/msg03281.html

    So I define them like this:

    #define KVM_VM_MIPS_AUTO 0
    #define KVM_VM_MIPS_VZ 1
    #define KVM_VM_MIPS_TE 2

    Since VZ and TE cannot co-exist, using type 0 on a TE platform will
    still return success (so old user-space tools have no problems on new
    kernels), and using type 0 on a VZ platform will no longer return
    failure. The only remaining problem is new user-space tools using type
    2 on old kernels, but if we treat this as a kernel bug, we can backport
    this patch to old stable kernels.

    Signed-off-by: Huacai Chen
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Huacai Chen
     

11 Sep, 2020

5 commits