11 Feb, 2019

1 commit

  • Pull irq fixes from Ingo Molnar:
    "irqchip driver fixes: most of them are race fixes for ARM GIC (General
    Interrupt Controller) variants, but also a fix for the ARM MMP
    (Marvell PXA168 et al) irqchip affecting OLPC keyboards"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip/gic-v3-its: Fix ITT_entry_size accessor
    irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable
    irqchip/gic-v3-its: Gracefully fail on LPI exhaustion
    irqchip/gic-v3-its: Plug allocation race for devices sharing a DevID
    irqchip/gic-v4: Fix occasional VLPI drop

    Linus Torvalds
     

10 Feb, 2019

1 commit

  • Pull block fixes from Jens Axboe:

    - NVMe pull request from Christoph, fixing namespace locking when
    dealing with the effects log, and a rapid add/remove issue (Keith)

    - blktrace tweak, ensuring requests with -1 sectors are shown (Jan)

    - link power management quirk for a Samsung SSD (Hans)

    - m68k nfblock dynamic major number fix (Chengguang)

    - series fixing blk-iolatency inflight counter issue (Liu)

    - ensure that we clear ->private when setting up the aio kiocb (Mike)

    - __find_get_block_slow() rate limit print (Tetsuo)

    * tag 'for-linus-20190209' of git://git.kernel.dk/linux-block:
    blk-mq: remove duplicated definition of blk_mq_freeze_queue
    Blk-iolatency: warn on negative inflight IO counter
    blk-iolatency: fix IO hang due to negative inflight counter
    blktrace: Show requests without sector
    fs: ratelimit __find_get_block_slow() failure message.
    m68k: set proper major_num when specifying module param major_num
    libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD
    nvme-pci: fix rapid add remove sequence
    nvme: lock NS list changes while handling command effects
    aio: initialize kiocb private in case any filesystems expect it.

    Linus Torvalds
     

09 Feb, 2019

3 commits

  • Pull ARM SoC fixes from Arnd Bergmann:
    "This is a bit larger than normal, as we had not managed to send out a
    pull request before traveling for a week without my signing key.

    There are multiple code fixes for older bugs, all of which should get
    backported into stable kernels:

    - tango: one fix for multiplatform configurations broken on other
    platforms when tango is enabled

    - arm_scmi: device unregistration fix

    - iop32x: fix kernel oops from extraneous __init annotation

    - pxa: remove a double kfree

    - fsl qbman: close an interrupt clearing race

    The rest is the usual collection of smaller fixes for device tree
    files, on the renesas, allwinner, meson, omap, davinci, qualcomm and
    imx platforms.

    Some of these are for compile-time warnings, most are for board
    specific functionality that fails to work because of incorrect
    settings"

    * tag 'armsoc-fixes-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (30 commits)
    ARM: tango: Improve ARCH_MULTIPLATFORM compatibility
    firmware: arm_scmi: provide the mandatory device release callback
    ARM: iop32x/n2100: fix PCI IRQ mapping
    arm64: dts: add msm8996 compatible to gicv3
    ARM: dts: am335x-shc.dts: fix wrong cd pin level
    ARM: dts: n900: fix mmc1 card detect gpio polarity
    ARM: dts: omap3-gta04: Fix graph_port warning
    ARM: pxa: ssp: unneeded to free devm_ allocated data
    ARM: dts: r8a7743: Convert to new LVDS DT bindings
    soc: fsl: qbman: avoid race in clearing QMan interrupt
    arm64: dts: renesas: r8a77965: Enable DMA for SCIF2
    arm64: dts: renesas: r8a7796: Enable DMA for SCIF2
    arm64: dts: renesas: r8a774a1: Enable DMA for SCIF2
    ARM: dts: da850: fix interrupt numbers for clocksource
    dt-bindings: imx8mq: Number clocks consecutively
    arm64: dts: meson: Fix mmc cd-gpios polarity
    ARM: dts: imx6sx: correct backward compatible of gpt
    ARM: dts: imx: replace gpio-key,wakeup with wakeup-source property
    ARM: dts: vf610-bk4: fix incorrect #address-cells for dspi3
    ARM: dts: meson8m2: mxiii-plus: mark the SD card detection GPIO active-low
    ...

    Linus Torvalds
     
  • Pull signal fixes from Eric Biederman:
    "This contains four small fixes for signal handling. A missing range
    check, a regression fix, prioritizing signals we have already started
    a signal group exit for, and better detection of synchronous signals.

    The confused decision of which signals to handle failed spectacularly
    when a timer was pointed at SIGBUS and the stack overflowed, resulting
    in an unkillable process in an infinite loop instead of a SIGSEGV and
    core dump"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    signal: Better detection of synchronous signals
    signal: Always notice exiting tasks
    signal: Always attempt to allocate siginfo for SIGSTOP
    signal: Make siginmask safe when passed a signal of 0

    Linus Torvalds
     
  • Pull networking fixes from David Miller:
    "This pull request is dedicated to the upcoming snowpocalypse parts 2
    and 3 in the Pacific Northwest:

    1) Drop profiles are broken because some drivers use dev_kfree_skb*
    instead of dev_consume_skb*, from Yang Wei.

    2) Fix IWLWIFI kconfig deps, from Luca Coelho.

    3) Fix percpu maps updating in bpftool, from Paolo Abeni.

    4) Missing station release in batman-adv, from Felix Fietkau.

    5) Fix some networking compat ioctl bugs, from Johannes Berg.

    6) ucc_geth must reset the BQL queue state when stopping the device,
    from Mathias Thore.

    7) Several XDP bug fixes in virtio_net from Toshiaki Makita.

    8) TSO packets must be sent always on queue 0 in stmmac, from Jose
    Abreu.

    9) Fix socket refcounting bug in RDS, from Eric Dumazet.

    10) Handle sparse cpu allocations in bpf selftests, from Martynas
    Pumputis.

    11) Make sure mgmt frames have enough tailroom in mac80211, from Felix
    Fietkau.

    12) Use safe list walking in sctp_sendmsg() asoc list traversal, from
    Greg Kroah-Hartman.

    13) Make DCCP's ccid_hc_[rt]x_parse_options always check for NULL
    ccid, from Eric Dumazet.

    14) Need to reload WoL password into bcmsysport device after deep
    sleeps, from Florian Fainelli.

    15) Remove filter from mask before freeing in cls_flower, from Petr
    Machata.

    16) Missing release and use after free in error paths of s390 qeth
    code, from Julian Wiedmann.

    17) Fix lockdep false positive in dsa code, from Marc Zyngier.

    18) Fix counting of ATU violations in mv88e6xxx, from Andrew Lunn.

    19) Fix EQ firmware assert in qed driver, from Manish Chopra.

    20) Don't default Cavium PTP to Y in kconfig, from Bjorn Helgaas"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
    net: dsa: b53: Fix for failure when irq is not defined in dt
    sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
    geneve: should not call rt6_lookup() when ipv6 was disabled
    net: Don't default Cavium PTP driver to 'y'
    net: broadcom: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: via-velocity: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: tehuti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: sun: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: fsl_ucc_hdlc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: fec_mpc52xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: smsc: epic100: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: dscc4: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: tulip: de2104x: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: defxx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net/mlx5e: Don't overwrite pedit action when multiple pedit used
    net/mlx5e: Update hw flows when encap source mac changed
    qed*: Advance drivers version to 8.37.0.20
    qed: Change verbosity for coalescing message.
    qede: Fix system crash on configuring channels.
    qed: Consider TX tcs while deriving the max num_queues for PF.
    ...

    Linus Torvalds
     

08 Feb, 2019

2 commits

  • …rm-platforms into irq/urgent

    Pull irqchip updates from Marc Zyngier:

    - Another GICv3 ITS fix for devices sharing the same DevID
    - Don't return invalid data on exhaustion of the GICv3 LPI pool
    - Fix a GICv3 field decoding bug leading to memory over-allocation
    - Init GICv4 at boot time instead of lazy init
    - Fix interrupt masking on PJ4

    Thomas Gleixner
     
  • Currently, blktrace will not show requests that don't have any data as
    rq->__sector is initialized to -1 which is out of device range and thus
    discarded by act_log_check(). This is most notably the case for cache
    flush requests sent to the device. Fix the problem by making
    blk_rq_trace_sector() return 0 for requests without initialized sector.
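
    The effect of the change can be sketched with a small user-space
    model. This is not the kernel code: the range filter below is a
    hypothetical stand-in for the check that discards out-of-range
    events, and only the -1 sentinel and the "return 0" behaviour come
    from the description above.

```python
# User-space model only; it mirrors the commit text, not the kernel source.
UNSET_SECTOR = -1  # the commit says rq->__sector is initialized to -1


def blk_rq_trace_sector(sector: int) -> int:
    """Return the sector to log, mapping an uninitialized sector to 0."""
    return 0 if sector == UNSET_SECTOR else sector


def in_traced_range(sector: int, start_lba: int, end_lba: int) -> bool:
    """Hypothetical range filter: True means the event would be logged."""
    return start_lba <= sector <= end_lba
```

    With this model, a flush-like request whose sector is still -1 falls
    outside any device range, while the mapped value 0 passes the filter.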

    Reviewed-by: Johannes Thumshirn
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

07 Feb, 2019

3 commits

  • Pull HID fix from Jiri Kosina:
    "A fix for a bug in hid-debug that can lock up the kernel in infinite
    loop (CVE-2019-3819), from Vladis Dronov"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
    HID: debug: fix the ring buffer implementation

    Linus Torvalds
     
  • Pull sound fixes from Takashi Iwai:
    "A collection of a few small fixes.

    The most significant one is the fix for the possible race at loading
    HD-audio drivers. This has been present for a long time and surfaced
    only on rare occasions, but has finally been spotted"

    * tag 'sound-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
    ALSA: hda/ca0132 - Fix build error without CONFIG_PCI
    ALSA: compress: Fix stop handling on compressed capture streams
    ALSA: usb-audio: Add support for new T+A USB DAC
    ALSA: hda - Serialize codec registrations
    ALSA: hda/realtek - Use a common helper for hp pin reference
    ALSA: hda/realtek - Fix lose hp_pins for disable auto mute
    ALSA: hda/realtek - Headset microphone support for System76 darp5

    Linus Torvalds
     
  • Pull virtio fixes from Michael Tsirkin:
    "A small fix for a uapi header, and a fix for VDPA for non-x86 guests"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    virtio: drop internal struct from UAPI
    virtio: support VIRTIO_F_ORDER_PLATFORM

    Linus Torvalds
     

06 Feb, 2019

2 commits

  • It is normal user behaviour to start, stop, then start a stream
    again without closing it. Currently this works for compressed
    playback streams but not capture ones.

    The states on a compressed capture stream go directly from OPEN to
    PREPARED, unlike a playback stream which moves to SETUP and waits
    for a write of data before moving to PREPARED. Currently however,
    when a stop is sent the state is set to SETUP for both types of
    streams. This leaves a capture stream in the situation where a new
    start can't be sent as that requires the state to be PREPARED and
    a new set_params can't be sent as that requires the state to be
    OPEN. The only option is to close the stream and then reopen it.

    Correct this issue by allowing snd_compr_drain_notify to set the
    state depending on the stream direction, as we already do in
    set_params.
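
    The state handling described above can be modelled roughly as
    follows. This is a simplified sketch, not the ALSA implementation:
    the enum and function names are illustrative assumptions, and only
    the state names and the transition rules are taken from the commit
    text.

```python
from enum import Enum


class State(Enum):
    OPEN = "open"
    SETUP = "setup"
    PREPARED = "prepared"


class Direction(Enum):
    PLAYBACK = "playback"
    CAPTURE = "capture"


def state_after_stop(direction: Direction, fixed: bool = True) -> State:
    """State a stream lands in once a stop has been handled.

    fixed=False models the old behaviour (always SETUP); fixed=True
    models choosing the state by stream direction.
    """
    if fixed and direction is Direction.CAPTURE:
        return State.PREPARED
    return State.SETUP


def can_start(state: State) -> bool:
    """Per the commit text, a start requires the PREPARED state."""
    return state is State.PREPARED
```

    In the unfixed model a stopped capture stream is in SETUP, so it can
    neither be started (needs PREPARED) nor reconfigured (needs OPEN).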

    Fixes: 49bb6402f1aa ("ALSA: compress_core: Add support for capture streams")
    Signed-off-by: Charles Keepax
    Cc:
    Signed-off-by: Takashi Iwai

    Charles Keepax
     
  • There's no reason to expose struct vring_packed in UAPI - if we do we
    won't be able to change or drop it, and it's not part of any interface.

    Let's move it to virtio_ring.c

    Cc: Tiwei Bie
    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     

05 Feb, 2019

1 commit

  • Anonymous sets that are bound to rules from the same transaction trigger
    a kernel splat from the abort path due to double set list removal and
    double free.

    This patch updates the logic to search for the transaction that is
    responsible for creating the set and disable the set list removal and
    release, given the rule is now responsible for this. Lookup is reverse
    since the transaction that adds the set is likely to be at the tail of
    the list.
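
    A rough model of the reverse lookup, assuming a simple list of
    pending transactions. The Trans type, its field names and the "bound"
    marker are hypothetical and only illustrate the idea of handing
    ownership of the set to the rule.

```python
from dataclasses import dataclass


@dataclass
class Trans:
    kind: str            # "newset" or "newrule" (hypothetical labels)
    set_name: str = ""
    bound: bool = False  # True: the rule, not this entry, owns the set


def mark_set_bound(commit_list: list, set_name: str) -> bool:
    """Walk the pending transactions tail-first and flag the creator of set_name."""
    for trans in reversed(commit_list):
        if trans.kind == "newset" and trans.set_name == set_name:
            trans.bound = True
            return True
    return False


def sets_released_on_abort(commit_list: list) -> list:
    """Sets the abort path itself still has to remove and release."""
    return [t.set_name for t in reversed(commit_list)
            if t.kind == "newset" and not t.bound]
```

    Searching from the tail finds a freshly added anonymous set quickly,
    and a bound set is skipped on abort, so it is not removed twice.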

    Moreover, this patch adds the unbind step to deliver the event from the
    commit path. This should not be done from the worker thread, since we
    have no guarantees of in-order delivery to the listener.

    This patch removes the assumption that both activate and deactivate
    callbacks need to be provided.

    Fixes: cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase")
    Reported-by: Mikhail Morfikov
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

04 Feb, 2019

2 commits

  • Pull x86 fixes from Thomas Gleixner:
    "A few updates for x86:

    - Fix an unintended sign extension issue in the fault handling code

    - Rename the new resource control config switch so it's less
    confusing

    - Avoid setting up EFI info in kexec when the EFI runtime is
    disabled.

    - Fix the microcode version check in the AMD microcode loader so it
    only loads higher version numbers and never downgrades

    - Set EFER.LME in the 32bit trampoline before returning to long mode
    to handle older AMD/KVM behaviour properly.

    - Add Darren and Andy as x86/platform reviewers"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/resctrl: Avoid confusion over the new X86_RESCTRL config
    x86/kexec: Don't setup EFI info if EFI runtime is not enabled
    x86/microcode/amd: Don't falsely trick the late loading mechanism
    MAINTAINERS: Add Andy and Darren as arch/x86/platform/ reviewers
    x86/fault: Fix sign-extend unintended sign extension
    x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode
    x86/cpu: Add Atom Tremont (Jacobsville)

    Linus Torvalds
     
  • Pull cpu hotplug fixes from Thomas Gleixner:
    "Two fixes for the cpu hotplug machinery:

    - Replace the overly clever 'SMT disabled by BIOS' detection logic as
    it breaks KVM scenarios and prevents speculation control updates
    when the Hyperthreads are brought online late after boot.

    - Remove a redundant invocation of the speculation control update
    function"

    * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
    x86/speculation: Remove redundant arch_smt_update() invocation

    Linus Torvalds
     

03 Feb, 2019

1 commit

  • Pull block fixes from Jens Axboe:
    "A few fixes that should go into this release. This contains:

    - MD pull request from Song, fixing a recovery OOM issue (Alexei)

    - Fix for a sync related stall (Jianchao)

    - Dummy callback for timeouts (Tetsuo)

    - IDE atapi sense ordering fix (me)"

    * tag 'for-linus-20190202' of git://git.kernel.dk/linux-block:
    ide: ensure atapi sense request aren't preempted
    blk-mq: fix a hung issue when fsync
    block: pass no-op callback to INIT_WORK().
    md/raid5: fix 'out of memory' during raid cache recovery

    Linus Torvalds
     

02 Feb, 2019

6 commits

  • "Resource Control" is a very broad term for this CPU feature, and a term
    that is also associated with containers, cgroups etc. This can easily
    cause confusion.

    Make the user prompt more specific. Match the config symbol name.

    [ bp: In the future, the corresponding ARM arch-specific code will be
    under ARM_CPU_RESCTRL and the arch-agnostic bits will be carved out
    under the CPU_RESCTRL umbrella symbol. ]

    Signed-off-by: Johannes Weiner
    Signed-off-by: Borislav Petkov
    Cc: Babu Moger
    Cc: Fenghua Yu
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: James Morse
    Cc: Jonathan Corbet
    Cc: "Kirill A. Shutemov"
    Cc: linux-doc@vger.kernel.org
    Cc: Peter Zijlstra
    Cc: Pu Wen
    Cc: Reinette Chatre
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: x86-ml
    Link: https://lkml.kernel.org/r/20190130195621.GA30653@cmpxchg.org

    Johannes Weiner
     
  • On an arm64 ThunderX2 server, the first kmemleak scan would crash [1]
    with CONFIG_DEBUG_VM_PGFLAGS=y because page_to_nid() found a pfn that is
    not directly mapped (MEMBLOCK_NOMAP). Hence, the page->flags is
    uninitialized.

    This is because commit 9f1eb38e0e11 ("mm, kmemleak: little
    optimization while scanning") started to use pfn_to_online_page() instead
    of pfn_valid(). However, in the CONFIG_MEMORY_HOTPLUG=y case,
    pfn_to_online_page() does not call memblock_is_map_memory() while
    pfn_valid() does.

    Historically, commit 68709f45385a ("arm64: only consider memblocks
    with NOMAP cleared for linear mapping") caused pages marked as nomap
    to no longer be reassigned to the new zone in memmap_init_zone() by
    calling __init_single_page().

    Commit 2d070eab2e82 ("mm: consider zone which is not fully
    populated to have holes") introduced pfn_to_online_page(), which was
    designed to return a valid pfn only, but it is clearly broken on arm64.

    Therefore, let pfn_to_online_page() call pfn_valid_within(), so it can
    handle nomap thanks to commit f52bb98f5ade ("arm64: mm: always
    enable CONFIG_HOLES_IN_ZONE"), while it will be optimized away on
    architectures that have no HOLES_IN_ZONE.

    [1]
    Unable to handle kernel NULL pointer dereference at virtual address 0000000000000006
    Mem abort info:
    ESR = 0x96000005
    Exception class = DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
    Data abort info:
    ISV = 0, ISS = 0x00000005
    CM = 0, WnR = 0
    Internal error: Oops: 96000005 [#1] SMP
    CPU: 60 PID: 1408 Comm: kmemleak Not tainted 5.0.0-rc2+ #8
    pstate: 60400009 (nZCv daif +PAN -UAO)
    pc : page_mapping+0x24/0x144
    lr : __dump_page+0x34/0x3dc
    sp : ffff00003a5cfd10
    x29: ffff00003a5cfd10 x28: 000000000000802f
    x27: 0000000000000000 x26: 0000000000277d00
    x25: ffff000010791f56 x24: ffff7fe000000000
    x23: ffff000010772f8b x22: ffff00001125f670
    x21: ffff000011311000 x20: ffff000010772f8b
    x19: fffffffffffffffe x18: 0000000000000000
    x17: 0000000000000000 x16: 0000000000000000
    x15: 0000000000000000 x14: ffff802698b19600
    x13: ffff802698b1a200 x12: ffff802698b16f00
    x11: ffff802698b1a400 x10: 0000000000001400
    x9 : 0000000000000001 x8 : ffff00001121a000
    x7 : 0000000000000000 x6 : ffff0000102c53b8
    x5 : 0000000000000000 x4 : 0000000000000003
    x3 : 0000000000000100 x2 : 0000000000000000
    x1 : ffff000010772f8b x0 : ffffffffffffffff
    Process kmemleak (pid: 1408, stack limit = 0x(____ptrval____))
    Call trace:
    page_mapping+0x24/0x144
    __dump_page+0x34/0x3dc
    dump_page+0x28/0x4c
    kmemleak_scan+0x4ac/0x680
    kmemleak_scan_thread+0xb4/0xdc
    kthread+0x12c/0x13c
    ret_from_fork+0x10/0x18
    Code: d503201f f9400660 36000040 d1000413 (f9400661)
    ---[ end trace 4d4bd7f573490c8e ]---
    Kernel panic - not syncing: Fatal exception
    SMP: stopping secondary CPUs
    Kernel Offset: disabled
    CPU features: 0x002,20000c38
    Memory Limit: none
    ---[ end Kernel panic - not syncing: Fatal exception ]---

    Link: http://lkml.kernel.org/r/20190122132916.28360-1-cai@lca.pw
    Fixes: 9f1eb38e0e11 ("mm, kmemleak: little optimization while scanning")
    Signed-off-by: Qian Cai
    Acked-by: Michal Hocko
    Cc: Oscar Salvador
    Cc: Catalin Marinas
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     
  • Arkadiusz reported that enabling memcg's group oom killing causes
    strange memcg statistics where there is no task in a memcg even though
    the number of tasks in that memcg is not 0. It turned out that there is
    a bug in wake_oom_reaper() which allows enqueuing the same task twice,
    which makes it impossible to decrease the number of tasks in that memcg
    due to a refcount leak.

    This bug existed since the OOM reaper became invokable from
    task_will_free_mem(current) path in out_of_memory() in Linux 4.7,

    T1@P1     |T2@P1     |T3@P1     |OOM reaper
    ----------+----------+----------+------------
                                     # Processing an OOM victim in a different memcg domain.
                          try_charge()
                            mem_cgroup_out_of_memory()
                              mutex_lock(&oom_lock)
               try_charge()
                 mem_cgroup_out_of_memory()
                   mutex_lock(&oom_lock)
    try_charge()
      mem_cgroup_out_of_memory()
        mutex_lock(&oom_lock)
                              out_of_memory()
                                oom_kill_process(P1)
                                  do_send_sig_info(SIGKILL, @P1)
                                  mark_oom_victim(T1@P1)
                                  wake_oom_reaper(T1@P1) # T1@P1 is enqueued.
                              mutex_unlock(&oom_lock)
                   out_of_memory()
                     mark_oom_victim(T2@P1)
                     wake_oom_reaper(T2@P1) # T2@P1 is enqueued.
                   mutex_unlock(&oom_lock)
        out_of_memory()
          mark_oom_victim(T1@P1)
          wake_oom_reaper(T1@P1) # T1@P1 is enqueued again due to oom_reaper_list == T2@P1 && T1@P1->oom_reaper_list == NULL.
        mutex_unlock(&oom_lock)
                                     # Completed processing an OOM victim in a different memcg domain.
                                     spin_lock(&oom_reaper_lock)
                                     # T1P1 is dequeued.
                                     spin_unlock(&oom_reaper_lock)

    but memcg's group oom killing made it easier to trigger this bug by
    calling wake_oom_reaper() on the same task from one out_of_memory()
    request.

    Fix this bug using an approach used by commit 855b018325737f76 ("oom,
    oom_reaper: disable oom_reaper for oom_kill_allocating_task"). As a
    side effect of this patch, this patch also avoids enqueuing multiple
    threads sharing memory via task_will_free_mem(current) path.
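
    The double enqueue can be reproduced with a toy model of the queue.
    The sketch below is an illustration under stated assumptions: the
    "head or next pointer" test follows the condition quoted in the
    trace, while the per-memory "already queued" marker is an assumed
    reading of the approach the fix refers to. All class and attribute
    names are hypothetical.

```python
class Mm:
    """Memory shared by all threads of one process."""

    def __init__(self):
        self.reap_queued = False  # hypothetical "already queued" marker


class Task:
    def __init__(self, name: str, mm: Mm):
        self.name = name
        self.mm = mm
        self.next = None  # models tsk->oom_reaper_list


class ReaperQueue:
    def __init__(self, use_flag: bool):
        self.head = None  # models the global oom_reaper_list
        self.use_flag = use_flag

    def wake(self, task: Task) -> bool:
        """Try to enqueue task; return True if it was put on the list."""
        if self.use_flag:
            # Assumed fix: one marker per shared memory, set exactly once.
            if task.mm.reap_queued:
                return False
            task.mm.reap_queued = True
        elif task is self.head or task.next is not None:
            # Old test: misses a task that sits at the tail of the list.
            return False
        task.next = self.head
        self.head = task
        return True


def replay(use_flag: bool) -> list:
    """Replay the T1, T2, T1 wake-up order from the trace."""
    mm = Mm()
    t1, t2 = Task("T1@P1", mm), Task("T2@P1", mm)
    queue = ReaperQueue(use_flag)
    return [queue.wake(t) for t in (t1, t2, t1)]
```

    With the old test the third call enqueues T1@P1 a second time; with
    the marker, only the first thread sharing the memory is enqueued.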

    Link: http://lkml.kernel.org/r/e865a044-2c10-9858-f4ef-254bc71d6cc2@i-love.sakura.ne.jp
    Link: http://lkml.kernel.org/r/5ee34fc6-1485-34f8-8790-903ddabaa809@i-love.sakura.ne.jp
    Fixes: af8e15cc85a25315 ("oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head")
    Signed-off-by: Tetsuo Handa
    Reported-by: Arkadiusz Miskiewicz
    Tested-by: Arkadiusz Miskiewicz
    Acked-by: Michal Hocko
    Acked-by: Roman Gushchin
    Cc: Tejun Heo
    Cc: Aleksa Sarai
    Cc: Jay Kamat
    Cc: Johannes Weiner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • Alexei Starovoitov says:

    ====================
    pull-request: bpf 2019-01-31

    The following pull-request contains BPF updates for your *net* tree.

    The main changes are:

    1) disable preemption in sender side of socket filters, from Alexei.

    2) fix two potential deadlocks in syscall bpf lookup and prog_register,
    from Martin and Alexei.

    3) fix BTF to allow typedef on func_proto, from Yonghong.

    4) two bpftool fixes, from Jiri and Paolo.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull rdma fixes from Jason Gunthorpe:
    "Still not much going on, the usual set of oops and driver fixes this
    time:

    - Fix two uapi breakage regressions in mlx5 drivers

    - Various oops fixes in hfi1, mlx4, umem, uverbs, and ipoib

    - A protocol bug fix for hfi1 preventing it from implementing the
    verbs API properly, and a compatibility fix for EXEC STACK user
    programs

    - Fix missed refcounting in the 'advise_mr' patches merged this
    cycle.

    - Fix wrong use of the uABI in the hns SRQ patches merged this cycle"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
    IB/uverbs: Fix OOPs in uverbs_user_mmap_disassociate
    IB/ipoib: Fix for use-after-free in ipoib_cm_tx_start
    IB/uverbs: Fix ioctl query port to consider device disassociation
    RDMA/mlx5: Fix flow creation on representors
    IB/uverbs: Fix OOPs upon device disassociation
    RDMA/umem: Add missing initialization of owning_mm
    RDMA/hns: Update the kernel header file of hns
    IB/mlx5: Fix how advise_mr() launches async work
    RDMA/device: Expose ib_device_try_get(()
    IB/hfi1: Add limit test for RC/UC send via loopback
    IB/hfi1: Remove overly conservative VM_EXEC flag check
    IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM
    IB/mlx4: Fix using wrong function to destroy sqp AHs under SRIOV
    RDMA/mlx5: Fix check for supported user flags when creating a QP

    Linus Torvalds
     
  • Pull power management fixes from Rafael Wysocki:
    "These fix a PM-runtime framework regression introduced by the recent
    switch-over of device autosuspend to hrtimers and a mistake in the
    "poll idle state" code introduced by a recent change in it.

    Specifics:

    - Since ktime_get() turns out to be problematic for device
    autosuspend in the PM-runtime framework, make it use
    ktime_get_mono_fast_ns() instead (Vincent Guittot).

    - Fix an initial value of a local variable in the "poll idle state"
    code that makes it behave not exactly as expected when all idle
    states except for the "polling" one are disabled (Doug Smythies)"

    * tag 'pm-5.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    cpuidle: poll_state: Fix default time limit
    PM-runtime: Fix deadlock with ktime_get()

    Linus Torvalds
     

01 Feb, 2019

3 commits

  • In the current code, the codec registration may happen both at the
    codec bind time and the end of the controller probe time. On rare
    occasions, they race with each other, leading to an Oops due to the
    still uninitialized card device.

    This patch introduces a simple flag to prevent the codec registration
    at the codec bind time as long as the controller probe is going on.
    The controller probe invokes snd_card_register() that does the whole
    registration task, and we don't need to register each piece
    beforehand.

    Cc:
    Signed-off-by: Takashi Iwai

    Takashi Iwai
     
  • Pull clk fixes from Stephen Boyd:
    "Mostly driver fixes, but there's a core framework fix in here too:

    - Revert the commits that introduce clk management for the SP clk on
    MMP2 SoCs (used for OLPC). Turns out it wasn't a good idea and
    there isn't any need to manage this clk, it just causes more
    headaches.

    - A performance regression that went unnoticed for many years where
    we would traverse the entire clk tree looking for a clk by name
    when we already have the pointer to said clk that we're looking for

    - A parent linkage fix for the qcom SDM845 clk driver

    - An i.MX clk driver rate miscalculation fix where the order of
    operations was messed up

    - One error handling fix from the static checkers"

    * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
    clk: qcom: gcc: Use active only source for CPUSS clocks
    clk: ti: Fix error handling in ti_clk_parse_divider_data()
    clk: imx: Fix fractional clock set rate computation
    clk: Remove global clk traversal on fetch parent index
    Revert "dt-bindings: marvell,mmp2: Add clock id for the SP clock"
    Revert "clk: mmp2: add SP clock"
    Revert "Input: olpc_apsp - enable the SP clock"

    Linus Torvalds
     
  • Disabled preemption is necessary for proper access to per-cpu maps
    from BPF programs.

    But the sender side of socket filters didn't have preemption disabled:
    unix_dgram_sendmsg->sk_filter->sk_filter_trim_cap->bpf_prog_run_save_cb->BPF_PROG_RUN

    and a combination of af_packet with tun device didn't disable either:
    tpacket_snd->packet_direct_xmit->packet_pick_tx_queue->ndo_select_queue->
    tun_select_queue->tun_ebpf_select_queue->bpf_prog_run_clear_cb->BPF_PROG_RUN

    Disable preemption before executing BPF programs (both classic and extended).

    Reported-by: Jann Horn
    Signed-off-by: Alexei Starovoitov
    Acked-by: Song Liu
    Signed-off-by: Daniel Borkmann

    Alexei Starovoitov
     

31 Jan, 2019

8 commits

  • There's an issue with how sense requests are handled in IDE. If ide-cd
    encounters an error, it queues a sense request. With how IDE request
    handling is done, this is the next request we need to handle. But it's
    impossible to guarantee this, as another request could come in between
    the sense being queued, and ->queue_rq() being run and handling it. If
    that request ALSO fails, then we attempt to doubly queue the single
    sense request we have.

    Since we only support one active request at a time, defer request
    processing when a sense request is queued.

    Fixes: 600335205b8d "ide: convert to blk-mq"
    Reported-by: He Zhe
    Tested-by: He Zhe
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • According to ARM IHI 0069C (ID070116), we should use GITS_TYPER's
    bits [7:4] as ITT_entry_size instead of [8:4]. Although this is
    pretty annoying, it only results in a potential over-allocation
    of memory, and nothing bad happens.
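
    The difference between the two bit ranges can be illustrated with a
    generic field extractor. This is a sketch of the bit arithmetic only,
    with hypothetical helper names. It returns the raw field value and
    does not model how the driver turns that field into a size.

```python
def register_field(value: int, high: int, low: int) -> int:
    """Extract bits [high:low] (inclusive) from a register value."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def itt_entry_size_field(typer: int) -> int:
    """Corrected range: bits [7:4]."""
    return register_field(typer, 7, 4)


def itt_entry_size_field_old(typer: int) -> int:
    """Pre-fix range: bits [8:4], which also picks up bit 8."""
    return register_field(typer, 8, 4)
```

    Whenever bit 8 of the register is set, the old range yields a value
    16 larger than the corrected one, hence the over-allocation.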

    Fixes: 3dfa576bfb45 ("irqchip/gic-v3-its: Add probing for VLPI properties")
    Signed-off-by: Zenghui Yu
    [maz: massaged subject and commit message]
    Signed-off-by: Marc Zyngier

    Zenghui Yu
     
  • If we don't have DT then stmmac_clk will not be available. Let's add a
    new Platform Data field so that we can specify the refclk by this means.

    This way we can still use the coalesce command in PCI based setups.

    Signed-off-by: Jose Abreu
    Cc: Joao Pinto
    Cc: David S. Miller
    Cc: Giuseppe Cavallaro
    Cc: Alexandre Torgue
    Signed-off-by: David S. Miller

    Jose Abreu
     
  • While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin,
    I ran into the issue that while l3 mode is working fine, l3s mode
    does not have any connectivity to kube-apiserver and hence all pods
    end up in Error state as well. The ipvlan master device sits on
    top of a bond device and hostns traffic to kube-apiserver (also running
    in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
    where the latter is the address of the bond0. While in l3 mode, a
    curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
    works fine from hostns, neither of them does in case of l3s. In the
    latter only a curl to https://127.0.0.1:37573 appeared to work where
    for local addresses of bond0 I saw kernel suddenly starting to emit
    ARP requests to query HW address of bond0 which remained unanswered
    and neighbor entries in INCOMPLETE state. These ARP requests only
    happen while in l3s.

    Debugging this further, I found the issue is that l3s mode is piggy-
    backing on l3 master device, and in this case local routes are using
    l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit
    f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev
    if relevant") and 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be
    a loopback"). I found that reverting them back into using the
    net->loopback_dev fixed ipvlan l3s connectivity and got everything
    working for the CNI.

    Now judging from 4fbae7d83c98 ("ipvlan: Introduce l3s mode") and the
    l3mdev paper in [0], the sole reason why ipvlan l3s is relying
    on l3 master device is to get the l3mdev_ip_rcv() receive hook for
    setting the dst entry of the input route without adding its own
    ipvlan specific hacks into the receive path, however, any l3 domain
    semantics beyond just that are breaking l3s operation. Note that
    ipvlan also has the ability to dynamically switch its internal
    operation from l3 to l3s for all ports via ipvlan_set_port_mode()
    at runtime. In any case, l3 vs l3s solely distinguishes itself by
    'de-confusing' netfilter through switching skb->dev to ipvlan slave
    device late in NF_INET_LOCAL_IN before handing the skb to L4.

    Minimal fix taken here is to add an IFF_L3MDEV_RX_HANDLER flag which,
    if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
    without any additional l3mdev semantics on top. This should also have
    minimal impact since dev->priv_flags is already hot in cache. With
    this set, l3s mode is working fine and I also get things like
    masquerading pod traffic on the ipvlan master properly working.
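
    As a rough illustration of the idea (not the kernel code), the
    following Python sketch models a device that opts in to the l3
    receive hook alone, without being treated as a full l3 master. The
    class names, the flag bit values and both helper functions are
    assumptions made for this example.

```python
from enum import IntFlag
from typing import Optional


class PrivFlags(IntFlag):
    # Illustrative bit values only; the kernel's real values differ.
    L3MDEV_MASTER = 1
    L3MDEV_SLAVE = 2
    L3MDEV_RX_HANDLER = 4


class Dev:
    """Hypothetical stand-in for a network device."""

    def __init__(self, name, priv_flags=PrivFlags(0), master=None):
        self.name = name
        self.priv_flags = priv_flags
        self.master = master  # upper l3 master device, if enslaved


def rx_hook_owner(dev: Dev) -> Optional[Dev]:
    """Return the device whose l3 receive hook should run, or None."""
    if dev.priv_flags & PrivFlags.L3MDEV_SLAVE:
        return dev.master
    if dev.priv_flags & (PrivFlags.L3MDEV_MASTER | PrivFlags.L3MDEV_RX_HANDLER):
        return dev
    return None


def uses_l3_domain_semantics(dev: Dev) -> bool:
    """Only real l3 masters and slaves get l3 domain behaviour."""
    return bool(dev.priv_flags & (PrivFlags.L3MDEV_MASTER | PrivFlags.L3MDEV_SLAVE))
```

    In this model a device carrying only the RX handler flag gets its
    receive hook called, while anything keyed on l3 domain membership
    (such as the choice of device for local routes) ignores it.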

    [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf

    Fixes: f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
    Fixes: 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be a loopback")
    Fixes: 4fbae7d83c98 ("ipvlan: Introduce l3s mode")
    Signed-off-by: Daniel Borkmann
    Cc: Mahesh Bandewar
    Cc: David Ahern
    Cc: Florian Westphal
    Cc: Martynas Pumputis
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • A deadlock has been seen when switching clocksources which use
    PM-runtime. The call path is:

    change_clocksource
    ...
    write_seqcount_begin
    ...
    timekeeping_update
    ...
    sh_cmt_clocksource_enable
    ...
    rpm_resume
    pm_runtime_mark_last_busy
    ktime_get
    do
    read_seqcount_begin
    while read_seqcount_retry
    ....
    write_seqcount_end

    Although we should be safe because we haven't yet changed the
    clocksource at that time, we can't do that because of seqcount
    protection.

    Use ktime_get_mono_fast_ns() instead which is lock safe for such
    cases.

    With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
    monotonic across an update and as a result can go backward.
    According to update_fast_timekeeper() description: "In the worst
    case, this can result is a slightly wrong timestamp (a few
    nanoseconds)". For PM-runtime autosuspend, this means only that
    the suspend decision may be slightly suboptimal.
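
    The deadlock can be modelled in a few lines of Python. This is a toy
    sketch, not the kernel implementation: SeqCount, the clock dict and
    the max_spins guard (which turns the endless spin into an exception
    so the example terminates) are assumptions for illustration, and the
    fast accessor is reduced to a plain snapshot read.

```python
class SeqCount:
    """Toy sequence counter: an odd value means a write is in progress."""

    def __init__(self):
        self.sequence = 0

    def write_begin(self):
        self.sequence += 1

    def write_end(self):
        self.sequence += 1

    def read_begin(self, max_spins):
        spins = 0
        # A reader waits until no writer is active.
        while self.sequence & 1:
            spins += 1
            if spins >= max_spins:
                raise RuntimeError("reader would spin forever")
        return self.sequence

    def read_retry(self, start):
        return self.sequence != start


def ktime_get(seq, clock, max_spins=1000):
    """Seqcount-protected read: retries until it sees a stable value."""
    while True:
        start = seq.read_begin(max_spins)
        now = clock["ns"]
        if not seq.read_retry(start):
            return now


def ktime_get_mono_fast_ns(clock):
    """Stand-in for the fast accessor: never waits on the sequence counter."""
    return clock["fast_ns"]
```

    Calling ktime_get() between write_begin() and write_end() on the same
    thread can never succeed, because the only code that could end the
    write section is the caller itself. The fast variant returns at once,
    at the cost of a possibly slightly stale value.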

    Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
    Reported-by: Biju Das
    Signed-off-by: Vincent Guittot
    Reviewed-by: Ulf Hansson
    Signed-off-by: Rafael J. Wysocki

    Vincent Guittot
     
  • The current dentry number tracking code doesn't distinguish between
    positive & negative dentries. It just reports the total number of
    dentries in the LRU lists.

    As an excessive number of negative dentries can have an impact on system
    performance, it will be wise to track the number of positive and
    negative dentries separately.

    This patch adds tracking for the total number of negative dentries in
    the system LRU lists and reports it in the 5th field in the
    /proc/sys/fs/dentry-state file. The number, however, does not include
    negative dentries that are in flight but not in the LRU yet as well as
    those in the shrinker lists which are on the way out anyway.

    The number of positive dentries in the LRU lists can be roughly found by
    subtracting the number of negative dentries from the unused count.
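
    A minimal sketch of how a monitoring script might consume the file,
    assuming the six-field layout described here with the negative count
    in the 5th position. The field names follow the kernel's dentry_stat
    structure as far as I know; the helper names are made up for this
    example.

```python
from typing import NamedTuple


class DentryState(NamedTuple):
    nr_dentry: int
    nr_unused: int
    age_limit: int
    want_pages: int
    nr_negative: int  # 5th field, added by this patch
    dummy: int


def parse_dentry_state(text: str) -> DentryState:
    """Parse the whitespace-separated contents of dentry-state."""
    fields = [int(tok) for tok in text.split()]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    return DentryState(*fields)


def approx_positive_unused(state: DentryState) -> int:
    """Rough count of positive dentries in the LRU lists."""
    return state.nr_unused - state.nr_negative
```

    On a live system the input would come from reading
    /proc/sys/fs/dentry-state; on kernels without this patch the 5th
    field simply reads as zero.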

    Matthew Wilcox had confirmed that since the introduction of the
    dentry_stat structure in 2.1.60, the dummy array was there, probably for
    future extension. Its entries were not replacements of pre-existing fields.
    So no sane application that reads the value of /proc/sys/fs/dentry-state
    will misbehave if the last 2 fields of the sysctl parameter are not
    zero. IOW, it will be safe to use one of the dummy array entries for
    the negative dentry count.

    Signed-off-by: Waiman Long
    Signed-off-by: Linus Torvalds

    Waiman Long
     
  • The list_lru structure is essentially just a pointer to a table of
    per-node LRU lists. Even if CONFIG_MEMCG_KMEM is defined, the list
    field is just used for LRU list registration and shrinker_id is set at
    initialization. Those fields won't need to be touched that often.

    So there is no point in making the list_lru structures sit in their own
    cachelines.

    Signed-off-by: Waiman Long
    Reviewed-by: Dave Chinner
    Signed-off-by: Linus Torvalds

    Waiman Long
     
  • With the following commit:

    73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")

    ... the hotplug code attempted to detect when SMT was disabled by BIOS,
    in which case it reported SMT as permanently disabled. However, that
    code broke a virt hotplug scenario, where the guest is booted with only
    primary CPU threads, and a sibling is brought online later.

    The problem is that there doesn't seem to be a way to reliably
    distinguish between the HW "SMT disabled by BIOS" case and the virt
    "sibling not yet brought online" case. So the above-mentioned commit
    was a bit misguided, as it permanently disabled SMT for both cases,
    preventing future virt sibling hotplugs.

    Going back and reviewing the original problems which were attempted to
    be solved by that commit, when SMT was disabled in BIOS:

    1) /sys/devices/system/cpu/smt/control showed "on" instead of
    "notsupported"; and

    2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.

    I'd propose that we instead consider #1 above to not actually be a
    problem. Because, at least in the virt case, it's possible that SMT
    wasn't disabled by BIOS and a sibling thread could be brought online
    later. So it makes sense to just always default the smt control to "on"
    to allow for that possibility (assuming cpuid indicates that the CPU
    supports SMT).

    The real problem is #2, which has a simple fix: change vmx_vm_init() to
    query the actual current SMT state -- i.e., whether any siblings are
    currently online -- instead of looking at the SMT "control" sysfs value.

    So fix it by:

    a) reverting the original "fix" and its followup fix:

    73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
    bc2d8d262cba ("cpu/hotplug: Fix SMT supported evaluation")

    and

    b) changing vmx_vm_init() to query the actual current SMT state --
    instead of the sysfs control value -- to determine whether the L1TF
    warning is needed. This also requires the 'sched_smt_present'
    variable to be exported, instead of 'cpu_smt_control'.
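
    The difference between the control setting and the actual state can
    be sketched as follows. This is an illustrative model only: the
    function name and the representation of sibling groups as tuples of
    CPU numbers are assumptions, not kernel interfaces.

```python
def smt_active(sibling_groups, online):
    """True if any core currently has more than one hardware thread online.

    sibling_groups: iterable of CPU-number tuples, one tuple per core.
    online: set of CPU numbers that are currently online.
    """
    return any(len(set(group) & online) > 1 for group in sibling_groups)
```

    With cores (0, 4) and (1, 5), a guest booted with only CPUs 0 and 1
    online has no active SMT, so no warning is warranted. Once CPU 4 is
    hotplugged, the same check reports SMT as active, regardless of what
    the control value said at boot.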

    Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
    Reported-by: Igor Mammedov
    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Thomas Gleixner
    Cc: Joe Mario
    Cc: Jiri Kosina
    Cc: Peter Zijlstra
    Cc: kvm@vger.kernel.org
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com

    Josh Poimboeuf
     

30 Jan, 2019

2 commits

  • …wnguo/linux into arm/fixes

    i.MX fixes for 5.0, 2nd round:

    It contains a single fix for i.MX8MQ clock numbers, removing the
    duplicate use of 232 and numbering the clocks consecutively.

    * tag 'imx-fixes-5.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
    dt-bindings: imx8mq: Number clocks consecutively

    Signed-off-by: Arnd Bergmann <arnd@arndb.de>

    Arnd Bergmann
     
  • Pull networking fixes from David Miller:

    1) Need to save away the IV across tls async operations, from Dave
    Watson.

    2) Upon successful packet processing, we should liberate the SKB with
    dev_consume_skb{_irq}(). From Yang Wei.

    3) Only apply RX hang workaround on affected macb chips, from Harini
    Katakam.

    4) Dummy netdevs need a proper namespace assigned to them, from Josh
    Elsasser.

    5) Some paths of nft_compat run lockless now, and thus we need to use a
    proper refcount_t. From Florian Westphal.

    6) Avoid deadlock in mlx5 by doing IRQ locking, from Moni Shoua.

    7) netrom does not refcount sockets properly wrt. timers, fix that by
    using the sock timer API. From Cong Wang.

    8) Fix locking of inexact inserts of xfrm policies, from Florian
    Westphal.

    9) Missing xfrm hash generation bump, also from Florian.

    10) Missing of_node_put() in hns driver, from Yonglong Liu.

    11) Fix DN_IFREQ_SIZE, from Johannes Berg.

    12) ip6mr notifier is invoked during traversal of wrong table, from Nir
    Dotan.

    13) TX promisc settings not performed correctly in qed, from Manish
    Chopra.

    14) Fix OOB access in vhost, from Jason Wang.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
    MAINTAINERS: Add entry for XDP (eXpress Data Path)
    net: set default network namespace in init_dummy_netdev()
    net: b44: replace dev_kfree_skb_xxx by dev_consume_skb_xxx for drop profiles
    net: caif: call dev_consume_skb_any when skb xmit done
    net: 8139cp: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: macb: Apply RXUBR workaround only to versions with errata
    net: ti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: amd8111e: replace dev_kfree_skb_irq by dev_consume_skb_irq
    net: alteon: replace dev_kfree_skb_irq by dev_consume_skb_irq
    net: tls: Fix deadlock in free_resources tx
    net: tls: Save iv in tls_rec for async crypto requests
    vhost: fix OOB in get_rx_bufs()
    qed: Fix stack out of bounds bug
    qed: Fix system crash in ll2 xmit
    qed: Fix VF probe failure while FLR
    qed: Fix LACP pdu drops for VFs
    qed: Fix bug in tx promiscuous mode settings
    net: i825xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    netfilter: ipt_CLUSTERIP: fix warning unused variable cn
    ...

    Linus Torvalds
     

29 Jan, 2019

2 commits

  • The ring buffer implementation in hid_debug_event() and hid_debug_events_read()
    is strange, allowing lost or corrupted data. After commit 717adfdaf147
    ("HID: debug: check length before copy_to_user()") it is possible to enter
    an infinite loop in hid_debug_events_read() by providing 0 as count, which
    locks up the system. Fix this by rewriting the ring buffer implementation
    with kfifo and simplifying the code.

    This fixes CVE-2019-3819.
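
    For readers unfamiliar with the kfifo approach, here is a small
    Python model of a power-of-two byte FIFO with free-running in/out
    counters. It is a sketch of the general technique, not of the HID
    code; the ByteFifo class and its method names are invented for the
    example.

```python
class ByteFifo:
    """Fixed-size byte FIFO with free-running in/out counters."""

    def __init__(self, size: int):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self._buf = bytearray(size)
        self._mask = size - 1
        self._in = 0   # total bytes ever written
        self._out = 0  # total bytes ever read

    def __len__(self) -> int:
        return self._in - self._out

    def put(self, data: bytes) -> int:
        """Copy in as much of data as fits; return the number of bytes stored."""
        n = min(len(data), len(self._buf) - len(self))
        for i in range(n):
            self._buf[(self._in + i) & self._mask] = data[i]
        self._in += n
        return n

    def get(self, count: int) -> bytes:
        """Remove and return up to count bytes; count of 0 returns at once."""
        n = min(max(count, 0), len(self))
        out = bytes(self._buf[(self._out + i) & self._mask] for i in range(n))
        self._out += n
        return out
```

    Because the amount copied is always bounded by both the request and
    the fill level, a read with a count of 0 returns immediately instead
    of looping, and a full buffer drops excess input rather than
    overwriting unread data.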

    v2: fix an execution logic and add a comment
    v3: use __set_current_state() instead of set_current_state()

    Link: https://bugzilla.redhat.com/show_bug.cgi?id=1669187
    Cc: stable@vger.kernel.org # v4.18+
    Fixes: cd667ce24796 ("HID: use debugfs for events/reports dumping")
    Fixes: 717adfdaf147 ("HID: debug: check length before copy_to_user()")
    Signed-off-by: Vladis Dronov
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Benjamin Tissoires

    Vladis Dronov
     
  • aead_request_set_crypt takes an iv pointer, and we change the iv
    soon after setting it. Some async crypto algorithms don't save the iv,
    so we need to save it in the tls_rec for async requests.
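
    The aliasing problem can be shown with a toy model. Nothing below is
    the kernel crypto API: AsyncRequest, submit_shared(), submit_saved()
    and advance_iv() are hypothetical names used only to contrast a
    request that keeps a pointer to the caller's IV with one that keeps
    its own copy.

```python
class AsyncRequest:
    """Hypothetical deferred crypto request that reads its IV on completion."""

    def __init__(self, iv: bytearray):
        self.iv = iv  # stores a reference, like passing an iv pointer

    def complete(self) -> bytes:
        # The IV is only consumed here, possibly long after submission.
        return bytes(self.iv)


def submit_shared(ctx_iv: bytearray) -> AsyncRequest:
    """Buggy pattern: the request aliases the caller's mutable IV."""
    return AsyncRequest(ctx_iv)


def submit_saved(ctx_iv: bytearray) -> AsyncRequest:
    """Fixed pattern: the request gets a private per-record copy."""
    return AsyncRequest(bytearray(ctx_iv))


def advance_iv(iv: bytearray) -> None:
    """Increment the IV in place as a big-endian counter."""
    for i in reversed(range(len(iv))):
        iv[i] = (iv[i] + 1) & 0xFF
        if iv[i]:
            break
```

    If the caller advances the IV after submission but before
    completion, the shared request sees the new value, while the saved
    copy still holds the IV that belongs to its record.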

    Found by hardcoding x64 aesni to use async crypto manager (to test the async
    codepath), however I don't think this combination can happen in the wild.
    Presumably other hardware offloads will need this fix, but there have been
    no user reports.

    Fixes: a42055e8d2c30 ("Add support for async encryption of records...")
    Signed-off-by: Dave Watson
    Signed-off-by: David S. Miller

    Dave Watson
     

28 Jan, 2019

3 commits

  • Pull locking fixes from Thomas Gleixner:
    "A small series of fixes which all address possible missed wakeups:

    - Document and fix the wakeup ordering of wake_q

    - Add the missing barrier in rcuwait_wake_up(), which was documented
    in the comment but missing in the code

    - Fix the possible missed wakeups in the rwsem and futex code"

    * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    locking/rwsem: Fix (possible) missed wakeup
    futex: Fix (possible) missed wakeup
    sched/wake_q: Fix wakeup ordering for wake_q
    sched/wake_q: Document wake_q_add()
    sched/wait: Fix rcuwait_wake_up() ordering

    Linus Torvalds
     
  • Pull irq fixes from Thomas Gleixner:
    "A small set of fixes for the interrupt subsystem:

    - Fix a double increment in the irq descriptor allocator which
    resulted in a sanity check only being done for every second
    affinity mask

    - Add a missing device tree translation in the stm32-exti driver.
    Without that the interrupt association is completely wrong.

    - Initialize the mutex in the GIC-V3 MBI driver

    - Fix the alignment for aliasing devices in the GIC-V3-ITS driver so
    multi MSI allocations work correctly

    - Ensure that the initial affinity of an interrupt is not empty at
    startup time.

    - Drop bogus include in the madera irq chip driver

    - Fix KernelDoc regression"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size
    genirq/irqdesc: Fix double increment in alloc_descs()
    genirq: Fix the kerneldoc comment for struct irq_affinity_desc
    irqchip/madera: Drop GPIO includes
    irqchip/gic-v3-mbi: Fix uninitialized mbi_lock
    irqchip/stm32-exti: Add domain translate function
    genirq: Make sure the initial affinity is not empty
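
    The effect of a double increment on a validation loop, as described
    for the irq descriptor allocator above, can be illustrated
    generically. This sketch does not reproduce the kernel code; the
    function names and the use of Python sets as stand-ins for affinity
    masks are assumptions.

```python
def all_masks_valid_buggy(masks) -> bool:
    """Index is advanced twice per iteration, so odd entries are skipped."""
    i = 0
    while i < len(masks):
        if not masks[i]:
            return False
        i += 1  # increment in the loop body
        i += 1  # second increment from the loop header: the bug
    return True


def all_masks_valid_fixed(masks) -> bool:
    """Every mask is checked exactly once."""
    return all(bool(mask) for mask in masks)
```

    With an empty mask at an odd position, the buggy variant accepts the
    input while the fixed one rejects it.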

    Linus Torvalds
     
  • Pull dma-mapping fix from Christoph Hellwig:
    "Fix a xen-swiotlb regression on arm64"

    * tag 'dma-mapping-5.0-2' of git://git.infradead.org/users/hch/dma-mapping:
    arm64/xen: fix xen-swiotlb cache flushing

    Linus Torvalds