02 Sep, 2019

1 commit

  • Pull networking fixes from David Miller:

    1) Fix some length checks during OGM processing in batman-adv, from
    Sven Eckelmann.

    2) Fix regression that caused netfilter conntrack sysctls to not be
    per-netns any more. From Florian Westphal.

    3) Use after free in netpoll, from Feng Sun.

    4) Guard destruction of pfifo_fast per-cpu qdisc stats with
    qdisc_is_percpu_stats(), from Davide Caratti. Similar bug is fixed
    in pfifo_fast_enqueue().

    5) Fix memory leak in mld_del_delrec(), from Eric Dumazet.

    6) Handle neigh events on internal ports correctly in nfp, from John
    Hurley.

    7) Clear SKB timestamp in NF flow table code so that it does not
    confuse fq scheduler. From Florian Westphal.

    8) taprio destroy can crash if it is invoked in a failure path of
    taprio_init(), because the list head isn't set up properly yet and
    the list del is unconditional. Perform the list add earlier to
    address this. From Vladimir Oltean.

    9) Make sure to reapply vlan filters on device up, in aquantia driver.
    From Dmitry Bogdanov.

    10) sgiseeq driver releases DMA memory using free_page() instead of
    dma_free_attrs(). From Christophe JAILLET.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
    net: seeq: Fix the function used to release some memory in an error handling path
    enetc: Add missing call to 'pci_free_irq_vectors()' in probe and remove functions
    net: bcmgenet: use ethtool_op_get_ts_info()
    tc-testing: don't hardcode 'ip' in nsPlugin.py
    net: dsa: microchip: add KSZ8563 compatibility string
    dt-bindings: net: dsa: document additional Microchip KSZ8563 switch
    net: aquantia: fix out of memory condition on rx side
    net: aquantia: linkstate irq should be oneshot
    net: aquantia: reapply vlan filters on up
    net: aquantia: fix limit of vlan filters
    net: aquantia: fix removal of vlan 0
    net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate
    taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte
    taprio: Fix kernel panic in taprio_destroy
    net: dsa: microchip: fill regmap_config name
    rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2]
    net: stmmac: dwmac-rk: Don't fail if phy regulator is absent
    amd-xgbe: Fix error path in xgbe_mod_init()
    netfilter: nft_meta_bridge: Fix get NFT_META_BRI_IIFVPROTO in network byteorder
    mac80211: Correctly set noencrypt for PAE frames
    ...

    Linus Torvalds
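
    Item 8 above (the taprio destroy crash) comes down to ordering: an
    unconditional list delete is only safe once the node has been linked.
    A minimal user-space sketch of the fixed ordering, assuming a
    hand-rolled circular list; every sketch_* name is hypothetical and this
    is not the taprio code itself:

```c
#include <stddef.h>

/* Hypothetical circular doubly linked list node. */
struct sketch_node {
    struct sketch_node *prev, *next;
};

void sketch_list_init(struct sketch_node *head)
{
    head->prev = head->next = head;
}

void sketch_list_add(struct sketch_node *n, struct sketch_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Dereferences n->prev and n->next, so the node must already be linked. */
void sketch_list_del(struct sketch_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

int sketch_list_empty(const struct sketch_node *head)
{
    return head->next == head;
}

/* Hypothetical scheduler object that lives on a global list. */
struct sketch_sched {
    struct sketch_node link;
    int configured;
};

/* Link the object first, so destroy is safe on every failure path. */
int sketch_init(struct sketch_sched *s, struct sketch_node *all, int fail_config)
{
    sketch_list_add(&s->link, all);
    if (fail_config)
        return -1;          /* caller is expected to call sketch_destroy() */
    s->configured = 1;
    return 0;
}

/* Unconditional unlink, mirroring the "list del is unconditional" remark. */
void sketch_destroy(struct sketch_sched *s)
{
    sketch_list_del(&s->link);
    s->configured = 0;
}
```

    Had sketch_init() returned before linking, sketch_destroy() would follow
    the zeroed prev/next pointers of an object that was never added.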
     

01 Sep, 2019

1 commit

  • Pull tracing fixes from Steven Rostedt:
    "Small fixes and minor cleanups for tracing:

    - Make exported ftrace function not static

    - Fix NULL pointer dereference in reading probes as they are created

    - Fix NULL pointer dereference in k/uprobe clean up path

    - Various documentation fixes"

    * tag 'trace-v5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Correct kdoc formats
    ftrace/x86: Remove mcount() declaration
    tracing/probe: Fix null pointer dereference
    tracing: Make exported ftrace_set_clr_event non-static
    ftrace: Check for successful allocation of hash
    ftrace: Check for empty hash and comment the race with registering probes
    ftrace: Fix NULL pointer dereference in t_probe_next()

    Linus Torvalds
     

31 Aug, 2019

4 commits

  • The function ftrace_set_clr_event is declared static and marked
    EXPORT_SYMBOL_GPL(), which is at best an odd combination. Because it was
    decided that the function is part of the API, this commit removes the static
    attribute and adds the declaration to the header.

    Link: http://lkml.kernel.org/r/20190704172110.27041-1-efremov@linux.com

    Fixes: f45d1225adb04 ("tracing: Kernel access to Ftrace instances")
    Reviewed-by: Joe Jin
    Signed-off-by: Denis Efremov
    Signed-off-by: Steven Rostedt (VMware)

    Denis Efremov
     
  • I've noticed that the "slab" value in memory.stat is sometimes 0, even
    if some children memory cgroups have a non-zero "slab" value. The
    following investigation showed that this is the result of the kmem_cache
    reparenting in combination with the per-cpu batching of slab vmstats.

    At offlining, some vmstat values may remain in the percpu cache without
    being propagated upwards through the cgroup hierarchy. This means that
    stats on ancestor levels are lower than the actual values. Later, when
    slab pages are released, the precise number of pages is subtracted at
    the parent level, making the value negative. We don't show negative
    values; 0 is printed instead.

    To fix this issue, let's flush percpu slab memcg and lruvec stats on
    memcg offlining. This guarantees that numbers on all ancestor levels
    are accurate and match the actual number of outstanding slab pages.

    Link: http://lkml.kernel.org/r/20190819202338.363363-3-guro@fb.com
    Fixes: fb2f2b0adb98 ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal")
    Signed-off-by: Roman Gushchin
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Gushchin
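
    The interaction between percpu batching and offlining described above
    can be modelled in a few lines. This is a simplified user-space sketch,
    not the memcg code: the sketch_* names, the two "CPUs" and the batch
    threshold of 32 are all assumptions made for illustration.

```c
#include <stddef.h>

#define SKETCH_NCPU  2
#define SKETCH_BATCH 32   /* hypothetical batching threshold */

/* Hypothetical stand-in for a cgroup with one batched counter. */
struct sketch_group {
    struct sketch_group *parent;
    long total;                  /* value visible in the hierarchy */
    long percpu[SKETCH_NCPU];    /* deltas not yet propagated */
};

/* Apply a delta to this group and to all of its ancestors. */
void sketch_propagate(struct sketch_group *g, long delta)
{
    for (; g != NULL; g = g->parent)
        g->total += delta;
}

/* Batched update: spill to the hierarchy only once the local delta is large. */
void sketch_mod(struct sketch_group *g, int cpu, long delta)
{
    long x = g->percpu[cpu] + delta;

    if (x > SKETCH_BATCH || x < -SKETCH_BATCH) {
        sketch_propagate(g, x);
        x = 0;
    }
    g->percpu[cpu] = x;
}

/* The fix in miniature: push all cached deltas upwards at offlining. */
void sketch_flush(struct sketch_group *g)
{
    int cpu;

    for (cpu = 0; cpu < SKETCH_NCPU; cpu++) {
        sketch_propagate(g, g->percpu[cpu]);
        g->percpu[cpu] = 0;
    }
}

/* Negative values are not shown; 0 is reported instead. */
long sketch_read(const struct sketch_group *g)
{
    return g->total < 0 ? 0 : g->total;
}
```

    Charging 10 pages to a child leaves them in the percpu cache. If the
    child goes away without sketch_flush() and the 10 pages are later
    subtracted at the parent level, the parent's total drops to -10 and is
    reported as 0; with the flush it returns to exactly 0.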
     
  • David Howells says:

    ====================
    rxrpc: Fix use of skb_cow_data()

    Here's a series of patches that replaces the use of skb_cow_data() in rxrpc
    with skb_unshare() early on in the input process. The problem that is
    being seen is that skb_cow_data() indirectly requires that the maximum
    usage count on an sk_buff be 1, and it may generate an assertion failure in
    pskb_expand_head() if not.

    This can occur because rxrpc_input_data() may be still holding a ref when
    it has just attached the sk_buff to the rx ring and given that attachment
    its own ref. If recvmsg happens fast enough, skb_cow_data() can see the
    ref still held by the softirq handler.

    Further, a packet may contain multiple subpackets, each of which gets its
    own attachment to the ring and its own ref - also making skb_cow_data() go
    bang.

    Fix this by:

    (1) The DATA packet is currently parsed for subpackets twice by the input
    routines. Parse it just once instead and make notes in the sk_buff
    private data.

    (2) Use the notes from (1) when attaching the packet to the ring multiple
    times. Once the packet is attached to the ring, recvmsg can see it
    and start modifying it, so the softirq handler is not permitted to
    look inside it from that point.

    (3) Pass the ref from the input code to the ring rather than getting an
    extra ref. rxrpc_input_data() uses a ref on the second refcount to
    prevent the packet from evaporating under it.

    (4) Call skb_unshare() on secured DATA packets in rxrpc_input_packet()
    before we take call->input_lock. Other sorts of packets don't get
    modified and so can be left.

    A trace is emitted if skb_unshare() eats the skb. Note that
    skb_share() for our accounting in this regard as we can't see the
    parameters in the packet to log in a trace line if it releases it.

    (5) Remove the calls to skb_cow_data(). These are then no longer
    necessary.

    There are also patches to improve the rxrpc_skb tracepoint to make sure
    that Tx-derived buffers are identified separately from Rx-derived buffers
    in the trace.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
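
    The idea behind the series above, taking a private copy early instead of
    modifying a buffer that others still reference, can be sketched with a
    plain refcounted buffer. This is a single-threaded illustration under
    stated assumptions: struct sketch_buf and the sketch_* helpers are
    hypothetical and only loosely modelled on the behaviour the text
    attributes to skb_unshare().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted buffer (not thread-safe, illustration only). */
struct sketch_buf {
    int users;              /* reference count */
    size_t len;
    unsigned char *data;
};

/* Allocate a buffer holding a copy of src, with one reference. */
struct sketch_buf *sketch_new(const void *src, size_t len)
{
    struct sketch_buf *b = malloc(sizeof(*b));

    if (b == NULL)
        return NULL;
    b->data = malloc(len ? len : 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    if (len)
        memcpy(b->data, src, len);
    b->len = len;
    b->users = 1;
    return b;
}

/* Drop one reference; free the buffer when the last one goes away. */
void sketch_put(struct sketch_buf *b)
{
    if (b != NULL && --b->users == 0) {
        free(b->data);
        free(b);
    }
}

/* In-place modification is only safe for the sole owner. */
int sketch_can_modify_in_place(const struct sketch_buf *b)
{
    return b->users == 1;
}

/*
 * If the buffer is shared, return a private copy and drop the caller's
 * reference to the original. On allocation failure the caller's reference
 * is still dropped and NULL is returned.
 */
struct sketch_buf *sketch_unshare(struct sketch_buf *b)
{
    struct sketch_buf *copy;

    if (b->users == 1)
        return b;
    copy = sketch_new(b->data, b->len);
    sketch_put(b);
    return copy;
}
```

    With two references held (say, one by an input handler and one by a
    receive ring), sketch_unshare() hands back a distinct buffer with a
    count of 1, so later in-place changes cannot be seen through the other
    reference.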
     
  • Pull ARM SoC fixes from Arnd Bergmann:
    "The majority of the fixes this time are for OMAP hardware, here is a
    breakdown of the significant changes:

    Various device tree bug fixes:
    - TI am57xx boards need a voltage level fix to avoid damaging SD
    cards
    - vf610-bk4 fails to detect its flash due to an incorrect description
    - meson-g12a USB phy configuration fails
    - meson-g12b reboot should not power off the SD card
    - Some corrections for apparently harmless differences from the
    documentation.

    Regression fixes:
    - ams-delta FIQ interrupts broke in 5.3
    - TI am3/am4 mmc controllers broke in 5.2

    The logic_pio driver (used on some Huawei ARM servers) got a few bug
    fixes for reliability.

    And a couple of compile-time warning fixes"

    * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (26 commits)
    soc: ixp4xx: Protect IXP4xx SoC drivers by ARCH_IXP4XX || COMPILE_TEST
    soc: ti: pm33xx: Make two symbols static
    soc: ti: pm33xx: Fix static checker warnings
    ARM: OMAP: dma: Mark expected switch fall-throughs
    ARM: dts: Fix incomplete dts data for am3 and am4 mmc
    bus: ti-sysc: Simplify cleanup upon failures in sysc_probe()
    ARM: OMAP1: ams-delta-fiq: Fix missing irq_ack
    ARM: dts: dra74x: Fix iodelay configuration for mmc3
    ARM: dts: am335x: Fix UARTs length
    ARM: OMAP2+: Fix omap4 errata warning on other SoCs
    bus: hisi_lpc: Add .remove method to avoid driver unbind crash
    bus: hisi_lpc: Unregister logical PIO range to avoid potential use-after-free
    lib: logic_pio: Add logic_pio_unregister_range()
    lib: logic_pio: Avoid possible overlap for unregistering regions
    lib: logic_pio: Fix RCU usage
    arm64: dts: amlogic: odroid-n2: keep SD card regulator always on
    arm64: dts: meson-g12a-sei510: enable IR controller
    arm64: dts: meson-g12a: add missing dwc2 phy-names
    ARM: dts: vf610-bk4: Fix qspi node description
    ARM: dts: Fix incorrect dcan register mapping for am3, am4 and dra7
    ...

    Linus Torvalds
     

30 Aug, 2019

2 commits

  • …kernel/git/gustavoars/linux

    Pull fallthrough fixes from Gustavo A. R. Silva:
    "Fix fall-through warnings on arc and nds32 for multiple
    configurations"

    * tag 'Wimplicit-fallthrough-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
    nds32: Mark expected switch fall-throughs
    ARC: unwind: Mark expected switch fall-through

    Linus Torvalds
     
  • Mark switch cases where we are expecting to fall through.

    This patch fixes the following warnings (Building: allmodconfig nds32):

    include/math-emu/soft-fp.h:124:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
    arch/nds32/kernel/signal.c:362:20: warning: this statement may fall through [-Wimplicit-fallthrough=]
    arch/nds32/kernel/signal.c:315:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:417:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:430:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/soft-fp.h:124:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:417:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:430:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:310:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    include/math-emu/op-common.h:320:11: warning: this statement may fall through [-Wimplicit-fallthrough=]

    Reported-by: Michael Ellerman
    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

29 Aug, 2019

2 commits

  • Hisilicon fixes for v5.3-rc

    - Fixed RCU usage in logical PIO
    - Added a function to unregister a logical PIO range in logical PIO
    to support the fixes in the hisi-lpc driver
    - Fixed and optimized hisi-lpc driver to avoid potential use-after-free
    and driver unbind crash

    * tag 'hisi-fixes-for-5.3' of git://github.com/hisilicon/linux-hisi:
    bus: hisi_lpc: Add .remove method to avoid driver unbind crash
    bus: hisi_lpc: Unregister logical PIO range to avoid potential use-after-free
    lib: logic_pio: Add logic_pio_unregister_range()
    lib: logic_pio: Avoid possible overlap for unregistering regions
    lib: logic_pio: Fix RCU usage

    Link: https://lore.kernel.org/r/5D562335.7000902@hisilicon.com
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     
  • The sample action doesn't properly handle the psample_group pointer in the
    overwrite case. The following issues need to be fixed:

    - In tcf_sample_init(), RCU_INIT_POINTER() is used to set
    s->psample_group, even though we are neither setting the pointer to NULL
    nor preventing concurrent readers from accessing the pointer in some way.
    Use rcu_swap_protected() instead to safely reset the pointer.

    - The old value of s->psample_group is not released or deallocated in any
    way, which results in a resource leak. Use psample_group_put() on the
    non-NULL value obtained with rcu_swap_protected().

    - The function psample_group_put(), which releases the reference to the
    struct psample_group pointed to by the RCU pointer s->psample_group,
    doesn't respect the RCU grace period when deallocating it. Extend struct
    psample_group with an rcu head and use kfree_rcu() when freeing it.

    Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action")
    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
     

28 Aug, 2019

5 commits

  • Commit 34786005eca3 ("net: phy: prevent PHYs w/o Clause 22 regs from calling
    genphy_config_aneg") introduced a check that aborts phy_config_aneg()
    if the phy is a C45 phy.
    This causes phy_state_machine() to call phy_error() so that the phy
    ends up in PHY_HALTED state.

    Instead of returning -EOPNOTSUPP, call genphy_c45_config_aneg()
    (analogous to the C22 case) so that the state machine can run
    correctly.

    genphy_c45_config_aneg() closely resembles mv3310_config_aneg()
    in drivers/net/phy/marvell10g.c, excluding vendor specific
    configurations for 1000BaseT.

    Fixes: 22b56e827093 ("net: phy: replace genphy_10g_driver with genphy_c45_driver")

    Signed-off-by: Marco Hartmann
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Marco Hartmann
     
  • The net pointer in struct xt_tgdtor_param is not explicitly
    initialized and is therefore still NULL when it is dereferenced.
    So we have to find a way to pass the correct net pointer to
    ipt_destroy_target().

    The best way I found is to save the net pointer inside the per-netns
    struct tcf_idrinfo, which keeps this patch small.

    Fixes: 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset")
    Reported-and-tested-by: itugrok@yahoo.com
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • Pull NFS client bugfixes from Trond Myklebust:
    "Highlights include:

    Stable fixes:

    - Fix a page lock leak in nfs_pageio_resend()

    - Ensure O_DIRECT reports an error if the bytes read/written is 0

    - Don't handle errors if the bind/connect succeeded

    - Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was
    invalidated"

    Bugfixes:

    - Don't refresh attributes with mounted-on-file information

    - Fix return values for nfs4_file_open() and nfs_finish_open()

    - Fix pnfs layoutstats reporting of I/O errors

    - Don't use soft RPC calls for pNFS/flexfiles I/O, and don't abort
    for soft I/O errors when the user specifies a hard mount.

    - Various fixes to the error handling in sunrpc

    - Don't report writepage()/writepages() errors twice"

    * tag 'nfs-for-5.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFS: remove set but not used variable 'mapping'
    NFSv2: Fix write regression
    NFSv2: Fix eof handling
    NFS: Fix writepage(s) error handling to not report errors twice
    NFS: Fix spurious EIO read errors
    pNFS/flexfiles: Don't time out requests on hard mounts
    SUNRPC: Handle connection breakages correctly in call_status()
    Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated"
    SUNRPC: Handle EADDRINUSE and ENOBUFS correctly
    pNFS/flexfiles: Turn off soft RPC calls
    SUNRPC: Don't handle errors if the bind/connect succeeded
    NFS: On fatal writeback errors, we need to call nfs_inode_remove_request()
    NFS: Fix initialisation of I/O result struct in nfs_pgio_rpcsetup
    NFS: Ensure O_DIRECT reports an error if the bytes read/written is 0
    NFSv4/pnfs: Fix a page lock leak in nfs_pageio_resend()
    NFSv4: Fix return value in nfs_finish_open()
    NFSv4: Fix return values for nfs4_file_open()
    NFS: Don't refresh attributes with mounted-on-file information

    Linus Torvalds
     
  • Pull ARC updates from Vineet Gupta:

    - support for Edge Triggered IRQs in ARC IDU intc

    - other fixes here and there

    * tag 'arc-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
    arc: prefer __section from compiler_attributes.h
    dt-bindings: IDU-intc: Add support for edge-triggered interrupts
    dt-bindings: IDU-intc: Clean up documentation
    ARCv2: IDU-intc: Add support for edge-triggered interrupts
    ARC: unwind: Mark expected switch fall-throughs
    ARC: [plat-hsdk]: allow to switch between AXI DMAC port configurations
    ARC: fix typo in setup_dma_ops log message
    ARCv2: entry: early return from exception need not clear U & DE bits

    Linus Torvalds
     
  • Pull networking fixes from David Miller:

    1) Use 32-bit index for tail calls in s390 bpf JIT, from Ilya
    Leoshkevich.

    2) Fix missed EPOLLOUT events in TCP, from Eric Dumazet. Same fix for
    SMC from Jason Baron.

    3) ipv6_mc_may_pull() should return 0 for malformed packets, not
    -EINVAL. From Stefano Brivio.

    4) Don't forget to unpin umem xdp pages in error path of
    xdp_umem_reg(). From Ivan Khoronzhuk.

    5) Fix sta object leak in mac80211, from Johannes Berg.

    6) Fix regression by not configuring PHYLINK on CPU port of bcm_sf2
    switches. From Florian Fainelli.

    7) Revert DMA sync removal from r8169 which was causing regressions on
    some MIPS Loongson platforms. From Heiner Kallweit.

    8) Use after free in flow dissector, from Jakub Sitnicki.

    9) Fix NULL derefs of net devices during ICMP processing across
    collect_md tunnels, from Hangbin Liu.

    10) proto_register() memory leaks, from Zhang Lin.

    11) Set NLM_F_MULTI flag in multipart netlink messages consistently,
    from John Fastabend.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
    r8152: Set memory to all 0xFFs on failed reg reads
    openvswitch: Fix conntrack cache with timeout
    ipv4: mpls: fix mpls_xmit for iptunnel
    nexthop: Fix nexthop_num_path for blackhole nexthops
    net: rds: add service level support in rds-info
    net: route dump netlink NLM_F_MULTI flag missing
    s390/qeth: reject oversized SNMP requests
    sock: fix potential memory leak in proto_register()
    MAINTAINERS: Add phylink keyword to SFF/SFP/SFP+ MODULE SUPPORT
    xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode
    ipv4/icmp: fix rt dst dev null pointer dereference
    openvswitch: Fix log message in ovs conntrack
    bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0
    bpf: fix use after free in prog symbol exposure
    bpf: fix precision tracking in presence of bpf2bpf calls
    flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH
    Revert "r8169: remove not needed call to dma_sync_single_for_device"
    ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev
    net/ncsi: Fix the payload copying for the request coming from Netlink
    qed: Add cleanup in qed_slowpath_start()
    ...

    Linus Torvalds
     

27 Aug, 2019

4 commits

  • The in-place decryption routines in AF_RXRPC's rxkad security module
    currently call skb_cow_data() to make sure the data isn't shared and that
    the skb can be written over. This has a problem, however, as the softirq
    handler may be still holding a ref or the Rx ring may be holding multiple
    refs when skb_cow_data() is called in rxkad_verify_packet() - and so
    skb_shared() returns true and __pskb_pull_tail() dislikes that. If this
    occurs, something like the following report will be generated.

    kernel BUG at net/core/skbuff.c:1463!
    ...
    RIP: 0010:pskb_expand_head+0x253/0x2b0
    ...
    Call Trace:
    __pskb_pull_tail+0x49/0x460
    skb_cow_data+0x6f/0x300
    rxkad_verify_packet+0x18b/0xb10 [rxrpc]
    rxrpc_recvmsg_data.isra.11+0x4a8/0xa10 [rxrpc]
    rxrpc_kernel_recv_data+0x126/0x240 [rxrpc]
    afs_extract_data+0x51/0x2d0 [kafs]
    afs_deliver_fs_fetch_data+0x188/0x400 [kafs]
    afs_deliver_to_call+0xac/0x430 [kafs]
    afs_wait_for_call_to_complete+0x22f/0x3d0 [kafs]
    afs_make_call+0x282/0x3f0 [kafs]
    afs_fs_fetch_data+0x164/0x300 [kafs]
    afs_fetch_data+0x54/0x130 [kafs]
    afs_readpages+0x20d/0x340 [kafs]
    read_pages+0x66/0x180
    __do_page_cache_readahead+0x188/0x1a0
    ondemand_readahead+0x17d/0x2e0
    generic_file_read_iter+0x740/0xc10
    __vfs_read+0x145/0x1a0
    vfs_read+0x8c/0x140
    ksys_read+0x4a/0xb0
    do_syscall_64+0x43/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fix this by using skb_unshare() instead in the input path for DATA packets
    that have a security index != 0. Non-DATA packets don't need in-place
    encryption and neither do unencrypted DATA packets.

    Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
    Reported-by: Julian Wollrath
    Signed-off-by: David Howells

    David Howells
     
  • Use the previously-added transmit-phase skbuff private flag to simplify the
    socket buffer tracing a bit. Which phase the skbuff comes from can now be
    divined from the skb rather than having to be guessed from the call state.

    We can also reduce the number of rxrpc_skb_trace values by eliminating the
    difference between Tx and Rx in the symbols.

    Signed-off-by: David Howells

    David Howells
     
  • This reverts commit a79f194aa4879e9baad118c3f8bb2ca24dbef765.
    The mechanism for aborting I/O is racy, since we are not guaranteed that
    the request is asleep while we're changing both task->tk_status and
    task->tk_action.

    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org # v5.1

    Trond Myklebust
     
  • This adds support for an optional extra interrupt cell to specify edge
    vs level triggered. It is backward compatible with dts files with only
    one cell, and will default to level-triggered in such a case.

    Note that I had to make a change to idu_irq_set_affinity as well, as
    this function was setting the interrupt type to "level" unconditionally,
    since this was the only type supported previously.

    Signed-off-by: Mischa Jonker
    Reviewed-by: Vineet Gupta
    Signed-off-by: Vineet Gupta

    Mischa Jonker
     

26 Aug, 2019

3 commits

  • Donald reported this sequence:
    ip next add id 1 blackhole
    ip next add id 2 blackhole
    ip ro add 1.1.1.1/32 nhid 1
    ip ro add 1.1.1.2/32 nhid 2

    would cause a crash. Backtrace is:

    [ 151.302790] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    [ 151.304043] CPU: 1 PID: 277 Comm: ip Not tainted 5.3.0-rc5+ #37
    [ 151.305078] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
    [ 151.306526] RIP: 0010:fib_add_nexthop+0x8b/0x2aa
    [ 151.307343] Code: 35 f7 81 48 8d 14 01 c7 02 f1 f1 f1 f1 c7 42 04 01 f4 f4 f4 48 89 f2 48 c1 ea 03 65 48 8b 0c 25 28 00 00 00 48 89 4d d0 31 c9 3c 02 00 74 08 48 89 f7 e8 1a e8 53 ff be 08 00 00 00 4c 89 e7
    [ 151.310549] RSP: 0018:ffff888116c27340 EFLAGS: 00010246
    [ 151.311469] RAX: dffffc0000000000 RBX: ffff8881154ece00 RCX: 0000000000000000
    [ 151.312713] RDX: 0000000000000004 RSI: 0000000000000020 RDI: ffff888115649b40
    [ 151.313968] RBP: ffff888116c273d8 R08: ffffed10221e3757 R09: ffff888110f1bab8
    [ 151.315212] R10: 0000000000000001 R11: ffff888110f1bab3 R12: ffff888115649b40
    [ 151.316456] R13: 0000000000000020 R14: ffff888116c273b0 R15: ffff888115649b40
    [ 151.317707] FS: 00007f60b4d8d800(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
    [ 151.319113] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 151.320119] CR2: 0000555671ffdc00 CR3: 00000001136ba005 CR4: 0000000000020ee0
    [ 151.321367] Call Trace:
    [ 151.321820] ? fib_nexthop_info+0x635/0x635
    [ 151.322572] fib_dump_info+0xaa4/0xde0
    [ 151.323247] ? fib_create_info+0x2431/0x2431
    [ 151.324008] ? napi_alloc_frag+0x2a/0x2a
    [ 151.324711] rtmsg_fib+0x2c4/0x3be
    [ 151.325339] fib_table_insert+0xe2f/0xeee
    ...

    fib_dump_info incorrectly has nhs = 0 for blackhole nexthops, so it
    believes the nexthop object is a multipath group (nhs != 1) and ends
    up down the nexthop_mpath_fill_node() path which is wrong for a
    blackhole.

    The blackhole check in nexthop_num_path is leftover from early days
    of the blackhole implementation which did not initialize the device.
    In the end the design was simpler (fewer special case checks) to set
    the device to loopback in nh_info, so the check in nexthop_num_path
    should have been removed.

    Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
    Reported-by: Donald Sharp
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
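
    The effect of the leftover blackhole check can be shown with a reduced
    model. The struct and function names below are hypothetical and the
    logic is a sketch of what the message describes (a path count of 0 for
    blackhole nexthops, and a dump path that treats anything other than 1
    as a multipath group), not a copy of the kernel helpers.

```c
/* Hypothetical, reduced model of a nexthop object. */
struct sketch_nexthop {
    int is_group;           /* non-zero for a multipath group */
    unsigned int num_nh;    /* member count, meaningful for groups only */
    int reject_nh;          /* non-zero for a blackhole nexthop */
};

/* Behaviour described as buggy: a blackhole reports zero paths. */
unsigned int sketch_num_path_old(const struct sketch_nexthop *nh)
{
    if (nh->is_group)
        return nh->num_nh;
    if (nh->reject_nh)
        return 0;           /* leftover special case */
    return 1;
}

/* Behaviour after the fix: every non-group nexthop is a single path. */
unsigned int sketch_num_path_fixed(const struct sketch_nexthop *nh)
{
    return nh->is_group ? nh->num_nh : 1;
}

/* The dump-side decision: any count other than 1 takes the group path. */
int sketch_dumped_as_multipath(unsigned int nhs)
{
    return nhs != 1;
}
```

    For a blackhole nexthop the old count of 0 sends the dump down the
    multipath branch, while the fixed count of 1 keeps it on the
    single-path branch.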
     
  • Pull UBIFS and JFFS2 fixes from Richard Weinberger:
    "UBIFS:
    - Don't block too long in writeback_inodes_sb()
    - Fix for a possible overrun of the log head
    - Fix double unlock in orphan_delete()

    JFFS2:
    - Remove C++ style from UAPI header and unbreak picky toolchains"

    * tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
    ubifs: Limit the number of pages in shrink_liability
    ubifs: Correctly initialize c->min_log_bytes
    ubifs: Fix double unlock around orphan_delete()
    jffs2: Remove C++ style comments from uapi header

    Linus Torvalds
     
  • Pull timekeeping fix from Thomas Gleixner:
    "A single fix for a regression caused by the generic VDSO
    implementation where a math overflow causes CLOCK_BOOTTIME to become a
    random number generator"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    timekeeping/vsyscall: Prevent math overflow in BOOTTIME update

    Linus Torvalds
     

25 Aug, 2019

4 commits

  • Pull dma-mapping fixes from Christoph Hellwig:
    "Two fixes for regressions in this merge window:

    - select the Kconfig symbols for the noncoherent dma arch helpers on
    arm if swiotlb is selected, not just for LPAE, so as not to break the
    Xen build, which uses swiotlb indirectly through swiotlb-xen

    - fix the page allocator fallback in dma_alloc_contiguous if the CMA
    allocation fails"

    * tag 'dma-mapping-5.3-5' of git://git.infradead.org/users/hch/dma-mapping:
    dma-direct: fix zone selection after an unaddressable CMA allocation
    arm: select the dma-noncoherent symbols for all swiotlb builds

    Linus Torvalds
     
  • From the IB specification, section 7.6.5 SERVICE LEVEL: the Service
    Level (SL) is used to identify different flows within an IBA subnet.
    It is carried in the local route header of the packet.

    Before this commit, running "rds-info -I" produces the following
    output:
    "
    RDS IB Connections:
    LocalAddr   RemoteAddr  Tos  SL  LocalDev           RemoteDev
    192.2.95.3  192.2.95.1    2   0  fe80::21:28:1a:39  fe80::21:28:10:b9
    192.2.95.3  192.2.95.1    1   0  fe80::21:28:1a:39  fe80::21:28:10:b9
    192.2.95.3  192.2.95.1    0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
    "
    After this commit, the output is as below:
    "
    RDS IB Connections:
    LocalAddr   RemoteAddr  Tos  SL  LocalDev           RemoteDev
    192.2.95.3  192.2.95.1    2   2  fe80::21:28:1a:39  fe80::21:28:10:b9
    192.2.95.3  192.2.95.1    1   1  fe80::21:28:1a:39  fe80::21:28:10:b9
    192.2.95.3  192.2.95.1    0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
    "

    Commit fe3475af3bdf ("net: rds: add per rds connection cache
    statistics") added cache_allocs to struct rds_info_rdma_connection
    as below:
    struct rds_info_rdma_connection {
        ...
        __u32 rdma_mr_max;
        __u32 rdma_mr_size;
        __u8 tos;
        __u32 cache_allocs;
    };
    The corresponding struct rds_info_rdma_connection in rds-tools is as
    below:
    struct rds_info_rdma_connection {
        ...
        uint32_t rdma_mr_max;
        uint32_t rdma_mr_size;
        uint8_t tos;
        uint8_t sl;
        uint32_t cache_allocs;
    };
    The only difference between the userspace and kernel definitions is the
    member sl, which is missing from the kernel struct. This mismatch is
    risky, so this commit adds sl to the kernel struct.

    Fixes: fe3475af3bdf ("net: rds: add per rds connection cache statistics")
    CC: Joe Jin
    CC: JUNXIAO_BI
    Suggested-by: Gerd Rausch
    Signed-off-by: Zhu Yanjun
    Acked-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Zhu Yanjun
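
    How the two definitions above line up in memory can be checked with
    offsetof(). The sketch below keeps only the trailing members quoted in
    the message; the struct names are hypothetical, the elided leading
    members are left out, and it assumes unpacked structs on an ABI where
    uint32_t is 4-byte aligned. It says nothing about the real header beyond
    those assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the layout without the sl member. */
struct sketch_conn_without_sl {
    uint32_t rdma_mr_max;
    uint32_t rdma_mr_size;
    uint8_t  tos;
    uint32_t cache_allocs;
};

/* Hypothetical stand-in for the layout with the sl member. */
struct sketch_conn_with_sl {
    uint32_t rdma_mr_max;
    uint32_t rdma_mr_size;
    uint8_t  tos;
    uint8_t  sl;
    uint32_t cache_allocs;
};

/* 1 if cache_allocs sits at the same offset and both sizes match. */
int sketch_layouts_agree(void)
{
    return offsetof(struct sketch_conn_without_sl, cache_allocs) ==
           offsetof(struct sketch_conn_with_sl, cache_allocs) &&
           sizeof(struct sketch_conn_without_sl) ==
           sizeof(struct sketch_conn_with_sl);
}

/* Offset of the byte that a reader using the second layout treats as sl. */
size_t sketch_sl_offset(void)
{
    return offsetof(struct sketch_conn_with_sl, sl);
}

/* Offset of tos in the layout that has no sl member. */
size_t sketch_tos_offset(void)
{
    return offsetof(struct sketch_conn_without_sl, tos);
}
```

    Under these assumptions sl falls into what is a padding byte in the
    layout without it, directly after tos, so a reader using the second
    layout picks up a byte that the first layout never assigns. With packed
    structs the offsets of cache_allocs would differ instead.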
     
  • An excerpt from the netlink(7) man page:

    In multipart messages (multiple nlmsghdr headers with associated payload
    in one byte stream) the first and all following headers have the
    NLM_F_MULTI flag set, except for the last header which has the type
    NLMSG_DONE.

    but, after (ee28906), there is a missing NLM_F_MULTI flag in the middle
    of a FIB dump. As a result, user space applications that follow the man
    page excerpt above may get confused and stop parsing the message,
    believing something went wrong.

    In the golang netlink lib [0] the library logic stops parsing believing the
    message is not a multipart message. Found this running Cilium[1] against
    net-next while adding a feature to auto-detect routes. I noticed with
    multiple route tables we no longer could detect the default routes on net
    tree kernels because the library logic was not returning them.

    Fix this by handling the fib_dump_info_fnhe() case the same way the
    fib_dump_info() handles it by passing the flags argument through the
    call chain and adding a flags argument to rt_fill_info().

    Tested with Cilium stack and auto-detection of routes works again. Also
    annotated libs to dump netlink msgs and inspected NLM_F_MULTI and
    NLMSG_DONE flags look correct after this.

    Note: In inet_rtm_getroute() pass rt_fill_info() '0' for flags the same
    as is done for fib_dump_info() so this looks correct to me.

    [0] https://github.com/vishvananda/netlink/
    [1] https://github.com/cilium/

    Fixes: ee28906fd7a14 ("ipv4: Dump route exceptions if requested")
    Signed-off-by: John Fastabend
    Reviewed-by: Stefano Brivio
    Signed-off-by: David S. Miller

    John Fastabend
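
    Why a single header without NLM_F_MULTI truncates a dump for a strict
    reader can be sketched as follows. The reduced header struct, the
    function name and the SKETCH_ constants are hypothetical; the two
    constant values mirror NLM_F_MULTI and NLMSG_DONE from
    <linux/netlink.h>, and the stop rule is an assumption about how a
    reader following the man page might behave, not the logic of any
    particular library.

```c
#include <stddef.h>
#include <stdint.h>

/* Values mirror NLM_F_MULTI and NLMSG_DONE from <linux/netlink.h>. */
#define SKETCH_NLM_F_MULTI 0x2
#define SKETCH_NLMSG_DONE  0x3

/* Hypothetical, reduced header: only the fields this check needs. */
struct sketch_hdr {
    uint16_t type;
    uint16_t flags;
};

/*
 * Count how many headers a strict reader consumes from a dump of n headers.
 * It stops at NLMSG_DONE, or at the first header without NLM_F_MULTI, which
 * it takes to be a complete single-part message.
 */
size_t sketch_strict_reader_consumed(const struct sketch_hdr *h, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (h[i].type == SKETCH_NLMSG_DONE)
            return i + 1;
        if (!(h[i].flags & SKETCH_NLM_F_MULTI))
            return i + 1;   /* treated as single-part: stop here */
    }
    return n;
}
```

    For a four-header dump whose second header lacks the flag, this reader
    consumes two headers and never reaches the remaining route or the
    NLMSG_DONE terminator.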
     
  • Pull GPIO fixes from Linus Walleij:
    "Here is a (hopefully last) set of GPIO fixes for the v5.3 kernel
    cycle. Two are pretty core:

    - Fix not reporting open drain/source lines to userspace as "input"

    - Fix a minor build error found in randconfigs

    - Fix a chip select quirk on the Freescale SPI

    - Fix the irqchip initialization semantic order to reflect what it
    was using the old API"

    * tag 'gpio-v5.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
    gpio: Fix irqchip initialization order
    gpio: of: fix Freescale SPI CS quirk handling
    gpio: Fix build error of function redefinition
    gpiolib: never report open-drain/source lines as 'input' to user-space

    Linus Torvalds
     

24 Aug, 2019

2 commits

  • Pull rdma fixes from Doug Ledford:
    "No beating around the bush: this is a monster pull request for an -rc5
    kernel. Intel hit me with a series of fixes for TID processing.
    Mellanox hit me with a series for their UMR memory support.

    And we had one fix for siw that fixes the 32bit build warnings and
    because of the number of casts that had to be changed to properly
    silence the warnings, that one patch alone is a full 40% of the LOC of
    this entire pull request. Given that this is the initial release
    kernel for siw, I'm trying to fix anything in it that we can, so that
    adds to the impetus to take fixes for it like this one.

    I had to do a rebase early in the week. Jason had thought he put a
    patch on the rc queue that he needed to be there so he could base some
    work off of it, and it had actually not been placed there. So he asked
    me (on Tuesday) to fix that up before pushing my wip branch to the
    official rc branch. I did, and that's why the early patches look like
    they were all committed at the same time on Tuesday. That bunch had
    been in my queue prior.

    The various patches all pass my test for being legitimate fixes and
    not attempts to slide new features or development into a late rc.
    Well, they were all fixes with the exception of a couple clean up
    patches people wrote for making the fixes they also wrote better (like
    a cleanup patch to move UMR checking into a function so that the
    remaining UMR fix patches can reference that function), so I left
    those in place too.

    My apologies for the LOC count and the number of patches here, it's
    just how the cards fell this cycle.

    Summary:

    - Fix siw buffer mapping issue

    - Fix siw 32/64 casting issues

    - Fix a KASAN access issue in bnxt_re

    - Fix several memory leaks (hfi1, mlx4)

    - Fix a NULL deref in cma_cleanup

    - Fixes for UMR memory support in mlx5 (4 patch series)

    - Fix namespace check for restrack

    - Fixes for counter support

    - Fixes for hfi1 TID processing (5 patch series)

    - Fix potential NULL deref in siw

    - Fix memory page calculations in mlx5"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (21 commits)
    RDMA/siw: Fix 64/32bit pointer inconsistency
    RDMA/siw: Fix SGL mapping issues
    RDMA/bnxt_re: Fix stack-out-of-bounds in bnxt_qplib_rcfw_send_message
    infiniband: hfi1: fix memory leaks
    infiniband: hfi1: fix a memory leak bug
    IB/mlx4: Fix memory leaks
    RDMA/cma: fix null-ptr-deref Read in cma_cleanup
    IB/mlx5: Block MR WR if UMR is not possible
    IB/mlx5: Fix MR re-registration flow to use UMR properly
    IB/mlx5: Report and handle ODP support properly
    IB/mlx5: Consolidate use_umr checks into single function
    RDMA/restrack: Rewrite PID namespace check to be reliable
    RDMA/counters: Properly implement PID checks
    IB/core: Fix NULL pointer dereference when bind QP to counter
    IB/hfi1: Drop stale TID RDMA packets that cause TIDErr
    IB/hfi1: Add additional checks when handling TID RDMA WRITE DATA packet
    IB/hfi1: Add additional checks when handling TID RDMA READ RESP packet
    IB/hfi1: Unsafe PSN checking for TID RDMA READ Resp packet
    IB/hfi1: Drop stale TID RDMA packets
    RDMA/siw: Fix potential NULL de-ref
    ...

    Linus Torvalds
     
  • Pull ceph fixes from Ilya Dryomov:
    "Three important fixes tagged for stable (an indefinite hang, a crash
    on an assert and a NULL pointer dereference) plus a small series from
    Luis fixing instances of vfree() under spinlock"

    * tag 'ceph-for-5.3-rc6' of git://github.com/ceph/ceph-client:
    libceph: fix PG split vs OSD (re)connect race
    ceph: don't try fill file_lock on unsuccessful GETFILELOCK reply
    ceph: clear page dirty before invalidate page
    ceph: fix buffer free while holding i_ceph_lock in fill_inode()
    ceph: fix buffer free while holding i_ceph_lock in __ceph_build_xattrs_blob()
    ceph: fix buffer free while holding i_ceph_lock in __ceph_setxattr()
    libceph: allow ceph_buffer_put() to receive a NULL ceph_buffer

    Linus Torvalds
     

23 Aug, 2019

1 commit

  • The VDSO update for CLOCK_BOOTTIME has an overflow issue as it shifts the
    nanosecond-based boot time offset left by the clocksource shift. That
    overflows once the boot time offset becomes large enough. As a consequence
    CLOCK_BOOTTIME in the VDSO becomes a random number causing applications to
    misbehave.

    Fix it by storing a timespec64 representation of the offset when boot time
    is adjusted and add that to the MONOTONIC base time value in the vdso data
    page. Using the timespec64 representation avoids a 64bit division in the
    update code.
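
    The overflow condition and the shape of the fix can be sketched in
    plain C. This is an illustration under stated assumptions, not the
    kernel code: struct ts64 is a simplified stand-in for timespec64, and
    both helper names are hypothetical. For an assumed shift of 24, the
    shifted value stops fitting in 64 bits once the offset reaches 2^40 ns,
    which is roughly 18 minutes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC INT64_C(1000000000)

/* Simplified stand-in for the kernel's struct timespec64 (hypothetical). */
struct ts64 {
	int64_t sec;
	int64_t nsec;
};

/*
 * Returns true when (offset_ns << shift) would lose high bits in a
 * 64-bit value, i.e. when offset_ns >= 2^(64 - shift).
 */
static bool shifted_offset_overflows(uint64_t offset_ns, unsigned int shift)
{
	assert(shift > 0 && shift < 64);
	return (offset_ns >> (64 - shift)) != 0;
}

/*
 * Adds two normalized values (0 <= nsec < NSEC_PER_SEC). At most one
 * carry is needed, so no 64-bit division is involved.
 */
static struct ts64 ts64_add(struct ts64 a, struct ts64 b)
{
	struct ts64 sum = { a.sec + b.sec, a.nsec + b.nsec };

	if (sum.nsec >= NSEC_PER_SEC) {
		sum.nsec -= NSEC_PER_SEC;
		sum.sec++;
	}
	return sum;
}
```

    Keeping the offset as seconds plus nanoseconds means the addition never
    needs the shifted representation at all, which is why the overflow
    cannot occur on that path.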

    Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() implementation")
    Reported-by: Chris Clayton
    Signed-off-by: Thomas Gleixner
    Tested-by: Chris Clayton
    Tested-by: Vincenzo Frascino
    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908221257580.1983@nanos.tec.linutronix.de

    Thomas Gleixner
     

22 Aug, 2019

3 commits


21 Aug, 2019

2 commits

  • The new dma_alloc_contiguous hides whether we allocate CMA or regular
    pages, and thus fails to retry a ZONE_NORMAL allocation if the CMA
    allocation succeeds but isn't addressable. That means we either fail
    outright or dip into a small zone that might not succeed either.

    Thanks to Hillf Danton for debugging this issue.

    Fixes: b1d2dc009dec ("dma-contiguous: add dma_{alloc,free}_contiguous() helpers")
    Reported-by: Tobias Klausmann
    Signed-off-by: Christoph Hellwig
    Tested-by: Tobias Klausmann

    Christoph Hellwig
     
  • task_active_pid_ns() is the wrong API to check the PID namespace because
    it has some restrictions and returns the PID namespace where the process
    was allocated. This created mismatches with the current namespace, which
    can be different.

    Rewrite the whole rdma_is_visible_in_pid_ns() logic to provide reliable
    results without any relation to the allocated PID namespace.

    Fixes: 8be565e65fa9 ("RDMA/nldev: Factor out the PID namespace check")
    Fixes: 6a6c306a09b5 ("RDMA/restrack: Make is_visible_in_pid_ns() as an API")
    Reviewed-by: Mark Zhang
    Signed-off-by: Leon Romanovsky
    Link: https://lore.kernel.org/r/20190815083834.9245-4-leon@kernel.org
    Signed-off-by: Doug Ledford

    Leon Romanovsky
     

20 Aug, 2019

5 commits

  • Commit ba5ea614622d ("bridge: simplify ip_mc_check_igmp() and
    ipv6_mc_check_mld() calls") replaces direct calls to pskb_may_pull()
    in br_ipv6_multicast_mld2_report() with calls to ipv6_mc_may_pull(),
    that returns -EINVAL on buffers too short to be valid IPv6 packets,
    while maintaining the previous handling of the return code.

    This leads to the direct opposite of the intended effect: if the
    packet is malformed, -EINVAL evaluates as true, and we'll happily
    proceed with the processing.

    Return 0 if the packet is too short, in the same way as this was
    fixed for IPv4 by commit 083b78a9ed64 ("ip: fix ip_mc_may_pull()
    return value").

    I don't have a reproducer for this, unlike the one referred to by
    the IPv4 commit, but this is clearly broken.
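
    The return value problem is easy to reproduce outside the kernel. The
    sketch below uses hypothetical stand-ins (may_pull_buggy(),
    may_pull_fixed(), parse_report()) rather than the real
    ipv6_mc_may_pull() and bridge code; it only shows how a negative error
    code passes a boolean "may pull" check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the broken helper: negative errno on short buffers. */
static int may_pull_buggy(size_t buf_len, size_t need)
{
	if (buf_len < need)
		return -EINVAL;	/* non-zero, so callers see "true" */
	return 1;
}

/* Stand-in for the fixed helper: 0 on short buffers, 1 otherwise. */
static int may_pull_fixed(size_t buf_len, size_t need)
{
	return buf_len >= need;
}

/* Caller that keeps the old boolean handling of the return code. */
static int parse_report(int (*may_pull)(size_t, size_t),
			size_t buf_len, size_t need)
{
	assert(may_pull != NULL);

	if (!may_pull(buf_len, need))
		return -EINVAL;	/* reject malformed packet */

	return 0;		/* go on and process the packet */
}
```

    With the buggy helper, parse_report() returns 0 for a 4-byte buffer
    that needs 8 bytes, so processing continues on a malformed packet; with
    the fixed helper the same call returns -EINVAL.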

    Fixes: ba5ea614622d ("bridge: simplify ip_mc_check_igmp() and ipv6_mc_check_mld() calls")
    Signed-off-by: Stefano Brivio
    Acked-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Stefano Brivio
     
  • …iederm/user-namespace

    Pull kernel thread signal handling fix from Eric Biederman:
    "I overlooked the fact that kernel threads are created with all signals
    set to SIG_IGN, and accidentally caused a regression in cifs and drbd
    when replacing force_sig with send_sig.

    This is my fix for that regression. I add a new function
    allow_kernel_signal which allows kernel threads to receive signals
    sent from the kernel, but continues to ignore all signals sent from
    userspace. This ensures the user space interface for cifs and drbd
    remain the same.

    These kernel threads depend on blocking networking calls which block
    until something is received or a signal is pending, which makes
    receiving signals somewhat necessary for these kernel threads.

    Perhaps someday we can clean up those interfaces and remove
    allow_kernel_signal. If not, allow_kernel_signal is pretty trivial and
    clearly documents what is going on so I don't think we will mind
    carrying it"

    * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    signal: Allow cifs and drbd to receive their terminating signals

    Linus Torvalds
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for net:

    1) Remove IP MASQUERADING record in MAINTAINERS file,
    from Denis Efremov.

    2) Counter arguments are swapped in ebtables, from
    Todd Seidelmann.

    3) Missing netlink attribute validation in flow_offload
    extension.

    4) Incorrect alignment in xt_nfacct that breaks 32-bits
    userspace / 64-bits kernels, from Juliana Rodrigueiro.

    5) Missing include guard in nf_conntrack_h323_types.h,
    from Masahiro Yamada.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull networking fixes from David Miller:

    1) Fix jmp to 1st instruction in x64 JIT, from Alexei Starovoitov.

    2) Several kTLS fixes in mlx5 driver, from Tariq Toukan.

    3) Fix severe performance regression due to lack of SKB coalescing of
    fragments during local delivery, from Guillaume Nault.

    4) Error path memory leak in sch_taprio, from Ivan Khoronzhuk.

    5) Fix batched events in skbedit packet action, from Roman Mashak.

    6) Propagate VLAN TX offload to hw_enc_features in bond and team
    drivers, from Yue Haibing.

    7) RXRPC local endpoint refcounting fix and read after free in
    rxrpc_queue_local(), from David Howells.

    8) Fix endian bug in ibmveth multicast list handling, from Thomas
    Falcon.

    9) Oops, make nlmsg_parse() wrap around the correct function,
    __nlmsg_parse not __nla_parse(). Fix from David Ahern.

    10) Memleak in sctp_send_reset_streams(), from Zheng Bin.

    11) Fix memory leak in cxgb4, from Wenwen Wang.

    12) Yet another race in AF_PACKET, from Eric Dumazet.

    13) Fix false detection of retransmit failures in tipc, from Tuong
    Lien.

    14) Use after free in ravb_tstamp_skb, from Tho Vu.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits)
    ravb: Fix use-after-free ravb_tstamp_skb
    netfilter: nf_tables: map basechain priority to hardware priority
    net: sched: use major priority number as hardware priority
    wimax/i2400m: fix a memory leak bug
    net: cavium: fix driver name
    ibmvnic: Unmap DMA address of TX descriptor buffers after use
    bnxt_en: Fix to include flow direction in L2 key
    bnxt_en: Use correct src_fid to determine direction of the flow
    bnxt_en: Suppress HWRM errors for HWRM_NVM_GET_VARIABLE command
    bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails
    bnxt_en: Improve RX doorbell sequence.
    bnxt_en: Fix VNIC clearing logic for 57500 chips.
    net: kalmia: fix memory leaks
    cx82310_eth: fix a memory leak bug
    bnx2x: Fix VF's VLAN reconfiguration in reload.
    Bluetooth: Add debug setting for changing minimum encryption key size
    tipc: fix false detection of retransmit failures
    lan78xx: Fix memory leaks
    MAINTAINERS: r8169: Update path to the driver
    MAINTAINERS: PHY LIBRARY: Update files in the record
    ...

    Linus Torvalds
     
  • The maximum key description size is 4095. Commit f771fde82051 ("keys:
    Simplify key description management") inadvertently reduced that to 255
    and made sizes between 256 and 4095 work weirdly, and any size whereby
    size & 255 == 0 would cause an assertion in __key_link_begin() at the
    following line:

    BUG_ON(index_key->desc_len == 0);

    This can be fixed by simply increasing the size of desc_len in struct
    keyring_index_key to a u16.
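
    The arithmetic behind this can be sketched with plain integer
    conversions. The example assumes the length was being stored in an
    8-bit field, which is consistent with the "size & 255" behaviour
    described above but is not stated outright; the helper names and
    MAX_DESC_LEN are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DESC_LEN 4095	/* maximum description size, per the text above */

/* Assumed narrow field: only the low 8 bits of the length survive. */
static uint8_t desc_len_as_u8(size_t len)
{
	assert(len <= MAX_DESC_LEN);
	return (uint8_t)len;
}

/* 16-bit field: every length up to MAX_DESC_LEN is stored unchanged. */
static uint16_t desc_len_as_u16(size_t len)
{
	assert(len <= MAX_DESC_LEN);
	return (uint16_t)len;
}
```

    Under that assumption a 256-byte description is stored as length 0,
    which is the value the BUG_ON() rejects, while a 300-byte description
    is stored as 44 and silently misbehaves. A 16-bit field holds every
    length up to 4095 intact.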

    Note the argument length test in keyutils only checked empty
    descriptions and descriptions with a size around the limit (ie. 4095)
    and not for all the values in between, so it missed this. This has been
    addressed and

    https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/commit/?id=066bf56807c26cd3045a25f355b34c1d8a20a5aa

    now exhaustively tests all possible lengths of type, description and
    payload and then some.

    The assertion failure looks something like:

    kernel BUG at security/keys/keyring.c:1245!
    ...
    RIP: 0010:__key_link_begin+0x88/0xa0
    ...
    Call Trace:
    key_create_or_update+0x211/0x4b0
    __x64_sys_add_key+0x101/0x200
    do_syscall_64+0x5b/0x1e0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    It can be triggered by:

    keyctl add user "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" a @s

    Fixes: f771fde82051 ("keys: Simplify key description management")
    Reported-by: kernel test robot
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

19 Aug, 2019

1 commit