16 Oct, 2019

17 commits

  • When an application sends many packets with the same txtime, they may
    be transmitted out of order (different from the order in which they
    were enqueued).

    This happens because, when inserting elements into the tree, a new
    packet whose txtime equals that of an existing packet is inserted on
    the left side of the tree, causing the reordering. The only effect of
    this change should be that packets with the same txtime will be
    transmitted in the order they are enqueued.

    The application in question (the AVTP GStreamer plugin, still in
    development) is sending video traffic, in which each video frame has
    a single presentation time; the problem is that, when packetizing,
    multiple packets end up with the same txtime.

    The receiving side was rejecting packets because they were being
    received out of order.

    Fixes: 25db26a91364 ("net/sched: Introduce the ETF Qdisc")
    Reported-by: Ederson de Souza
    Signed-off-by: Vinicius Costa Gomes
    Signed-off-by: David S. Miller

    Vinicius Costa Gomes
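
The ordering rule described in this commit can be sketched in user space. This is a minimal illustration, not the ETF qdisc code: `struct pkt`, `pkt_insert()` and `pkt_walk()` are hypothetical names, and a plain unbalanced binary search tree stands in for the kernel's tree. The point it shows is that sending equal keys to the right subtree makes an in-order walk return packets with the same txtime in enqueue order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a queued packet; not the kernel's sk_buff. */
struct pkt {
    int64_t txtime;            /* requested transmit time */
    int seq;                   /* enqueue order, only used to observe results */
    struct pkt *left, *right;
};

/* Insert into a plain (unbalanced) binary search tree keyed by txtime.
 * Ties go to the RIGHT subtree, so an in-order walk visits packets with
 * equal txtime in the order they were inserted. */
static void pkt_insert(struct pkt **root, struct pkt *n)
{
    struct pkt **p = root;

    assert(root != NULL && n != NULL);
    n->left = n->right = NULL;
    while (*p) {
        if (n->txtime < (*p)->txtime)
            p = &(*p)->left;
        else
            p = &(*p)->right;  /* equal txtime: keep FIFO order */
    }
    *p = n;
}

/* In-order walk: appends seq numbers to out[] and returns the new count. */
static size_t pkt_walk(const struct pkt *t, int *out, size_t n)
{
    if (!t)
        return n;
    n = pkt_walk(t->left, out, n);
    out[n++] = t->seq;
    return pkt_walk(t->right, out, n);
}
```

Sending ties to the left instead would reverse the relative order of equal-txtime packets, which matches the reordering the receiver observed.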
     
  • tc_ctl_action() has the ability to loop forever if tcf_action_add()
    returns -EAGAIN.

    This special case exists for when a module needs to be loaded,
    but it turns out that tcf_add_notify() could also return -EAGAIN
    if the socket sk_rcvbuf limit is hit.

    We need to separate the two cases, and only loop for the module
    loading case.

    While we are at it, add a limit of 10 attempts since unbounded
    loops are always scary.

    syzbot repro was something like :

    socket(PF_NETLINK, SOCK_RAW|SOCK_NONBLOCK, NETLINK_ROUTE) = 3
    write(3, ..., 38) = 38
    setsockopt(3, SOL_SOCKET, SO_RCVBUF, [0], 4) = 0
    sendmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{..., 388}], msg_controllen=0, msg_flags=0x10}, ...)

    NMI backtrace for cpu 0
    CPU: 0 PID: 1054 Comm: khungtaskd Not tainted 5.4.0-rc1+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x172/0x1f0 lib/dump_stack.c:113
    nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101
    nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62
    arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
    trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
    check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
    watchdog+0x9d0/0xef0 kernel/hung_task.c:289
    kthread+0x361/0x430 kernel/kthread.c:255
    ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
    Sending NMI from CPU 0 to CPUs 1:
    NMI backtrace for cpu 1
    CPU: 1 PID: 8859 Comm: syz-executor910 Not tainted 5.4.0-rc1+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:arch_local_save_flags arch/x86/include/asm/paravirt.h:751 [inline]
    RIP: 0010:lockdep_hardirqs_off+0x1df/0x2e0 kernel/locking/lockdep.c:3453
    Code: 5c 08 00 00 5b 41 5c 41 5d 5d c3 48 c7 c0 58 1d f3 88 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 d3 00 00 00 83 3d 21 9e 99 07 00 0f 84 b9 00 00 00 9c 58 0f 1f 44 00 00 f6
    RSP: 0018:ffff8880a6f3f1b8 EFLAGS: 00000046
    RAX: 1ffffffff11e63ab RBX: ffff88808c9c6080 RCX: 0000000000000000
    RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff88808c9c6914
    RBP: ffff8880a6f3f1d0 R08: ffff88808c9c6080 R09: fffffbfff16be5d1
    R10: fffffbfff16be5d0 R11: 0000000000000003 R12: ffffffff8746591f
    R13: ffff88808c9c6080 R14: ffffffff8746591f R15: 0000000000000003
    FS: 00000000011e4880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffffff600400 CR3: 00000000a8920000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    trace_hardirqs_off+0x62/0x240 kernel/trace/trace_preemptirq.c:45
    __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
    _raw_spin_lock_irqsave+0x6f/0xcd kernel/locking/spinlock.c:159
    __wake_up_common_lock+0xc8/0x150 kernel/sched/wait.c:122
    __wake_up+0xe/0x10 kernel/sched/wait.c:142
    netlink_unlock_table net/netlink/af_netlink.c:466 [inline]
    netlink_unlock_table net/netlink/af_netlink.c:463 [inline]
    netlink_broadcast_filtered+0x705/0xb80 net/netlink/af_netlink.c:1514
    netlink_broadcast+0x3a/0x50 net/netlink/af_netlink.c:1534
    rtnetlink_send+0xdd/0x110 net/core/rtnetlink.c:714
    tcf_add_notify net/sched/act_api.c:1343 [inline]
    tcf_action_add+0x243/0x370 net/sched/act_api.c:1362
    tc_ctl_action+0x3b5/0x4bc net/sched/act_api.c:1410
    rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5386
    netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
    rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5404
    netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
    netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
    netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917
    sock_sendmsg_nosec net/socket.c:637 [inline]
    sock_sendmsg+0xd7/0x130 net/socket.c:657
    ___sys_sendmsg+0x803/0x920 net/socket.c:2311
    __sys_sendmsg+0x105/0x1d0 net/socket.c:2356
    __do_sys_sendmsg net/socket.c:2365 [inline]
    __se_sys_sendmsg net/socket.c:2363 [inline]
    __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2363
    do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x440939

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot+cf0adbb9c28c8866c788@syzkaller.appspotmail.com
    Signed-off-by: David S. Miller

    Eric Dumazet
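
The separation described above (retry only the step that may have loaded a module, cap the retries at 10, never retry the notification) might look roughly like the following. This is a sketch under stated assumptions: `add_action_sketch()`, `struct demo_ctx` and the two `demo_*` callbacks are hypothetical and only model the control flow, not the real `tcf_action_add()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* The limit of 10 comes from the commit text; the macro name is assumed. */
#define MAX_INIT_ATTEMPTS 10

typedef int (*step_fn)(void *ctx);

/* Retry only the init step, which may return -EAGAIN after a module load.
 * A -EAGAIN from the notify step (for example a full receive buffer) is
 * returned to the caller and never retried. */
static int add_action_sketch(step_fn init, step_fn notify, void *ctx)
{
    int ret = 0;

    assert(init != NULL && notify != NULL);
    for (int loop = 0; loop < MAX_INIT_ATTEMPTS; loop++) {
        ret = init(ctx);
        if (ret != -EAGAIN)
            break;
    }
    if (ret < 0)
        return ret;
    return notify(ctx);
}

/* Demo callbacks (hypothetical): init fails with -EAGAIN for the first
 * init_fail_until calls; notify returns a fixed code. */
struct demo_ctx {
    int init_calls;
    int init_fail_until;
    int notify_calls;
    int notify_ret;
};

static int demo_init(void *ctx)
{
    struct demo_ctx *d = ctx;

    return (++d->init_calls <= d->init_fail_until) ? -EAGAIN : 0;
}

static int demo_notify(void *ctx)
{
    struct demo_ctx *d = ctx;

    d->notify_calls++;
    return d->notify_ret;
}
```

With this shape, a notify step that keeps returning -EAGAIN can no longer keep the caller spinning, and even the legitimate retry path is bounded.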
     
  • This patch corrects the SPDX License Identifier style
    in header files related to Distributed Switch Architecture
    drivers for NXP SJA1105 series Ethernet switch support.
    It uses an explicit block comment for the SPDX License
    Identifier.

    Changes made by using a script provided by Joe Perches here:
    https://lkml.org/lkml/2019/2/7/46.

    Suggested-by: Joe Perches
    Signed-off-by: Nishad Kamdar
    Reviewed-by: Andrew Lunn
    Signed-off-by: David S. Miller

    Nishad Kamdar
     
  • syzbot found that if __inet_inherit_port() returns an error,
    we call tcp_done() after inet_csk_prepare_forced_close(),
    meaning the socket lock is no longer held.

    We might fix this in a different way in net-next, but
    for 5.4 it seems safer to relax the lockdep check.

    Fixes: d983ea6f16b8 ("tcp: add rcu protection around tp->fastopen_rsk")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • MarkLee says:

    ====================
    Update MT7629 to support PHYLINK API

    This patch set has two goals :
    1. Fix mt7629 GMII mode issue after applying the mediatek
    PHYLINK support patch.
    2. Update mt7629 dts to reflect the latest dt-binding
    with PHYLINK support.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • * Remove the mediatek,physpeed property from dtsi, which is useless in PHYLINK
    * Use the fixed-link speed property to set the phy to 2.5Gbit.
    * Set gmac1 to gmii mode, which connects to an internal gphy

    Signed-off-by: MarkLee
    Signed-off-by: David S. Miller

    MarkLee
     
  • In the original design, the mtk_phy_connect function set ge_mode=1
    if phy-mode was GMII (PHY_INTERFACE_MODE_GMII) and then wrote the correct
    ge_mode to the ETHSYS_SYSCFG0 register. This logic was broken after applying
    the mediatek PHYLINK patch (see the Fixes tag): the new mtk_mac_config
    function does not set ge_mode=1 for GMII mode, hence the final
    ETHSYS_SYSCFG0 setting is incorrect for mt7629 GMII mode. This patch adds
    the missing logic back to fix it.

    Fixes: b8fc9f30821e ("net: ethernet: mediatek: Add basic PHYLINK support")
    Signed-off-by: MarkLee
    Signed-off-by: David S. Miller

    MarkLee
     
  • Davide Caratti says:

    ====================
    net/sched: fix wrong behavior of MPLS push/pop action

    this series contains two fixes for TC 'act_mpls', that try to address
    two problems that can be observed configuring simple 'push' / 'pop'
    operations:
    - patch 1/2 avoids dropping non-MPLS packets that pass through the MPLS
    'pop' action.
    - patch 2/2 fixes corruption of the L2 header that occurs when 'push'
    or 'pop' actions are configured in TC egress path.

    v2: - change commit message in patch 1/2 to better describe that the
    patch impacts only TC, thanks to Simon Horman
    - fix missing documentation of 'mac_len' in patch 2/2
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • the following script:

    # tc qdisc add dev eth0 clsact
    # tc filter add dev eth0 egress protocol ip matchall \
    > action mpls push protocol mpls_uc label 0x355aa bos 1

    causes corruption of all IP packets transmitted by eth0. On TC egress, we
    can't rely on the value of skb->mac_len, because it's 0 and an MPLS 'push'
    operation will result in an overwrite of the first 4 octets in the packet
    L2 header (e.g. the Destination Address if eth0 is an Ethernet); the same
    error pattern is present also in the MPLS 'pop' operation. Fix this error
    in act_mpls data plane, computing 'mac_len' as the difference between the
    network header and the mac header (when not at TC ingress), and use it in
    MPLS 'push'/'pop' core functions.

    v2: unbreak 'make htmldocs' because of missing documentation of 'mac_len'
    in skb_mpls_pop(), reported by kbuild test robot

    CC: Lorenzo Bianconi
    Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC")
    Reviewed-by: Simon Horman
    Acked-by: John Hurley
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller

    Davide Caratti
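
The mac_len rule from this commit message can be written down as a small helper. This is a sketch, assuming a simplified view of the buffer: `struct skb_offsets_sketch` and `mpls_mac_len()` are hypothetical names that only model the header offsets mentioned in the text, not the real sk_buff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the header offsets of a packet buffer. */
struct skb_offsets_sketch {
    uint16_t mac_header;      /* offset of the L2 header from the buffer head */
    uint16_t network_header;  /* offset of the L3 header from the buffer head */
    uint16_t mac_len;         /* cached L2 length; 0 on TC egress per the text */
};

/* At TC ingress the cached mac_len is trusted. Elsewhere the L2 length is
 * recomputed as the distance between the network and mac headers. */
static uint16_t mpls_mac_len(const struct skb_offsets_sketch *o,
                             bool at_tc_ingress)
{
    if (at_tc_ingress)
        return o->mac_len;
    return (uint16_t)(o->network_header - o->mac_header);
}
```

For a plain Ethernet frame on egress this yields 14, so a pushed label lands after the L2 header instead of overwriting its first 4 octets.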
     
  • the following script:

    # tc qdisc add dev eth0 clsact
    # tc filter add dev eth0 egress matchall action mpls pop

    implicitly makes the kernel drop all packets transmitted by eth0, if they
    don't have an MPLS header. This behavior is uncommon: other encapsulations
    (like VLAN) just let the packet pass unmodified. Since the result of MPLS
    'pop' operation would be the same regardless of the presence / absence of
    MPLS header(s) in the original packet, we can let skb_mpls_pop() return 0
    when dealing with non-MPLS packets.

    For the OVS use-case, this is acceptable because __ovs_nla_copy_actions()
    already ensures that MPLS 'pop' operation only occurs with packets having
    an MPLS Ethernet type (and there are no other callers in current code, so
    the semantic change should be ok).

    v2: better documentation of use-cases for skb_mpls_pop(), thanks to Simon
    Horman

    Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC")
    Reviewed-by: Simon Horman
    Acked-by: John Hurley
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller

    Davide Caratti
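
The semantic change to the 'pop' operation can be illustrated as follows. This is a minimal sketch, not skb_mpls_pop() itself: `struct frame_sketch`, `mpls_pop_sketch()` and the `P_MPLS_*` macro names are hypothetical, while 0x8847 and 0x8848 are the standard MPLS EtherType values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define P_MPLS_UC 0x8847u        /* MPLS unicast EtherType */
#define P_MPLS_MC 0x8848u        /* MPLS multicast EtherType */
#define MPLS_HLEN_SKETCH 4u      /* size of one label stack entry */

/* Hypothetical, simplified frame: EtherType plus payload length. */
struct frame_sketch {
    uint16_t protocol;
    size_t payload_len;
};

static int is_mpls(uint16_t proto)
{
    return proto == P_MPLS_UC || proto == P_MPLS_MC;
}

/* Pop one label. A non-MPLS frame is left untouched and reported as
 * success, so the caller lets it pass instead of dropping it. */
static int mpls_pop_sketch(struct frame_sketch *f, uint16_t next_proto)
{
    assert(f != NULL);
    if (!is_mpls(f->protocol))
        return 0;
    if (f->payload_len < MPLS_HLEN_SKETCH)
        return -EINVAL;          /* truncated label stack entry */
    f->payload_len -= MPLS_HLEN_SKETCH;
    f->protocol = next_proto;
    return 0;
}
```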
     
  • This patch corrects the SPDX License Identifier style
    in header files related to Cavium Ethernet drivers.
    For C header files Documentation/process/license-rules.rst
    mandates C-like comments (opposed to C source files where
    C++ style should be used)

    Changes made by using a script provided by Joe Perches here:
    https://lkml.org/lkml/2019/2/7/46.

    Suggested-by: Joe Perches
    Signed-off-by: Nishad Kamdar
    Signed-off-by: David S. Miller

    Nishad Kamdar
     
  • This patch corrects the SPDX License Identifier style
    in header files related to Distributed Switch Architecture
    drivers for Microchip KSZ series switch support.
    For C header files Documentation/process/license-rules.rst
    mandates C-like comments (opposed to C source files where
    C++ style should be used)

    Changes made by using a script provided by Joe Perches here:
    https://lkml.org/lkml/2019/2/7/46.

    Suggested-by: Joe Perches
    Signed-off-by: Nishad Kamdar
    Signed-off-by: David S. Miller

    Nishad Kamdar
     
  • NET_VENDOR_BROADCOM is intended to control a kconfig menu only.
    It should not have anything to do with code generation.
    As such, it should not select DIMLIB for all drivers under
    NET_VENDOR_BROADCOM. Instead each driver that needs DIMLIB should
    select it (being the symbols SYSTEMPORT, BNXT, and BCMGENET).

    Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907021810220.13058@ramsan.of.borg/

    Fixes: 4f75da3666c0 ("linux/dim: Move implementation to .c files")
    Reported-by: Geert Uytterhoeven
    Signed-off-by: Randy Dunlap
    Cc: Uwe Kleine-König
    Cc: Tal Gilboa
    Cc: Saeed Mahameed
    Cc: netdev@vger.kernel.org
    Cc: linux-rdma@vger.kernel.org
    Cc: "David S. Miller"
    Cc: Jakub Kicinski
    Cc: Doug Ledford
    Cc: Jason Gunthorpe
    Cc: Leon Romanovsky
    Cc: Or Gerlitz
    Cc: Sagi Grimberg
    Acked-by: Florian Fainelli
    Reviewed-by: Leon Romanovsky
    Signed-off-by: David S. Miller

    Randy Dunlap
     
  • Use my kernel.org address for all entries in MAINTAINERS.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • phydev->dev_flags is entirely dependent on the PHY device driver which
    is going to be used, setting the internal GENET PHY revision in those
    bits only makes sense when drivers/net/phy/bcm7xxx.c is the PHY driver
    being used.

    Fixes: 487320c54143 ("net: bcmgenet: communicate integrated PHY revision to PHY driver")
    Signed-off-by: Florian Fainelli
    Acked-by: Doug Berger
    Signed-off-by: David S. Miller

    Florian Fainelli
     
  • While invalidating the dst, we assign blackhole_netdev instead of
    the loopback device. However, this device does not have an idev pointer
    and hence no ip6_ptr even if IPv6 is enabled. Possibly this has
    triggered the syzbot reported crash.

    The syzbot report does not have reproducer, however, this is the
    only device that doesn't have matching idev created.

    Crash instruction is :

    static inline bool ip6_ignore_linkdown(const struct net_device *dev)
    {
            const struct inet6_dev *idev = __in6_dev_get(dev);

            return !!idev->cnf.ignore_routes_with_linkdown;
    }

    Mahesh Bandewar
     
  • …m/linux/kernel/git/kvalo/wireless-drivers

    Kalle Valo says:

    ====================
    wireless-drivers fixes for 5.4

    Second set of fixes for 5.4. ath10k regression and iwlwifi BAD_COMMAND
    bug are the ones getting most reports at the moment.

    ath10k

    * fix throughput regression on QCA98XX

    iwlwifi

    * fix initialization of 3168 devices (the infamous BAD_COMMAND bug)

    * other smaller fixes

    rt2x00

    * don't include input-polldev.h header

    * fix hw reset to work during first 5 minutes of system run
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     

15 Oct, 2019

5 commits

  • Igor Russkikh says:

    ====================
    Aquantia/Marvell AQtion atlantic driver fixes 10/2019

    Here is a set of various bugfixes, to be considered for stable as well.

    V2: double space removed
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • macvlan and multicast handling is now mixed up.
    The explicit issue is that macvlan interface gets broken (no traffic)
    after clearing MULTICAST flag on the real interface.

    We now do separate logic and consider both ALLMULTI and MULTICAST
    flags on the device.

    Fixes: 11ba961c9161 ("net: aquantia: Fix IFF_ALLMULTI flag functionality")
    Signed-off-by: Dmitry Bogdanov
    Signed-off-by: Igor Russkikh
    Signed-off-by: David S. Miller

    Dmitry Bogdanov
     
  • Individual descriptors on LRO TCP session should be checked
    for CRC errors. It was discovered that HW recalculates
    L4 checksums on LRO session and does not break it up on bad L4
    csum.

    Thus, driver should aggregate HW LRO L4 statuses from all individual
    buffers of LRO session and drop packet if one of the buffers has bad
    L4 checksum.

    Fixes: f38f1ee8aeb2 ("net: aquantia: check rx csum for all packets in LRO session")
    Signed-off-by: Dmitry Bogdanov
    Signed-off-by: Igor Russkikh
    Signed-off-by: David S. Miller

    Dmitry Bogdanov
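
The aggregation step described above reduces to "the session is good only if every buffer is good". A minimal sketch, assuming a hypothetical per-buffer status flag (`struct rx_buf_sketch` and `lro_session_l4_ok()` are illustrative names, not the atlantic driver's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-descriptor status for one buffer of an LRO session. */
struct rx_buf_sketch {
    bool l4_csum_ok;
};

/* Returns true only if every buffer in the session reported a good L4
 * checksum; a single bad buffer means the whole packet should be dropped. */
static bool lro_session_l4_ok(const struct rx_buf_sketch *bufs, size_t n)
{
    assert(bufs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!bufs[i].l4_csum_ok)
            return false;
    }
    return true;
}
```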
     
  • From the HW specification: to correctly reset HW caches (this is a
    required workaround when stopping the device), the register bit should
    actually be toggled.

    It was previously always just set. Due to the way the driver stops HW this
    never actually caused any issues, but it still may, so cleaning this up.

    Fixes: 7a1bb49461b1 ("net: aquantia: fix potential IOMMU fault after driver unbind")
    Signed-off-by: Igor Russkikh
    Signed-off-by: David S. Miller

    Igor Russkikh
     
  • Chip temperature is a two byte word, colocated internally with cable
    length data. We do all readouts from HW memory by dwords, thus
    we should clear extra high bytes, otherwise temperature output
    gets weird as soon as we attach a cable to the NIC.

    Fixes: 8f8940118654 ("net: aquantia: add infrastructure to readout chip temperature")
    Tested-by: Holger Hoffstätte
    Signed-off-by: Igor Russkikh
    Signed-off-by: David S. Miller

    Igor Russkikh
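
The masking step can be shown in isolation. This sketch assumes the temperature occupies the low 16 bits of the dword and the cable length data the high 16 bits; the commit text only says the two are colocated, so that layout and the helper name `temp_raw_from_dword()` are assumptions.

```c
#include <stdint.h>

/* Keep only the two-byte temperature word from a 32-bit readout.
 * Assumption: the temperature sits in the low 16 bits of the dword. */
static uint16_t temp_raw_from_dword(uint32_t dword)
{
    return (uint16_t)(dword & 0xFFFFu);
}
```

Without the mask, any nonzero value in the upper half (for example once a cable is attached) would be interpreted as part of the temperature.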
     

14 Oct, 2019

14 commits

  • (kvalo: cherry picked from commit 1340cc631bd00431e2f174525c971f119df9efa1 in
    wireless-drivers-next to wireless-drivers as this is a frequently reported
    regression)

    Bad latency is found on QCA988x, the issue was introduced by
    commit 4504f0e5b571 ("ath10k: sdio: workaround firmware UART
    pin configuration bug"). If uart_pin_workaround is false, this
    change will set uart pin even if uart_print is false.

    Tested HW: QCA9880
    Tested FW: 10.2.4-1.0-00037

    Fixes: 4504f0e5b571 ("ath10k: sdio: workaround firmware UART pin configuration bug")
    Signed-off-by: Miaoqing Pan
    Signed-off-by: Kalle Valo

    Miaoqing Pan
     
  • In nsim_fib_init(), if register_fib_notifier fails, nsim_fib_net_ops
    should be unregistered before returning.

    In nsim_fib_exit(), unregister_fib_notifier should be called before
    nsim_fib_net_ops is unregistered; otherwise this may cause a use-after-free:

    BUG: KASAN: use-after-free in nsim_fib_event_nb+0x342/0x570 [netdevsim]
    Read of size 8 at addr ffff8881daaf4388 by task kworker/0:3/3499

    CPU: 0 PID: 3499 Comm: kworker/0:3 Not tainted 5.3.0-rc7+ #30
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
    Workqueue: ipv6_addrconf addrconf_dad_work [ipv6]
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0xa9/0x10e lib/dump_stack.c:113
    print_address_description+0x65/0x380 mm/kasan/report.c:351
    __kasan_report+0x149/0x18d mm/kasan/report.c:482
    kasan_report+0xe/0x20 mm/kasan/common.c:618
    nsim_fib_event_nb+0x342/0x570 [netdevsim]
    notifier_call_chain+0x52/0xf0 kernel/notifier.c:95
    __atomic_notifier_call_chain+0x78/0x140 kernel/notifier.c:185
    call_fib_notifiers+0x30/0x60 net/core/fib_notifier.c:30
    call_fib6_entry_notifiers+0xc1/0x100 [ipv6]
    fib6_add+0x92e/0x1b10 [ipv6]
    __ip6_ins_rt+0x40/0x60 [ipv6]
    ip6_ins_rt+0x84/0xb0 [ipv6]
    __ipv6_ifa_notify+0x4b6/0x550 [ipv6]
    ipv6_ifa_notify+0xa5/0x180 [ipv6]
    addrconf_dad_completed+0xca/0x640 [ipv6]
    addrconf_dad_work+0x296/0x960 [ipv6]
    process_one_work+0x5c0/0xc00 kernel/workqueue.c:2269
    worker_thread+0x5c/0x670 kernel/workqueue.c:2415
    kthread+0x1d7/0x200 kernel/kthread.c:255
    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

    Allocated by task 3388:
    save_stack+0x19/0x80 mm/kasan/common.c:69
    set_track mm/kasan/common.c:77 [inline]
    __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:493
    kmalloc include/linux/slab.h:557 [inline]
    kzalloc include/linux/slab.h:748 [inline]
    ops_init+0xa9/0x220 net/core/net_namespace.c:127
    __register_pernet_operations net/core/net_namespace.c:1135 [inline]
    register_pernet_operations+0x1d4/0x420 net/core/net_namespace.c:1212
    register_pernet_subsys+0x24/0x40 net/core/net_namespace.c:1253
    nsim_fib_init+0x12/0x70 [netdevsim]
    veth_get_link_ksettings+0x2b/0x50 [veth]
    do_one_initcall+0xd4/0x454 init/main.c:939
    do_init_module+0xe0/0x330 kernel/module.c:3490
    load_module+0x3c2f/0x4620 kernel/module.c:3841
    __do_sys_finit_module+0x163/0x190 kernel/module.c:3931
    do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 3534:
    save_stack+0x19/0x80 mm/kasan/common.c:69
    set_track mm/kasan/common.c:77 [inline]
    __kasan_slab_free+0x130/0x180 mm/kasan/common.c:455
    slab_free_hook mm/slub.c:1423 [inline]
    slab_free_freelist_hook mm/slub.c:1474 [inline]
    slab_free mm/slub.c:3016 [inline]
    kfree+0xe9/0x2d0 mm/slub.c:3957
    ops_free net/core/net_namespace.c:151 [inline]
    ops_free_list.part.7+0x156/0x220 net/core/net_namespace.c:184
    ops_free_list net/core/net_namespace.c:182 [inline]
    __unregister_pernet_operations net/core/net_namespace.c:1165 [inline]
    unregister_pernet_operations+0x221/0x2a0 net/core/net_namespace.c:1224
    unregister_pernet_subsys+0x1d/0x30 net/core/net_namespace.c:1271
    nsim_fib_exit+0x11/0x20 [netdevsim]
    nsim_module_exit+0x16/0x21 [netdevsim]
    __do_sys_delete_module kernel/module.c:1015 [inline]
    __se_sys_delete_module kernel/module.c:958 [inline]
    __x64_sys_delete_module+0x244/0x330 kernel/module.c:958
    do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Reported-by: Hulk Robot
    Fixes: 59c84b9fcf42 ("netdevsim: Restore per-network namespace accounting for fib entries")
    Signed-off-by: YueHaibing
    Acked-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    YueHaibing
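
The two ordering rules from this commit (unwind the first registration if the second fails, and tear down in reverse order on exit) can be sketched with stubs. Everything here is hypothetical: the `*_sketch` functions only record the order of calls in a log string and do not touch any kernel API.

```c
#include <errno.h>
#include <stddef.h>

/* Call log: 'O'/'o' = ops registered/unregistered,
 *           'N'/'n' = notifier registered/unregistered. */
static char log_buf[32];
static size_t log_len;

static void log_step(char c)
{
    if (log_len < sizeof(log_buf) - 1) {
        log_buf[log_len++] = c;
        log_buf[log_len] = '\0';
    }
}

static int register_ops_sketch(void) { log_step('O'); return 0; }
static void unregister_ops_sketch(void) { log_step('o'); }
static void unregister_notifier_sketch(void) { log_step('n'); }

static int register_notifier_sketch(int fail)
{
    if (fail)
        return -ENOMEM;
    log_step('N');
    return 0;
}

/* Init: if the second registration fails, undo the first before returning. */
static int fib_init_sketch(int notifier_fails)
{
    int err = register_ops_sketch();

    if (err)
        return err;
    err = register_notifier_sketch(notifier_fails);
    if (err) {
        unregister_ops_sketch();
        return err;
    }
    return 0;
}

/* Exit: stop the notifier first, so no callback can run against per-net
 * state that the ops teardown is about to free. */
static void fib_exit_sketch(void)
{
    unregister_notifier_sketch();
    unregister_ops_sketch();
}
```

A failed init leaves the log as "Oo", and a successful init followed by exit leaves it as "ONno".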
     
  • pSeries machines on POWER9 processors can run with the XICS (legacy)
    interrupt mode or with the XIVE exploitation interrupt mode. These
    interrupt controllers have different interfaces for interrupt
    management: XICS uses hcalls and XIVE uses loads and stores on a page.
    H_EOI being a XICS interface, the enable_scrq_irq() routine can fail
    when the machine runs in XIVE mode.

    Fix that by calling the EOI handler of the interrupt chip.

    Fixes: f23e0643cd0b ("ibmvnic: Clear pending interrupt after device reset")
    Signed-off-by: Cédric Le Goater
    Signed-off-by: David S. Miller

    Cédric Le Goater
     
  • __lpc_eth_shutdown is called after __lpc_eth_reset but it is already
    calling __lpc_eth_reset. Avoid resetting the IP twice.

    Signed-off-by: Alexandre Belloni
    Signed-off-by: David S. Miller

    Alexandre Belloni
     
  • Eric Dumazet says:

    ====================
    tcp: address KCSAN reports in tcp_poll() (part I)

    This all started with a KCSAN report (included
    in "tcp: annotate tp->rcv_nxt lockless reads" changelog)

    tcp_poll() runs in a lockless way. This means that almost
    all accesses of tcp socket fields done in tcp_poll() context
    need annotations, otherwise KCSAN will complain about data-races.

    While doing this detective work, I found a more serious bug,
    addressed by the first patch ("tcp: add rcu protection around
    tp->fastopen_rsk").
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • For the sake of tcp_poll(), there are a few places where we fetch
    sk->sk_wmem_queued while this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make sure write
    sides use corresponding WRITE_ONCE() to avoid store-tearing.

    sk_wmem_queued_add() helper is added so that we can in
    the future convert to ADD_ONCE() or equivalent if/when
    available.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
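
The annotation pattern used throughout this series can be approximated in user space. This is a sketch, not the kernel headers: the `READ_ONCE_SKETCH`/`WRITE_ONCE_SKETCH` macros are simplified volatile-cast stand-ins that rely on the GNU `__typeof__` extension, and `struct sock_sketch` is a hypothetical one-field socket. The helper mirrors the idea of funnelling every update through one function so it can later be converted to a stronger primitive.

```c
/* Simplified user-space approximations (assumption: GCC or Clang, which
 * provide __typeof__). They force exactly one load or store of x and keep
 * the compiler from tearing or refetching the access. */
#define READ_ONCE_SKETCH(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical socket with only the field discussed above. */
struct sock_sketch {
    int sk_wmem_queued;
};

/* Writer side: callers hold the socket lock, so the read-modify-write is
 * not racing with other writers; only the final store is annotated. */
static void wmem_queued_add_sketch(struct sock_sketch *sk, int val)
{
    WRITE_ONCE_SKETCH(sk->sk_wmem_queued, sk->sk_wmem_queued + val);
}

/* Lockless reader side, as a poll-like path would use it. */
static int wmem_queued_read_sketch(const struct sock_sketch *sk)
{
    return READ_ONCE_SKETCH(sk->sk_wmem_queued);
}
```

These annotations do not add ordering or atomicity for the read-modify-write as a whole; they only document the intentional race and constrain how the compiler emits each individual access.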
     
  • For the sake of tcp_poll(), there are a few places where we fetch
    sk->sk_sndbuf while this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make sure write
    sides use corresponding WRITE_ONCE() to avoid store-tearing.

    Note that other transports probably need similar fixes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • For the sake of tcp_poll(), there are a few places where we fetch
    sk->sk_rcvbuf while this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make sure write
    sides use corresponding WRITE_ONCE() to avoid store-tearing.

    Note that other transports probably need similar fixes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • There are two places where we fetch tp->urg_seq while
    this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make
    sure write sides use corresponding WRITE_ONCE() to avoid
    store-tearing.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • There are a few places where we fetch tp->snd_nxt while
    this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make
    sure write sides use corresponding WRITE_ONCE() to avoid
    store-tearing.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • There are a few places where we fetch tp->write_seq while
    this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make
    sure write sides use corresponding WRITE_ONCE() to avoid
    store-tearing.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • There are a few places where we fetch tp->copied_seq while
    this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make
    sure write sides use corresponding WRITE_ONCE() to avoid
    store-tearing.

    Note that tcp_inq_hint() was already using READ_ONCE(tp->copied_seq)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • There are a few places where we fetch tp->rcv_nxt while
    this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make
    sure write sides use corresponding WRITE_ONCE() to avoid
    store-tearing.

    Note that tcp_inq_hint() was already using READ_ONCE(tp->rcv_nxt)

    syzbot reported :

    BUG: KCSAN: data-race in tcp_poll / tcp_queue_rcv

    write to 0xffff888120425770 of 4 bytes by interrupt on cpu 0:
    tcp_rcv_nxt_update net/ipv4/tcp_input.c:3365 [inline]
    tcp_queue_rcv+0x180/0x380 net/ipv4/tcp_input.c:4638
    tcp_rcv_established+0xbf1/0xf50 net/ipv4/tcp_input.c:5616
    tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
    tcp_v4_rcv+0x1a03/0x1bf0 net/ipv4/tcp_ipv4.c:1923
    ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
    ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
    dst_input include/net/dst.h:442 [inline]
    ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
    __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
    __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
    netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
    napi_skb_finish net/core/dev.c:5671 [inline]
    napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
    receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061

    read to 0xffff888120425770 of 4 bytes by task 7254 on cpu 1:
    tcp_stream_is_readable net/ipv4/tcp.c:480 [inline]
    tcp_poll+0x204/0x6b0 net/ipv4/tcp.c:554
    sock_poll+0xed/0x250 net/socket.c:1256
    vfs_poll include/linux/poll.h:90 [inline]
    ep_item_poll.isra.0+0x90/0x190 fs/eventpoll.c:892
    ep_send_events_proc+0x113/0x5c0 fs/eventpoll.c:1749
    ep_scan_ready_list.constprop.0+0x189/0x500 fs/eventpoll.c:704
    ep_send_events fs/eventpoll.c:1793 [inline]
    ep_poll+0xe3/0x900 fs/eventpoll.c:1930
    do_epoll_wait+0x162/0x180 fs/eventpoll.c:2294
    __do_sys_epoll_pwait fs/eventpoll.c:2325 [inline]
    __se_sys_epoll_pwait fs/eventpoll.c:2311 [inline]
    __x64_sys_epoll_pwait+0xcd/0x170 fs/eventpoll.c:2311
    do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 7254 Comm: syz-fuzzer Not tainted 5.3.0+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Both tcp_v4_err() and tcp_v6_err() do the following operations
    while they do not own the socket lock :

    fastopen = tp->fastopen_rsk;
    snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una;

    The problem is that without an appropriate barrier, the compiler
    might reload tp->fastopen_rsk and trigger a NULL deref.

    Request sockets are protected by RCU, so we can simply add
    the missing annotations and barriers to solve the issue.

    Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
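
The hazard in the two-line snippet above is that the pointer may be loaded twice: once for the NULL test and once for the dereference. A user-space sketch of the fix is to take a single snapshot of the pointer and use only the snapshot. Here C11 atomics stand in for the kernel's RCU accessors, which is an assumption of this example; `struct req_sketch`, `struct tp_sketch` and `err_snd_una()` are hypothetical names, and the lifetime guarantee that RCU provides for the pointed-to object is not modelled.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the request socket and the TCP socket. */
struct req_sketch {
    uint32_t snt_isn;
};

struct tp_sketch {
    _Atomic(struct req_sketch *) fastopen_rsk;  /* may be cleared concurrently */
    uint32_t snd_una;
};

/* Load the pointer exactly once, then test and dereference the local copy.
 * The compiler cannot re-read tp->fastopen_rsk between the two uses. */
static uint32_t err_snd_una(struct tp_sketch *tp)
{
    struct req_sketch *fastopen =
        atomic_load_explicit(&tp->fastopen_rsk, memory_order_acquire);

    return fastopen ? fastopen->snt_isn : tp->snd_una;
}
```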
     

13 Oct, 2019

1 commit


12 Oct, 2019

1 commit

  • If an ICMP packet comes in on the UDP socket backing an AF_RXRPC socket as
    the UDP socket is being shut down, rxrpc_error_report() may get called to
    deal with it after sk_user_data on the UDP socket has been cleared, leading
    to a NULL pointer access when this local endpoint record gets accessed.

    Fix this by just returning immediately if sk_user_data was NULL.

    The oops looks like the following:

    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    ...
    RIP: 0010:rxrpc_error_report+0x1bd/0x6a9
    ...
    Call Trace:
    ? sock_queue_err_skb+0xbd/0xde
    ? __udp4_lib_err+0x313/0x34d
    __udp4_lib_err+0x313/0x34d
    icmp_unreach+0x1ee/0x207
    icmp_rcv+0x25b/0x28f
    ip_protocol_deliver_rcu+0x95/0x10e
    ip_local_deliver+0xe9/0x148
    __netif_receive_skb_one_core+0x52/0x6e
    process_backlog+0xdc/0x177
    net_rx_action+0xf9/0x270
    __do_softirq+0x1b6/0x39a
    ? smpboot_register_percpu_thread+0xce/0xce
    run_ksoftirqd+0x1d/0x42
    smpboot_thread_fn+0x19e/0x1b3
    kthread+0xf1/0xf6
    ? kthread_delayed_work_timer_fn+0x83/0x83
    ret_from_fork+0x24/0x30

    Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
    Reported-by: syzbot+611164843bd48cc2190c@syzkaller.appspotmail.com
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

11 Oct, 2019

2 commits

  • Karsten Graul says:

    ====================
    Fixes for -net, addressing two races in the SMC receive path and
    adding a missing cleanup when link group creation fails with
    ISM devices and a VLAN id.
    ====================

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     
  • smc_rx_recvmsg() first checks if data is available, and then if
    RCV_SHUTDOWN is set. There is a race when smc_cdc_msg_recv_action() runs
    in between these 2 checks, receives data and sets RCV_SHUTDOWN.
    In that case smc_rx_recvmsg() would return from receive without
    processing the available data.
    Fix that with a final check for data available if RCV_SHUTDOWN is set.
    Move the check for data into a function and call it twice.
    And use the existing helper smc_rx_data_available().

    Fixes: 952310ccf2d8 ("smc: receive data from RMBE")
    Reviewed-by: Ursula Braun
    Signed-off-by: Karsten Graul
    Signed-off-by: Jakub Kicinski

    Karsten Graul
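
The "final check" described in this commit can be modelled single-threaded by injecting the racing update between the two checks. This is only a sketch of the control flow: `struct rx_conn_sketch`, `rx_poll_sketch()` and the `between_checks` hook are hypothetical test scaffolding, and plain fields are used where real concurrent code would need proper synchronization.

```c
#include <stdbool.h>
#include <stddef.h>

enum rx_state { RX_DATA, RX_EOF, RX_WAIT };

/* Hypothetical connection state. between_checks is a test hook that lets a
 * caller simulate the receive handler running between the two checks. */
struct rx_conn_sketch {
    int bytes_to_rcv;
    bool rcv_shutdown;
    void (*between_checks)(struct rx_conn_sketch *c);
};

static bool rx_data_available(const struct rx_conn_sketch *c)
{
    return c->bytes_to_rcv > 0;
}

static enum rx_state rx_poll_sketch(struct rx_conn_sketch *c)
{
    if (rx_data_available(c))
        return RX_DATA;
    if (c->between_checks)
        c->between_checks(c);        /* simulated race window */
    if (c->rcv_shutdown) {
        /* Final check: data may have arrived together with the shutdown. */
        if (rx_data_available(c))
            return RX_DATA;
        return RX_EOF;
    }
    return RX_WAIT;
}

/* Demo hook: data and shutdown arrive after the first check. */
static void demo_late_arrival(struct rx_conn_sketch *c)
{
    c->bytes_to_rcv = 100;
    c->rcv_shutdown = true;
}
```

Without the final check, the late-arrival case would report end of stream and the 100 pending bytes would never be delivered.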