08 Mar, 2020

1 commit

  • Merge Linux stable release v5.4.24 into imx_5.4.y

    * tag 'v5.4.24': (3306 commits)
    Linux 5.4.24
    blktrace: Protect q->blk_trace with RCU
    kvm: nVMX: VMWRITE checks unsupported field before read-only field
    ...

    Signed-off-by: Jason Liu

    Conflicts:
    arch/arm/boot/dts/imx6sll-evk.dts
    arch/arm/boot/dts/imx7ulp.dtsi
    arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
    drivers/clk/imx/clk-composite-8m.c
    drivers/gpio/gpio-mxc.c
    drivers/irqchip/Kconfig
    drivers/mmc/host/sdhci-of-esdhc.c
    drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
    drivers/net/can/flexcan.c
    drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
    drivers/net/ethernet/mscc/ocelot.c
    drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
    drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
    drivers/net/phy/realtek.c
    drivers/pci/controller/mobiveil/pcie-mobiveil-host.c
    drivers/perf/fsl_imx8_ddr_perf.c
    drivers/tee/optee/shm_pool.c
    drivers/usb/cdns3/gadget.c
    kernel/sched/cpufreq.c
    net/core/xdp.c
    sound/soc/fsl/fsl_esai.c
    sound/soc/fsl/fsl_sai.c
    sound/soc/sof/core.c
    sound/soc/sof/imx/Kconfig
    sound/soc/sof/loader.c

    Jason Liu
     

05 Mar, 2020

1 commit

  • [ Upstream commit 8a9093c79863b58cc2f9874d7ae788f0d622a596 ]

    tc flower rules that are based on src or dst port blocking are sometimes
    ineffective due to uninitialized stack data. __skb_flow_dissect() extracts
    ports from the skb for tc flower to match against. However, the port
    dissection is not done when the FLOW_DIS_IS_FRAGMENT bit is set in
    key_control->flags. All callers of __skb_flow_dissect() zero out the
    key_control field except for fl_classify() as used by the flower
    classifier. Thus, the FLOW_DIS_IS_FRAGMENT may be set on entry to
    __skb_flow_dissect(), since key_control is allocated on the stack
    and may not be initialized.

    Since key_basic and key_control are present for all flow keys, let's
    make sure they are initialized.
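
    The failure mode described above can be reproduced in a small
    user-space sketch. Everything named demo_* is hypothetical and is not
    the kernel's code; the stale_flags argument simulates whatever happened
    to be left on the stack, so the example stays well defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_IS_FRAGMENT 0x1u /* stand-in for FLOW_DIS_IS_FRAGMENT */

struct demo_key_control { uint32_t flags; };
struct demo_key_ports { uint16_t src, dst; };

struct demo_flow_key {
    struct demo_key_control control;
    struct demo_key_ports ports;
};

/* Mimics the dissector: ports are skipped if the fragment bit is
 * already set in the control key on entry. */
static void demo_dissect(struct demo_flow_key *key, uint16_t src, uint16_t dst)
{
    assert(key);
    if (key->control.flags & DEMO_IS_FRAGMENT)
        return;
    key->ports.src = src;
    key->ports.dst = dst;
}

/* stale_flags simulates leftover stack contents. When zero_first is set,
 * the key is cleared before dissection, which is the idea of the fix. */
static struct demo_flow_key demo_classify(uint32_t stale_flags, int zero_first,
                                          uint16_t src, uint16_t dst)
{
    struct demo_flow_key key;

    memset(&key, 0, sizeof(key));
    key.control.flags = stale_flags;
    if (zero_first)
        memset(&key, 0, sizeof(key));
    demo_dissect(&key, src, dst);
    return key;
}
```

    With a stale fragment bit and no clearing, the ports stay at zero and a
    port-based rule cannot match; with clearing, the ports are extracted.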

    Fixes: 62230715fd24 ("flow_dissector: do not dissect l4 ports for fragments")
    Co-developed-by: Eric Dumazet
    Signed-off-by: Eric Dumazet
    Acked-by: Cong Wang
    Signed-off-by: Jason Baron
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jason Baron
     

26 Feb, 2020

1 commit

  • The DSA drivers that implement .phylink_mac_link_state should normally
    register an interrupt for the PCS, from which they should call
    phylink_mac_change(). However not all switches implement this, and those
    who don't should set this flag in dsa_switch in the .setup callback, so
    that PHYLINK will poll for a few ms until the in-band AN link timer
    expires and the PCS state settles.

    Signed-off-by: Vladimir Oltean

    Conflicts:
    include/net/dsa.h

    trivially with upstream commit 05f294a85235 ("net: dsa: allocate ports
    on touch") which was merged in v5.4-rc3.

    (cherry picked from commit 222d888331f409755fc25b1933e5dee1a976b9c1)

    Vladimir Oltean
     

11 Feb, 2020

1 commit

  • [ Upstream commit 38f88c45404293bbc027b956def6c10cbd45c616 ]

    syzbot managed to send an IPX packet through bond_alb_xmit()
    and af_packet and triggered a use-after-free.

    First, bond_alb_xmit() was using ipx_hdr() helper to reach
    the IPX header, but ipx_hdr() was using the transport offset
    instead of the network offset. In the particular syzbot
    report transport offset was 0xFFFF.

    This patch removes ipx_hdr() since it was only (mis)used from bonding.

    Then we need to make sure IPv4/IPv6/IPX headers are pulled
    in skb->head before dereferencing anything.
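
    The "check that the header is present before dereferencing" rule can be
    illustrated outside the kernel. This is a minimal sketch with a
    hypothetical demo_pkt type; it only models the bounds check, not the
    kernel's skb handling or its pull helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_pkt {
    const uint8_t *data;  /* start of the linear data area */
    size_t linear_len;    /* bytes that are safe to read from data */
    size_t network_off;   /* offset of the network header */
};

/* Reads a 16-bit field at field_off inside the network header.
 * Returns 0 on success, -1 if the field is not fully inside the
 * linear area (for example when the offset is a bogus 0xFFFF). */
static int demo_read_u16(const struct demo_pkt *p, size_t field_off, uint16_t *out)
{
    size_t start;

    assert(p && out);
    if (p->network_off > p->linear_len)
        return -1;
    start = p->network_off + field_off;
    if (start < p->network_off) /* offset arithmetic wrapped around */
        return -1;
    if (start > p->linear_len || p->linear_len - start < sizeof(*out))
        return -1;
    memcpy(out, p->data + start, sizeof(*out));
    return 0;
}
```

    A caller that gets -1 must drop or bypass the packet instead of reading
    the field.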

    BUG: KASAN: use-after-free in bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452
    Read of size 2 at addr ffff8801ce56dfff by task syz-executor.2/18108
    (if (ipx_hdr(skb)->ipx_checksum != IPX_NO_CHECKSUM) ...)

    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    [] __dump_stack lib/dump_stack.c:17 [inline]
    [] dump_stack+0x14d/0x20b lib/dump_stack.c:53
    [] print_address_description+0x6f/0x20b mm/kasan/report.c:282
    [] kasan_report_error mm/kasan/report.c:380 [inline]
    [] kasan_report mm/kasan/report.c:438 [inline]
    [] kasan_report.cold+0x8c/0x2a0 mm/kasan/report.c:422
    [] __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:469
    [] bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452
    [] __bond_start_xmit drivers/net/bonding/bond_main.c:4199 [inline]
    [] bond_start_xmit+0x4f4/0x1570 drivers/net/bonding/bond_main.c:4224
    [] __netdev_start_xmit include/linux/netdevice.h:4525 [inline]
    [] netdev_start_xmit include/linux/netdevice.h:4539 [inline]
    [] xmit_one net/core/dev.c:3611 [inline]
    [] dev_hard_start_xmit+0x168/0x910 net/core/dev.c:3627
    [] __dev_queue_xmit+0x1f55/0x33b0 net/core/dev.c:4238
    [] dev_queue_xmit+0x18/0x20 net/core/dev.c:4278
    [] packet_snd net/packet/af_packet.c:3226 [inline]
    [] packet_sendmsg+0x4919/0x70b0 net/packet/af_packet.c:3252
    [] sock_sendmsg_nosec net/socket.c:673 [inline]
    [] sock_sendmsg+0x12c/0x160 net/socket.c:684
    [] __sys_sendto+0x262/0x380 net/socket.c:1996
    [] SYSC_sendto net/socket.c:2008 [inline]
    [] SyS_sendto+0x40/0x60 net/socket.c:2004

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Jay Vosburgh
    Cc: Veaceslav Falico
    Cc: Andy Gospodarek
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

06 Feb, 2020

1 commit

  • [ Upstream commit 26ec17a1dc5ecdd8d91aba63ead6f8b5ad5dea0d ]

    If a radar event of CAC_FINISHED or RADAR_DETECTED happens
    while another phy is performing CAC, we might need to cancel
    that CAC.

    If we got a radar in a channel that another phy is now
    doing CAC on then the CAC should be canceled there.

    If, for example, two phys are doing CAC on the same channels,
    or on compatible channels, then once one of them finishes its
    CAC the other might need to cancel its own CAC, since it is no
    longer relevant.

    To fix that, this commit adds a callback and implements it in
    mac80211 to end CAC.
    This commit also adds a call to said callback if, after a radar
    event, we see the CAC is no longer relevant.

    Signed-off-by: Orr Mazor
    Reviewed-by: Sergey Matyukevich
    Link: https://lore.kernel.org/r/20191222145449.15792-1-Orr.Mazor@tandemg.com
    [slightly reformat/reword commit message]
    Signed-off-by: Johannes Berg
    Signed-off-by: Sasha Levin

    Orr Mazor
     

01 Feb, 2020

2 commits

  • [ Upstream commit 6cd021a58c18a1731f7e47f83e172c0c302d65e5 ]

    Multicast and broadcast packets can be looped from egress to ingress
    pre segmentation with dev_loopback_xmit. That function unconditionally
    sets ip_summed to CHECKSUM_UNNECESSARY.

    udp_rcv_segment segments gso packets in the udp rx path. Segmentation
    usually executes on egress, and does not expect packets of this type.
    __udp_gso_segment interprets !CHECKSUM_PARTIAL as CHECKSUM_NONE. But
    the offsets are not correct for gso_make_checksum.

    UDP GSO packets are of type CHECKSUM_PARTIAL, with their uh->check set
    to the correct pseudo header checksum. Reset ip_summed to this type.
    (CHECKSUM_PARTIAL is allowed on ingress, see comments in skbuff.h)

    Reported-by: syzbot
    Fixes: cf329aa42b66 ("udp: cope with UDP GRO packet misdirection")
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     
  • [ Upstream commit 2e24cd755552350b94a7617617c6877b8cbcb701 ]

    The current implementations of ops->bind_class() are merely
    searching for classid and updating class in the struct tcf_result,
    without invoking either of cl_ops->bind_tcf() or
    cl_ops->unbind_tcf(). This breaks their design, as qdiscs like cbq
    use them to count filters too. This is why syzbot triggered
    the warning in cbq_destroy_class().

    In order to fix this, we have to call cl_ops->bind_tcf() and
    cl_ops->unbind_tcf() like the filter binding path. This patch does
    so by refactoring out two helper functions __tcf_bind_filter()
    and __tcf_unbind_filter(), which are lockless and accept a Qdisc
    pointer, then teaching each implementation to call them correctly.

    Note, we merely pass the Qdisc pointer as an opaque pointer to
    each filter, they only need to pass it down to the helper
    functions without understanding it at all.

    Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class")
    Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
    Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     

29 Jan, 2020

1 commit

  • commit eb014de4fd418de1a277913cba244e47274fe392 upstream.

    This patch introduces a list of pending module requests. This new module
    list is composed of nft_module_request objects that contain the module
    name and one status field that tells if the module has been already
    loaded (the 'done' field).

    In the first pass, from the preparation phase, the netlink command finds
    that a module is missing on this list. Then, a module request is
    allocated and added to this list and nft_request_module() returns
    -EAGAIN. This triggers the abort path with the autoload parameter set
    by nfnetlink; request_module() is then called and the module request
    enters the 'done' state. Since the mutex is released when loading
    modules from the abort phase, the module list is zapped so this
    iteration occurs over a local list. Therefore, the request_module()
    calls happen when object lists are in a consistent state (after fully
    aborting the transaction) and the commit list is empty.

    On the second pass, the netlink command will find that it already tried
    to load the module, so it does not request it again and
    nft_request_module() returns 0. Then, there is a look up to find the
    object that the command was missing. If the module was successfully
    loaded, the command proceeds normally since it finds the missing object
    in place, otherwise -ENOENT is reported to userspace.

    This patch also updates nfnetlink to include the reason to enter the
    abort phase, which is required for this new autoload module rationale.
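
    The two-pass flow described above can be sketched in user space as
    follows. All demo_* names, the singly linked list, and the fake loader
    are assumptions made for illustration; the real code uses kernel lists,
    locking, and request_module().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct demo_module_request {
    struct demo_module_request *next;
    char name[32];
    int done; /* set once a load has been attempted */
};

struct demo_ctx { struct demo_module_request *pending; };

static int demo_loads; /* counts calls to the fake loader */
static void demo_fake_load(const char *name) { (void)name; demo_loads++; }

/* Pass 1: queue the request and return -EAGAIN.
 * Pass 2 (after the abort path ran): return 0 without asking again. */
static int demo_request_module(struct demo_ctx *c, const char *name)
{
    struct demo_module_request *req;

    for (req = c->pending; req; req = req->next)
        if (strcmp(req->name, name) == 0)
            return req->done ? 0 : -EAGAIN;

    req = calloc(1, sizeof(*req));
    if (!req)
        return -ENOMEM;
    strncpy(req->name, name, sizeof(req->name) - 1);
    req->next = c->pending;
    c->pending = req;
    return -EAGAIN;
}

/* Abort path: attempt every pending load and mark it done. */
static void demo_abort_autoload(struct demo_ctx *c, void (*load)(const char *))
{
    struct demo_module_request *req;

    assert(load);
    for (req = c->pending; req; req = req->next) {
        if (!req->done) {
            load(req->name);
            req->done = 1;
        }
    }
}

static void demo_release(struct demo_ctx *c)
{
    while (c->pending) {
        struct demo_module_request *req = c->pending;
        c->pending = req->next;
        free(req);
    }
}
```

    After the second pass returns 0, the caller still has to look up the
    object it was missing and report -ENOENT if the load did not provide it.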

    Fixes: ec7470b834fe ("netfilter: nf_tables: store transaction list locally while requesting module")
    Reported-by: syzbot+29125d208b3dae9a7019@syzkaller.appspotmail.com
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Pablo Neira Ayuso
     

23 Jan, 2020

1 commit

  • commit 33bfe20dd7117dd81fd896a53f743a233e1ad64f upstream.

    When a sockmap sock with TLS enabled is removed, we clean up bpf/psock
    state and call tcp_update_ulp() to push updates to the TLS ULP on top.
    However, we don't push the write_space callback up and instead simply
    overwrite the op with the previous op stored in the psock. This may or
    may not be correct, so to ensure we don't overwrite the TLS write space
    hook, pass this field to the ULP and have it fix up the ctx.

    This completes a previous fix that pushed the ops through to the ULP
    but at the time missed doing this for write_space, presumably because
    write_space TLS hook was added around the same time.

    Fixes: 95fa145479fbc ("bpf: sockmap/tls, close can race with map free")
    Signed-off-by: John Fastabend
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Jakub Sitnicki
    Acked-by: Jonathan Lemon
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/bpf/20200111061206.8028-4-john.fastabend@gmail.com
    Signed-off-by: Greg Kroah-Hartman

    John Fastabend
     

10 Jan, 2020

1 commit

  • The page pool keeps track of the number of pages in flight, and
    it isn't safe to remove the pool until all pages are returned.

    Disallow removing the pool until all pages are back, so the pool
    is always available for page producers.

    Make the page pool responsible for its own delayed destruction
    instead of relying on XDP, so the page pool can be used without
    the xdp memory model.

    When all pages are returned, free the pool and notify xdp if the
    pool is registered with the xdp memory system. Have the callback
    perform a table walk since some drivers (cpsw) may share the pool
    among multiple xdp_rxq_info.

    Note that the increment of pages_state_release_cnt may result in
    inflight == 0, resulting in the pool being released.
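
    The in-flight accounting can be sketched as two counters whose
    difference decides when the pool may really go away. This is a
    single-threaded illustration with hypothetical demo_* names; the real
    page pool uses atomic counters and deferred work, which are not
    modelled here.

```c
#include <assert.h>
#include <stdint.h>

struct demo_pool {
    uint32_t hold_cnt;     /* pages handed out */
    uint32_t release_cnt;  /* pages returned */
    int destroy_requested; /* owner asked for destruction */
    int freed;             /* set when the pool is actually released */
};

/* Unsigned subtraction keeps the difference correct across wraparound. */
static int32_t demo_inflight(const struct demo_pool *p)
{
    return (int32_t)(p->hold_cnt - p->release_cnt);
}

static void demo_try_free(struct demo_pool *p)
{
    if (p->destroy_requested && demo_inflight(p) == 0)
        p->freed = 1;
}

static void demo_page_get(struct demo_pool *p)
{
    p->hold_cnt++;
}

/* Returning the last page may be what finally releases the pool. */
static void demo_page_put(struct demo_pool *p)
{
    assert(demo_inflight(p) > 0);
    p->release_cnt++;
    demo_try_free(p);
}

static void demo_pool_destroy(struct demo_pool *p)
{
    p->destroy_requested = 1;
    demo_try_free(p);
}
```

    Destroying a pool with two pages outstanding only marks it; the pool is
    released when the second page comes back.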

    Fixes: d956a048cd3f ("xdp: force mem allocator removal and periodic warning")
    Signed-off-by: Jonathan Lemon
    Acked-by: Jesper Dangaard Brouer
    Acked-by: Ilias Apalodimas
    Signed-off-by: David S. Miller

    Jonathan Lemon
     

09 Jan, 2020

3 commits

  • [ Upstream commit 7c68fa2bddda6d942bd387c9ba5b4300737fd991 ]

    sk->sk_pacing_shift can be read and written without lock
    synchronization. This patch adds annotations to
    document this fact and avoid future syzbot complaints.

    This might also avoid unexpected false sharing
    in sk_pacing_shift_update(), as the compiler
    could remove the conditional check and always
    write over sk->sk_pacing_shift :

    if (sk->sk_pacing_shift != val)
            sk->sk_pacing_shift = val;

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Eric Dumazet
     
  • [ Upstream commit c305c6ae79e2ce20c22660ceda94f0d86d639a82 ]

    KCSAN reported a data-race [1]

    While we can use READ_ONCE() on the read sides,
    we need to make sure hh->hh_len is written last.
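
    "Written last" is a publish pattern: fill in the data, then store the
    length with release semantics so that a reader who sees a non-zero
    length also sees the data. The sketch below swaps in C11 atomics for
    the kernel's own primitives, and the demo_hh type is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

struct demo_hh {
    unsigned char data[16];
    atomic_uint len; /* 0 means "header not valid yet" */
};

/* Writer: copy the header first, publish the length last. */
static void demo_hh_publish(struct demo_hh *hh, const unsigned char *hdr,
                            unsigned int len)
{
    assert(len > 0 && len <= sizeof(hh->data));
    memcpy(hh->data, hdr, len);
    atomic_store_explicit(&hh->len, len, memory_order_release);
}

/* Reader: load the length once; only touch data if it is non-zero. */
static unsigned int demo_hh_read(struct demo_hh *hh, unsigned char *out)
{
    unsigned int len = atomic_load_explicit(&hh->len, memory_order_acquire);

    if (len)
        memcpy(out, hh->data, len);
    return len;
}
```

    If the length were stored before the copy, a concurrent reader could
    observe a valid-looking length paired with stale header bytes.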

    [1]

    BUG: KCSAN: data-race in eth_header_cache / neigh_resolve_output

    write to 0xffff8880b9dedcb8 of 4 bytes by task 29760 on cpu 0:
    eth_header_cache+0xa9/0xd0 net/ethernet/eth.c:247
    neigh_hh_init net/core/neighbour.c:1463 [inline]
    neigh_resolve_output net/core/neighbour.c:1480 [inline]
    neigh_resolve_output+0x415/0x470 net/core/neighbour.c:1470
    neigh_output include/net/neighbour.h:511 [inline]
    ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
    __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
    __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
    ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
    NF_HOOK_COND include/linux/netfilter.h:294 [inline]
    ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
    dst_output include/net/dst.h:436 [inline]
    NF_HOOK include/linux/netfilter.h:305 [inline]
    ndisc_send_skb+0x459/0x5f0 net/ipv6/ndisc.c:505
    ndisc_send_ns+0x207/0x430 net/ipv6/ndisc.c:647
    rt6_probe_deferred+0x98/0xf0 net/ipv6/route.c:615
    process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
    worker_thread+0xa0/0x800 kernel/workqueue.c:2415
    kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

    read to 0xffff8880b9dedcb8 of 4 bytes by task 29572 on cpu 1:
    neigh_resolve_output net/core/neighbour.c:1479 [inline]
    neigh_resolve_output+0x113/0x470 net/core/neighbour.c:1470
    neigh_output include/net/neighbour.h:511 [inline]
    ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
    __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
    __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
    ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
    NF_HOOK_COND include/linux/netfilter.h:294 [inline]
    ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
    dst_output include/net/dst.h:436 [inline]
    NF_HOOK include/linux/netfilter.h:305 [inline]
    ndisc_send_skb+0x459/0x5f0 net/ipv6/ndisc.c:505
    ndisc_send_ns+0x207/0x430 net/ipv6/ndisc.c:647
    rt6_probe_deferred+0x98/0xf0 net/ipv6/route.c:615
    process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
    worker_thread+0xa0/0x800 kernel/workqueue.c:2415
    kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 29572 Comm: kworker/1:4 Not tainted 5.4.0-rc6+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: events rt6_probe_deferred

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Eric Dumazet
     
  • commit 90b2be27bb0e56483f335cc10fb59ec66882b949 upstream.

    KCSAN reported the following race [1]

    BUG: KCSAN: data-race in __dev_queue_xmit / net_tx_action

    read to 0xffff8880ba403508 of 1 bytes by task 21814 on cpu 1:
    __dev_xmit_skb net/core/dev.c:3389 [inline]
    __dev_queue_xmit+0x9db/0x1b40 net/core/dev.c:3761
    dev_queue_xmit+0x21/0x30 net/core/dev.c:3825
    neigh_hh_output include/net/neighbour.h:500 [inline]
    neigh_output include/net/neighbour.h:509 [inline]
    ip6_finish_output2+0x873/0xec0 net/ipv6/ip6_output.c:116
    __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
    __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
    ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
    NF_HOOK_COND include/linux/netfilter.h:294 [inline]
    ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
    dst_output include/net/dst.h:436 [inline]
    ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
    ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
    udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
    udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
    inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
    sock_sendmsg_nosec net/socket.c:637 [inline]
    sock_sendmsg+0x9f/0xc0 net/socket.c:657
    ___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
    __sys_sendmmsg+0x123/0x350 net/socket.c:2413
    __do_sys_sendmmsg net/socket.c:2442 [inline]
    __se_sys_sendmmsg net/socket.c:2439 [inline]
    __x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
    do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    write to 0xffff8880ba403508 of 1 bytes by interrupt on cpu 0:
    qdisc_run_begin include/net/sch_generic.h:160 [inline]
    qdisc_run include/net/pkt_sched.h:120 [inline]
    net_tx_action+0x2b1/0x6c0 net/core/dev.c:4551
    __do_softirq+0x115/0x33f kernel/softirq.c:292
    do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
    do_softirq.part.0+0x6b/0x80 kernel/softirq.c:337
    do_softirq kernel/softirq.c:329 [inline]
    __local_bh_enable_ip+0x76/0x80 kernel/softirq.c:189
    local_bh_enable include/linux/bottom_half.h:32 [inline]
    rcu_read_unlock_bh include/linux/rcupdate.h:688 [inline]
    ip6_finish_output2+0x7bb/0xec0 net/ipv6/ip6_output.c:117
    __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
    __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
    ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
    NF_HOOK_COND include/linux/netfilter.h:294 [inline]
    ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
    dst_output include/net/dst.h:436 [inline]
    ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
    ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
    udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
    udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
    inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
    sock_sendmsg_nosec net/socket.c:637 [inline]
    sock_sendmsg+0x9f/0xc0 net/socket.c:657
    ___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
    __sys_sendmmsg+0x123/0x350 net/socket.c:2413
    __do_sys_sendmmsg net/socket.c:2442 [inline]
    __se_sys_sendmmsg net/socket.c:2439 [inline]
    __x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
    do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 21817 Comm: syz-executor.2 Not tainted 5.4.0-rc6+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

    Fixes: d518d2ed8640 ("net/sched: fix race between deactivation and dequeue for NOLOCK qdisc")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Paolo Abeni
    Cc: Davide Caratti
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

05 Jan, 2020

5 commits

  • [ Upstream commit 8dbd76e79a16b45b2ccb01d2f2e08dbf64e71e40 ]

    Michal Kubecek and Firo Yang did a very nice analysis of crashes
    happening in __inet_lookup_established().

    Since a TCP socket can go from TCP_ESTABLISHED to TCP_LISTEN
    (via a close()/socket()/listen() cycle) without a RCU grace period,
    I should not have changed listeners linkage in their hash table.

    They must use the nulls protocol (Documentation/RCU/rculist_nulls.txt),
    so that a lookup can detect that a socket in a hash list was moved
    into another one.

    Since we added code in commit d296ba60d8e2 ("soreuseport: Resolve
    merge conflict for v4/v6 ordering fix"), we have to add
    hlist_nulls_add_tail_rcu() helper.
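
    The idea behind the nulls protocol is that each chain ends in a marker
    that encodes which bucket the chain belongs to, so a lockless walker
    can tell whether it ended up in a different chain than it started in.
    The following is a simplified, single-threaded sketch with hypothetical
    demo_* names; it models only the end-marker check, not RCU or the
    kernel's hlist_nulls API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A chain ends in an odd value that carries the bucket number. */
#define DEMO_NULLS(bucket) ((((uintptr_t)(bucket)) << 1) | 1u)

struct demo_node {
    uintptr_t next; /* pointer to the next node, or a nulls marker */
    int key;
};

static int demo_is_nulls(uintptr_t p) { return (int)(p & 1u); }
static unsigned long demo_nulls_value(uintptr_t p) { return (unsigned long)(p >> 1); }

/* Walks a chain. On a miss, *end_bucket receives the bucket number
 * stored in the marker that terminated the walk. */
static const struct demo_node *demo_walk(uintptr_t head, int key,
                                         unsigned long *end_bucket)
{
    uintptr_t pos = head;

    assert(end_bucket != NULL);
    while (!demo_is_nulls(pos)) {
        const struct demo_node *n = (const struct demo_node *)pos;

        if (n->key == key)
            return n;
        pos = n->next;
    }
    *end_bucket = demo_nulls_value(pos);
    return NULL;
}

/* A miss is only trustworthy if the walk ended in the expected bucket. */
static int demo_must_restart(unsigned long expected_bucket,
                             unsigned long end_bucket)
{
    return expected_bucket != end_bucket;
}
```

    A walker that started in bucket 1 but finished on the marker of bucket
    3 has followed a node that was moved, and has to retry the lookup.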

    Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
    Signed-off-by: Eric Dumazet
    Reported-by: Michal Kubecek
    Reported-by: Firo Yang
    Reviewed-by: Michal Kubecek
    Link: https://lore.kernel.org/netdev/20191120083919.GH27852@unicorn.suse.cz/
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit f081042d128a0c7acbd67611def62e1b52e2d294 ]

    When we do an IPv6 tunnel PMTU update and end up calling
    __ip6_rt_update_pmtu(), we should not call dst_confirm_neigh() as
    there is no two-way communication.

    So disable the neigh confirm for vxlan and geneve pmtu update.

    v5: No change.
    v4: No change.
    v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
    v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

    Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
    Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")
    Reviewed-by: Guillaume Nault
    Tested-by: Guillaume Nault
    Acked-by: David Ahern
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Hangbin Liu
     
  • [ Upstream commit 07dc35c6e3cc3c001915d05f5bf21f80a39a0970 ]

    Add a new function skb_dst_update_pmtu_no_confirm() for callers who need
    to update pmtu but should not do neighbor confirm.

    v5: No change.
    v4: No change.
    v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
    v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

    Reviewed-by: Guillaume Nault
    Acked-by: David Ahern
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Hangbin Liu
     
  • [ Upstream commit bd085ef678b2cc8c38c105673dfe8ff8f5ec0c57 ]

    The MTU update code is supposed to be invoked in response to real
    networking events that update the PMTU. In IPv6 PMTU update function
    __ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
    confirmed time.

    But for tunnel code, it will call pmtu before xmit, like:
    - tnl_update_pmtu()
    - skb_dst_update_pmtu()
    - ip6_rt_update_pmtu()
    - __ip6_rt_update_pmtu()
    - dst_confirm_neigh()

    If the tunnel remote dst mac address changed and we still do the neigh
    confirm, we will not be able to update the neigh cache and ping6 to the
    remote will fail.

    So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
    should not be invoking dst_confirm_neigh() as we have no evidence
    of successful two-way communication at this point.

    On the other hand it is also important to keep the neigh reachability fresh
    for TCP flows, so we cannot remove this dst_confirm_neigh() call.

    To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
    to choose whether we should do neigh update or not. I will add the parameter
    in this patch and set all the callers to true to keep the previous
    behavior, and fix the tunnel code one by one in later patches.
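
    The shape of that change can be sketched with a small ops table. The
    demo_* types and the confirmation counter are hypothetical; only the
    idea of an extra bool that existing callers pass as true and tunnel
    callers pass as false is taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_dst;

struct demo_dst_ops {
    void (*update_pmtu)(struct demo_dst *dst, uint32_t mtu, bool confirm_neigh);
};

struct demo_dst {
    const struct demo_dst_ops *ops;
    uint32_t pmtu;
    unsigned int neigh_confirmations; /* stands in for the neigh timestamp */
};

static void demo_rt_update_pmtu(struct demo_dst *dst, uint32_t mtu,
                                bool confirm_neigh)
{
    if (confirm_neigh)
        dst->neigh_confirmations++;
    if (mtu < dst->pmtu)
        dst->pmtu = mtu;
}

static const struct demo_dst_ops demo_ops = {
    .update_pmtu = demo_rt_update_pmtu,
};

/* Existing callers keep the old behaviour by passing true. */
static void demo_update_pmtu(struct demo_dst *dst, uint32_t mtu)
{
    assert(dst);
    if (dst->ops && dst->ops->update_pmtu)
        dst->ops->update_pmtu(dst, mtu, true);
}

/* Tunnel transmit paths have no proof of two-way reachability. */
static void demo_update_pmtu_no_confirm(struct demo_dst *dst, uint32_t mtu)
{
    assert(dst);
    if (dst->ops && dst->ops->update_pmtu)
        dst->ops->update_pmtu(dst, mtu, false);
}
```

    Both wrappers lower the stored PMTU, but only the first one refreshes
    the neighbour confirmation.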

    v5: No change.
    v4: No change.
    v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
    v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

    Suggested-by: David Miller
    Reviewed-by: Guillaume Nault
    Acked-by: David Ahern
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Hangbin Liu
     
  • [ Upstream commit a5b72a083da197b493c7ed1e5730d62d3199f7d6 ]

    Revert "net/sched: cls_u32: fix refcount leak in the error path of
    u32_change()", and fix the u32 refcount leak in a more generic way that
    preserves the semantic of rule dumping.
    On tc filters that don't support lockless insertion/removal, there is no
    need to guard against concurrent insertion when a removal is in progress.
    Therefore, for most of them we can avoid a full walk() when deleting, and
    just decrease the refcount, like it was done on older Linux kernels.
    This fixes situations where walk() was wrongly detecting a non-empty
    filter, like it happened with cls_u32 in the error path of change(), thus
    leading to failures in the following tdc selftests:

    6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev
    6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle
    74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id

    On cls_flower, and on (future) lockless filters, this check is necessary:
    move all the check_empty() logic in a callback so that each filter
    can have its own implementation. For cls_flower, it's sufficient to check
    if no IDRs have been allocated.

    This reverts commit 275c44aa194b7159d1191817b20e076f55f0e620.

    Changes since v1:
    - document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED
    is used, thanks to Vlad Buslov
    - implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov
    - squash revert and new fix in a single patch, to be nice with bisect
    tests that run tdc on u32 filter, thanks to Dave Miller

    Fixes: 275c44aa194b ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()")
    Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty")
    Suggested-by: Jamal Hadi Salim
    Suggested-by: Vlad Buslov
    Signed-off-by: Davide Caratti
    Reviewed-by: Vlad Buslov
    Tested-by: Jamal Hadi Salim
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Davide Caratti
     

31 Dec, 2019

3 commits

  • [ Upstream commit 25c7a6d1f90e208ec27ca854b1381ed39842ec57 ]

    There are common instances of the following construct :

    if (n->confirmed != now)
            n->confirmed = now;

    A C compiler could legally remove the conditional.

    Use READ_ONCE()/WRITE_ONCE() to avoid this problem.
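
    A user-space approximation of the annotated construct is sketched
    here. The demo_read_once_ul() and demo_write_once_ul() helpers are
    hypothetical stand-ins built on volatile accesses, not the kernel
    macros, and the writes counter exists only to make the behaviour
    observable.

```c
#include <assert.h>

/* Volatile accesses force exactly one load or store at this point,
 * so the compiler cannot drop the comparison and store blindly. */
static unsigned long demo_read_once_ul(const unsigned long *p)
{
    return *(const volatile unsigned long *)p;
}

static void demo_write_once_ul(unsigned long *p, unsigned long v)
{
    *(volatile unsigned long *)p = v;
}

struct demo_neigh {
    unsigned long confirmed;
    unsigned long writes; /* how often the field was really stored */
};

/* Only dirty the cache line when the value actually changes. */
static void demo_confirm(struct demo_neigh *n, unsigned long now)
{
    assert(n);
    if (demo_read_once_ul(&n->confirmed) != now) {
        demo_write_once_ul(&n->confirmed, now);
        n->writes++;
    }
}
```

    Skipping the store when the value is unchanged is what avoids needless
    cache-line bouncing between CPUs.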

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin

    Eric Dumazet
     
  • [ Upstream commit f394722fb0d0f701119368959d7cd0ecbc46363a ]

    neigh_cleanup() has not been used for seven years, and was a wrong design.

    Messing with shared pointer in bond_neigh_init() without proper
    memory barriers would at least trigger syzbot complaints eventually.

    It is time to remove this stuff.

    Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 258a980d1ec23e2c786e9536a7dd260bea74bae6 ]

    When storing a pointer to a dst_metrics structure in dst_entry._metrics,
    two flags are added in the least significant bits of the pointer value.
    Hence this assumes all pointers to dst_metrics structures have at least
    4-byte alignment.

    However, on m68k, the minimum alignment of 32-bit values is 2 bytes, not
    4 bytes. Hence in some kernel builds, dst_default_metrics may be only
    2-byte aligned, leading to obscure boot warnings like:

    WARNING: CPU: 0 PID: 7 at lib/refcount.c:28 refcount_warn_saturate+0x44/0x9a
    refcount_t: underflow; use-after-free.
    Modules linked in:
    CPU: 0 PID: 7 Comm: ksoftirqd/0 Tainted: G W 5.5.0-rc2-atari-01448-g114a1a1038af891d-dirty #261
    Stack from 10835e6c:
    10835e6c 0038134f 00023fa6 00394b0f 0000001c 00000009 00321560 00023fea
    00394b0f 0000001c 001a70f8 00000009 00000000 10835eb4 00000001 00000000
    04208040 0000000a 00394b4a 10835ed4 00043aa8 001a70f8 00394b0f 0000001c
    00000009 00394b4a 0026aba8 003215a4 00000003 00000000 0026d5a8 00000001
    003215a4 003a4361 003238d6 000001f0 00000000 003215a4 10aa3b00 00025e84
    003ddb00 10834000 002416a8 10aa3b00 00000000 00000080 000aa038 0004854a
    Call Trace: [] __warn+0xb2/0xb4
    [] warn_slowpath_fmt+0x42/0x64
    [] refcount_warn_saturate+0x44/0x9a
    [] printk+0x0/0x18
    [] refcount_warn_saturate+0x44/0x9a
    [] refcount_sub_and_test.constprop.73+0x38/0x3e
    [] ipv4_dst_destroy+0x5e/0x7e
    [] __local_bh_enable_ip+0x0/0x8e
    [] dst_destroy+0x40/0xae

    Fix this by forcing 4-byte alignment of all dst_metrics structures.
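
    Why the alignment matters can be seen in a small pointer-tagging
    sketch. The demo_* names and flag values are assumptions for
    illustration, and C11 _Alignas is used here in place of the kernel's
    own alignment attribute.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FLAG_READ_ONLY  0x1u
#define DEMO_FLAG_REFCOUNTED 0x2u
#define DEMO_FLAG_MASK       0x3u

/* Forcing 4-byte alignment guarantees the two low pointer bits are 0,
 * even on ABIs where 32-bit values only need 2-byte alignment. */
struct demo_metrics {
    _Alignas(4) uint32_t values[4];
};

/* Stores two flag bits in the low bits of the pointer value. */
static uintptr_t demo_pack(const struct demo_metrics *m, unsigned int flags)
{
    uintptr_t p = (uintptr_t)m;

    assert((p & DEMO_FLAG_MASK) == 0);     /* would fail if misaligned */
    assert((flags & ~DEMO_FLAG_MASK) == 0);
    return p | flags;
}

static const struct demo_metrics *demo_ptr(uintptr_t packed)
{
    return (const struct demo_metrics *)(packed & ~(uintptr_t)DEMO_FLAG_MASK);
}

static unsigned int demo_flags(uintptr_t packed)
{
    return (unsigned int)(packed & DEMO_FLAG_MASK);
}
```

    With only 2-byte alignment, bit 1 of the address could already be set,
    so unpacking would yield a wrong pointer and a spurious flag.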

    Fixes: e5fd387ad5b30ca3 ("ipv6: do not overwrite inetpeer metrics prematurely")
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Geert Uytterhoeven
     

18 Dec, 2019

9 commits

  • [ Upstream commit c3f812cea0d7006469d1cf33a4a9f0a12bb4b3a3 ]

    The page pool keeps track of the number of pages in flight, and
    it isn't safe to remove the pool until all pages are returned.

    Disallow removing the pool until all pages are back, so the pool
    is always available for page producers.

    Make the page pool responsible for its own delayed destruction
    instead of relying on XDP, so the page pool can be used without
    the xdp memory model.

    When all pages are returned, free the pool and notify xdp if the
    pool is registered with the xdp memory system. Have the callback
    perform a table walk since some drivers (cpsw) may share the pool
    among multiple xdp_rxq_info.

    Note that the increment of pages_state_release_cnt may result in
    inflight == 0, resulting in the pool being released.

    Fixes: d956a048cd3f ("xdp: force mem allocator removal and periodic warning")
    Signed-off-by: Jonathan Lemon
    Acked-by: Jesper Dangaard Brouer
    Acked-by: Ilias Apalodimas
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jonathan Lemon
     
  • [ Upstream commit 8ffb055beae58574d3e77b4bf9d4d15eace1ca27 ]

    The recent commit 5c72299fba9d ("net: sched: cls_flower: Classify
    packets using port ranges") had added filtering based on port ranges
    to tc flower. However, the commit missed necessary changes in hw-offload
    code, so the feature caused incorrect offloaded flow keys to be
    generated in the NIC.

    One more detailed example is below:

    $ tc qdisc add dev eth0 ingress
    $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
    dst_port 100-200 action drop

    With the setup above, an exact match filter with dst_port == 0 will be
    installed in NIC by hw-offload. IOW, the NIC will have a rule which is
    equivalent to the following one.

    $ tc qdisc add dev eth0 ingress
    $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
    dst_port 0 action drop

    The behavior was caused by the flow dissector which extracts packet
    data into the flow key in the tc flower. More specifically, regardless
    of exact match or specified port ranges, fl_init_dissector() set the
    FLOW_DISSECTOR_KEY_PORTS flag in struct flow_dissector to extract port
    numbers from skb in skb_flow_dissect() called by fl_classify(). Note
    that device drivers received the same struct flow_dissector object as
    used in skb_flow_dissect(). Thus, offloaded drivers could not identify
    which of these is used because the FLOW_DISSECTOR_KEY_PORTS flag was
    set to struct flow_dissector in either case.

    This patch adds the new FLOW_DISSECTOR_KEY_PORTS_RANGE flag and the new
    tp_range field in struct fl_flow_key to recognize which filters are applied
    to offloaded drivers. At this point, when filters based on port ranges
    are passed to drivers, the drivers return the EOPNOTSUPP error because they do
    not support the feature (the newly created FLOW_DISSECTOR_KEY_PORTS_RANGE
    flag).

    Fixes: 5c72299fba9d ("net: sched: cls_flower: Classify packets using port ranges")
    Signed-off-by: Yoshiki Komachi
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Yoshiki Komachi
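
The effect of the separate key can be sketched in Python. This is an illustrative model only: the bit positions, `keys_for_filter()` and `driver_offload()` are hypothetical, and a real driver inspects the dissector it is handed rather than a bare integer.

```python
import errno

# Illustrative bit positions, not the kernel's enum values.
FLOW_DISSECTOR_KEY_PORTS = 1 << 0
FLOW_DISSECTOR_KEY_PORTS_RANGE = 1 << 1


def keys_for_filter(port_min, port_max):
    """Pick the key a filter uses: exact match or port range (hypothetical helper)."""
    if port_min == port_max:
        return FLOW_DISSECTOR_KEY_PORTS
    return FLOW_DISSECTOR_KEY_PORTS_RANGE


def driver_offload(used_keys, supported=FLOW_DISSECTOR_KEY_PORTS):
    """Model of a driver that rejects any key it does not understand."""
    if used_keys & ~supported:
        return -errno.EOPNOTSUPP
    return 0


exact = driver_offload(keys_for_filter(80, 80))       # exact match on port 80
ranged = driver_offload(keys_for_filter(100, 200))    # dst_port 100-200
print(exact, ranged)
```

With a distinct key for ranges, a driver that only knows exact port matches refuses the range filter instead of silently installing a rule for port 0.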
     
  • [ Upstream commit dbad3408896c3c5722ec9cda065468b3df16c5bf ]

    With indirect blocks, a driver can register for callbacks from a device
    that it does not 'own', for example, a tunnel device. When registering to
    or unregistering from a new device, a callback is triggered to generate
    a bind/unbind event. This, in turn, allows the driver to receive any
    existing rules or to properly clean up installed rules.

    When first added, it was assumed that all indirect block registrations
    would be for ingress offloads. However, the NFP driver can, in some
    instances, support clsact qdisc binds for egress offload.

    Change the name of the indirect block callback command in flow_offload to
    remove the 'ingress' identifier from it. While this does not change
    functionality, a follow-up patch will implement a more generic
    callback than the current ones, which support only ingress offload.

    Fixes: 4d12ba42787b ("nfp: flower: allow offloading of matches on 'internal' ports")
    Signed-off-by: John Hurley
    Acked-by: Jakub Kicinski
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    John Hurley
     
  • [ Upstream commit 721c8dafad26ccfa90ff659ee19755e3377b829d ]

    Syncookies borrow the ->rx_opt.ts_recent_stamp field to store the
    timestamp of the last synflood. Protect them with READ_ONCE() and
    WRITE_ONCE() since reads and writes aren't serialised.

    Use of .rx_opt.ts_recent_stamp for storing the synflood timestamp was
    introduced by a0f82f64e269 ("syncookies: remove last_synq_overflow from
    struct tcp_sock"). But unprotected accesses were already there when
    timestamp was stored in .last_synq_overflow.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Guillaume Nault
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Guillaume Nault
     
  • [ Upstream commit cb44a08f8647fd2e8db5cc9ac27cd8355fa392d8 ]

    When no synflood occurs, the synflood timestamp isn't updated.
    Therefore it can be so old that time_after32() can consider it to be
    in the future.

    That's a problem for tcp_synq_no_recent_overflow() as it may report
    that a recent overflow occurred while, in fact, it's just that jiffies
    has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.

    Spurious detection of recent overflows leads to extra syncookie
    verification in cookie_v[46]_check(). At that point, the verification
    should fail and the packet should be dropped. But we should have dropped the
    packet earlier as we didn't even send a syncookie.

    Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
    only if jiffies is within the
    [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
    way, no spurious recent overflow is reported when jiffies wraps and
    'last_overflow' becomes in the future from the point of view of
    time_after32().

    However, if jiffies wraps and enters the
    [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
    'last_overflow' being a stale synflood timestamp), then
    tcp_synq_no_recent_overflow() still erroneously reports an
    overflow. In such cases, we have to rely on syncookie verification
    to drop the packet. We unfortunately have no way to differentiate
    between a fresh and a stale syncookie timestamp.

    In practice, using last_overflow as lower bound is problematic.
    If the synflood timestamp is concurrently updated between the time
    we read jiffies and the moment we store the timestamp in
    'last_overflow', then 'now' becomes smaller than 'last_overflow' and
    tcp_synq_no_recent_overflow() returns true, potentially dropping a
    valid syncookie.

    Reading jiffies after loading the timestamp could fix the problem,
    but that'd require a memory barrier. Let's just accommodate
    potential timestamp growth instead and extend the interval using
    'last_overflow - HZ' as lower bound.

    Signed-off-by: Guillaume Nault
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Guillaume Nault
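
The refined interval check can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the kernel code: the helpers re-implement 32-bit wrap-around comparisons in the usual way, HZ=1000 and TCP_SYNCOOKIE_VALID=120*HZ are assumed values, and `no_recent_overflow_old()` / `no_recent_overflow_new()` are hypothetical names for the check before and after this change.

```python
MASK32 = 0xFFFFFFFF
HZ = 1000                         # assumed tick rate
TCP_SYNCOOKIE_VALID = 120 * HZ    # assumed validity window


def to_s32(x):
    """Interpret the low 32 bits of x as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def time_after32(a, b):
    """True if a is after b when both are wrapping 32-bit counters."""
    return to_s32(b - a) < 0


def time_between32(t, low, high):
    """True if t lies in [low, high], modulo 2**32."""
    return ((high - low) & MASK32) >= ((t - low) & MASK32)


def no_recent_overflow_old(now, last_overflow):
    return time_after32(now, last_overflow + TCP_SYNCOOKIE_VALID)


def no_recent_overflow_new(now, last_overflow):
    return not time_between32(now, last_overflow - HZ,
                              last_overflow + TCP_SYNCOOKIE_VALID)


last = 5000
# Stale timestamp: jiffies has grown past last + TCP_SYNCOOKIE_VALID + 2^31.
stale_now = last + TCP_SYNCOOKIE_VALID + (1 << 31) + 1
old_stale = no_recent_overflow_old(stale_now, last)   # spurious "recent overflow"
new_stale = no_recent_overflow_new(stale_now, last)   # correctly "no recent overflow"
# Timestamp updated concurrently: 'now' is slightly smaller than last_overflow.
new_race = no_recent_overflow_new(last - 500, last)   # still inside the interval
print(old_stale, new_stale, new_race)
```

The last case shows why the lower bound is `last_overflow - HZ`: a `now` that lags the timestamp by less than a second is still treated as being under synflood.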
     
  • [ Upstream commit 04d26e7b159a396372646a480f4caa166d1b6720 ]

    If no synflood happens for a long enough period of time, then the
    synflood timestamp isn't refreshed and jiffies can advance so much
    that time_after32() can't accurately compare them any more.

    Therefore, we can end up in a situation where time_after32(now,
    last_overflow + HZ) returns false, just because these two values are
    too far apart. In that case, the synflood timestamp isn't updated as
    it should be, which can trick tcp_synq_no_recent_overflow() into
    rejecting valid syncookies.

    For example, let's consider the following scenario on a system
    with HZ=1000:

    * The synflood timestamp is 0, either because that's the timestamp
    of the last synflood or, more commonly, because we're working with
    a freshly created socket.

    * We receive a new SYN, which triggers synflood protection. Let's say
    that this happens when jiffies == 2147484649 (that is,
    'synflood timestamp' + HZ + 2^31 + 1).

    * Then tcp_synq_overflow() doesn't update the synflood timestamp,
    because time_after32(2147484649, 1000) returns false.
    With:
    - 2147484649: the value of jiffies, aka. 'now'.
    - 1000: the value of 'last_overflow' + HZ.

    * A bit later, we receive the ACK completing the 3WHS. But
    cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow()
    says that we're not under synflood. That's because
    time_after32(2147484649, 120000) returns true.
    With:
    - 2147484649: the value of jiffies, aka. 'now'.
    - 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID.

    Of course, in reality jiffies would have increased a bit, but this
    condition will last for the next 119 seconds, which is long enough
    to accommodate the growth of jiffies.

    Fix this by updating the overflow timestamp whenever jiffies isn't
    within the [last_overflow, last_overflow + HZ] range. That shouldn't
    have any performance impact since the update still happens at most once
    per second.

    Now we're guaranteed to have fresh timestamps while under synflood, so
    tcp_synq_no_recent_overflow() can safely use it with time_after32() in
    such situations.

    Stale timestamps can still make tcp_synq_no_recent_overflow() return
    the wrong verdict when not under synflood. This will be handled in the
    next patch.

    For 64-bit architectures, the problem was introduced with the
    conversion of ->tw_ts_recent_stamp to a 32-bit integer by commit
    cca9bab1b72c ("tcp: use monotonic timestamps for PAWS").
    The problem has always been there on 32-bit architectures.

    Fixes: cca9bab1b72c ("tcp: use monotonic timestamps for PAWS")
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Guillaume Nault
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Guillaume Nault
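
The HZ=1000 scenario from the commit above can be replayed with a Python model of the 32-bit comparisons. This is a sketch, not kernel code: the helpers are re-implemented from the usual wrap-around definitions, and `should_update_old()` / `should_update_new()` are hypothetical names for the timestamp-refresh condition before and after the fix.

```python
MASK32 = 0xFFFFFFFF
HZ = 1000                         # as in the scenario above
TCP_SYNCOOKIE_VALID = 120 * HZ    # 120000, as in the scenario above


def to_s32(x):
    """Interpret the low 32 bits of x as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def time_after32(a, b):
    """True if a is after b when both are wrapping 32-bit counters."""
    return to_s32(b - a) < 0


def time_between32(t, low, high):
    """True if t lies in [low, high], modulo 2**32."""
    return ((high - low) & MASK32) >= ((t - low) & MASK32)


def should_update_old(now, last_overflow):
    return time_after32(now, last_overflow + HZ)


def should_update_new(now, last_overflow):
    return not time_between32(now, last_overflow, last_overflow + HZ)


now = 2147484649          # 0 + HZ + 2^31 + 1
last_overflow = 0

updated_old = should_update_old(now, last_overflow)
# With the timestamp left stale, this comparison says "no recent overflow".
no_recent = time_after32(now, last_overflow + TCP_SYNCOOKIE_VALID)
updated_new = should_update_new(now, last_overflow)
print(updated_old, no_recent, updated_new)   # False True True
```

In this model the old condition leaves the stale timestamp in place while the validity comparison evaluates to true, so the later ACK is treated as arriving outside a synflood; the range-based condition refreshes the timestamp instead.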
     
  • [ Upstream commit 6c8991f41546c3c472503dff1ea9daaddf9331c2 ]

    ipv6_stub uses the ip6_dst_lookup function to allow other modules to
    perform IPv6 lookups. However, this function skips the XFRM layer
    entirely.

    All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the
    ip_route_output_key and ip_route_output helpers) for their IPv4 lookups,
    which calls xfrm_lookup_route(). This patch fixes this inconsistent
    behavior by switching the stub to ip6_dst_lookup_flow, which also calls
    xfrm_lookup_route().

    This requires some changes in all the callers, as these two functions
    take different arguments and have different return types.

    Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan")
    Reported-by: Xiumei Mu
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca
     
  • [ Upstream commit c4e85f73afb6384123e5ef1bba3315b2e3ad031e ]

    This will be used in the conversion of ipv6_stub to ip6_dst_lookup_flow,
    as some modules currently pass a net argument without a socket to
    ip6_dst_lookup. This is equivalent to commit 343d60aada5a ("ipv6: change
    ipv6_stub_impl.ipv6_dst_lookup to take net argument").

    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca
     
  • [ Upstream commit 501a90c945103e8627406763dac418f20f3837b2 ]

    syzbot was once again able to crash a host by setting a very small mtu
    on loopback device.

    Let's make inetdev_valid_mtu() available in include/net/ip.h,
    and use it in ip_setup_cork(), so that we protect both ip_append_page()
    and __ip_append_data().

    Also add a READ_ONCE() when the device mtu is read.

    Pair this lockless read with one WRITE_ONCE() in __dev_set_mtu(),
    even if other code paths might write over this field.

    Add a big comment in include/linux/netdevice.h about dev->mtu
    needing READ_ONCE()/WRITE_ONCE() annotations.

    Hopefully we will add the missing ones in followup patches.

    [1]

    refcount_t: saturated; leaking memory.
    WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
    Kernel panic - not syncing: panic_on_warn set ...
    CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x197/0x210 lib/dump_stack.c:118
    panic+0x2e3/0x75c kernel/panic.c:221
    __warn.cold+0x2f/0x3e kernel/panic.c:582
    report_bug+0x289/0x300 lib/bug.c:195
    fixup_bug arch/x86/kernel/traps.c:174 [inline]
    fixup_bug arch/x86/kernel/traps.c:169 [inline]
    do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
    do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
    invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
    RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
    Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89
    RSP: 0018:ffff88809689f550 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c
    RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1
    R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001
    R13: 0000000000040100 R14: ffff888099041104 R15: ffff888218d96e40
    refcount_add include/linux/refcount.h:193 [inline]
    skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999
    sock_wmalloc+0xf1/0x120 net/core/sock.c:2096
    ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383
    udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276
    inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821
    kernel_sendpage+0x92/0xf0 net/socket.c:3794
    sock_sendpage+0x8b/0xc0 net/socket.c:936
    pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458
    splice_from_pipe_feed fs/splice.c:512 [inline]
    __splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636
    splice_from_pipe+0x108/0x170 fs/splice.c:671
    generic_splice_sendpage+0x3c/0x50 fs/splice.c:842
    do_splice_from fs/splice.c:861 [inline]
    direct_splice_actor+0x123/0x190 fs/splice.c:1035
    splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990
    do_splice_direct+0x1da/0x2a0 fs/splice.c:1078
    do_sendfile+0x597/0xd00 fs/read_write.c:1464
    __do_sys_sendfile64 fs/read_write.c:1525 [inline]
    __se_sys_sendfile64 fs/read_write.c:1511 [inline]
    __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511
    do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x441409
    Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409
    RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005
    RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010
    R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180
    R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000
    Kernel Offset: disabled
    Rebooting in 86400 seconds..

    Fixes: 1470ddf7f8ce ("inet: Remove explicit write references to sk/inet in ip_append_data")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
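
The guard added by the commit above amounts to a minimum-MTU check before fragment sizing. A minimal Python sketch follows, assuming IPV4_MIN_MTU is 68 and that an invalid MTU is reported as -ENETUNREACH; both are assumptions not stated in the commit text, and `setup_cork_fragsize()` is a hypothetical stand-in for the relevant part of ip_setup_cork().

```python
import errno

IPV4_MIN_MTU = 68   # assumed minimum IPv4 MTU


def inetdev_valid_mtu(mtu):
    """Model of the validity check: the MTU must not be below the IPv4 minimum."""
    return mtu >= IPV4_MIN_MTU


def setup_cork_fragsize(device_mtu):
    """Hypothetical helper: return the fragment size or a negative errno."""
    if not inetdev_valid_mtu(device_mtu):
        return -errno.ENETUNREACH   # assumed error code
    return device_mtu


tiny = setup_cork_fragsize(10)       # very small loopback MTU
normal = setup_cork_fragsize(1500)
print(tiny, normal)
```

Rejecting the tiny MTU up front keeps both append paths from computing fragment sizes from a value that is too small to hold an IPv4 packet.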
     

16 Dec, 2019

1 commit

  • This is the 5.4.3 stable release

    Conflicts:
    drivers/cpufreq/imx-cpufreq-dt.c
    drivers/spi/spi-fsl-qspi.c

    The conflicts are very minor and were fixed during the merge. The
    imx-cpufreq-dt.c conflict is a one-line code-style change; the upstream
    version was taken, with no functional change.

    The spi-fsl-qspi.c conflict is minor and arose when merging the upstream
    fix c69b17da53b2
    spi: spi-fsl-qspi: Clear TDH bits in FLSHCR register

    After the merge, a basic boot sanity test and a basic qspi test were done on i.MX.

    Signed-off-by: Jason Liu

    Jason Liu
     

05 Dec, 2019

3 commits

  • [ Upstream commit c5daa6cccdc2f94aca2c9b3fa5f94e4469997293 ]

    Partially sent record cleanup path increments an SG entry
    directly instead of using sg_next(). This should not be a
    problem today, as encrypted messages should be always
    allocated as arrays. But given this is a cleanup path, it would be
    easy to miss were this ever to change. Use sg_next(), and
    simplify the code.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jakub Kicinski
     
  • [ Upstream commit 9e5ffed37df68d0ccfb2fdc528609e23a1e70ebe ]

    Looks like when BPF support was added by commit d3b18ad31f93
    ("tls: add bpf support to sk_msg handling") and
    commit d829e9c4112b ("tls: convert to generic sk_msg interface")
    it broke/removed the support for in-place crypto as added by
    commit 4e6d47206c32 ("tls: Add support for inplace records
    encryption").

    The inplace_crypto member of struct tls_rec is dead, inited
    to zero, and sometimes set to zero again. It used to be
    set to 1 when record was allocated, but the skmsg code doesn't
    seem to have been written with the idea of in-place crypto
    in mind.

    Since non-trivial effort is required to bring the feature back
    and we don't really have the HW to measure the benefit, just
    remove the leftover support for now to avoid confusing readers.

    Signed-off-by: Jakub Kicinski
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jakub Kicinski
     
  • [ Upstream commit 312434617cb16be5166316cf9d08ba760b1042a1 ]

    This patch is to fix a data-race reported by syzbot:

    BUG: KCSAN: data-race in sctp_assoc_migrate / sctp_hash_obj

    write to 0xffff8880b67c0020 of 8 bytes by task 18908 on cpu 1:
    sctp_assoc_migrate+0x1a6/0x290 net/sctp/associola.c:1091
    sctp_sock_migrate+0x8aa/0x9b0 net/sctp/socket.c:9465
    sctp_accept+0x3c8/0x470 net/sctp/socket.c:4916
    inet_accept+0x7f/0x360 net/ipv4/af_inet.c:734
    __sys_accept4+0x224/0x430 net/socket.c:1754
    __do_sys_accept net/socket.c:1795 [inline]
    __se_sys_accept net/socket.c:1792 [inline]
    __x64_sys_accept+0x4e/0x60 net/socket.c:1792
    do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    read to 0xffff8880b67c0020 of 8 bytes by task 12003 on cpu 0:
    sctp_hash_obj+0x4f/0x2d0 net/sctp/input.c:894
    rht_key_get_hash include/linux/rhashtable.h:133 [inline]
    rht_key_hashfn include/linux/rhashtable.h:159 [inline]
    rht_head_hashfn include/linux/rhashtable.h:174 [inline]
    head_hashfn lib/rhashtable.c:41 [inline]
    rhashtable_rehash_one lib/rhashtable.c:245 [inline]
    rhashtable_rehash_chain lib/rhashtable.c:276 [inline]
    rhashtable_rehash_table lib/rhashtable.c:316 [inline]
    rht_deferred_worker+0x468/0xab0 lib/rhashtable.c:420
    process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
    worker_thread+0xa0/0x800 kernel/workqueue.c:2415
    kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

    It was caused by rhashtable accessing asoc->base.sk while sctp_assoc_migrate
    is changing its value. However, what rhashtable wants is the netns from
    asoc->base.sk, and for an asoc, its netns won't change once set. So we can
    simply fix it by caching the netns when the asoc is created.

    Fixes: d6c0256a60e6 ("sctp: add the rhashtable apis for sctp global transport hashtable")
    Reported-by: syzbot+e3b35fe7918ff0ee474e@syzkaller.appspotmail.com
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     

02 Dec, 2019

4 commits

  • * wifi/next: (51 commits)
    MLK-22949 brcmfmac: add chip id check for clm_blob firmware load
    MLK-22948 brcmfmac: avoid to send mailbox interrupt twice for core version 0xb
    MLK-22946 brcmfmac: freeing wiphy after brcmf attach failed
    dt-bindings: add new property to enable board_type
    brcmfmac: let board_type is optional
    ...

    Dong Aisheng
     
  • This patch provides a netlink method to configure TSN protocol hardware.
    TSN guarantees packet transport with bounded low latency, low packet delay
    variation, and low packet loss through hardware and software methods.

    The three basic components of TSN are:

    1. Time synchronization: This is implemented by 802.1AS, which is based on
    the IEEE 1588 Precision Time Protocol. It is configured by other means
    in the kernel.
    802.1AS is not included in this patch.

    2. Scheduling, traffic shaping, and per-stream filtering and policing:
    This patch supports Qbv/Qci/Qbu/802.1CB/Qav etc.

    3. Selection of communication paths:
    This patch does not support the pure software-only TSN protocols (like Qcc),
    only hardware-related configuration.

    TSN protocols supported by this patch: Qbv/Qci/Qbu/Credit-based Shaper (Qav).
    This patch was verified on the NXP ls1028ardb board.

    Signed-off-by: Po Liu

    Po Liu
     
  • Support TSN capabilities in the DSA Felix switch driver. The Felix TSN
    driver uses the TSN configuration of Ocelot and is registered on each
    switch port through DSA port setup.

    Signed-off-by: Xiaoliang Yang

    Xiaoliang Yang
     
  • While it is entirely possible that this tagger format is in fact more
    generic than just these 2 switch families, I don't have that knowledge.
    The Seville switch in NXP T1040 has a similar frame format, but there
    are enough differences (e.g. DEST field starts at bit 57 instead of 56)
    that calling this file tag_vitesse.c is a bit of a stretch at the
    moment. The frame format has been listed in a comment so that people who
    add support for further Vitesse switches can rework this tagger while
    keeping compatibility with Felix.

    The "ocelot" name was chosen instead of "felix" because even the Ocelot
    switch can act as a DSA device when it is used in NPI mode, and the Felix
    tagger format is almost identical. Currently it is only used for the
    Felix switch embedded in the NXP LS1028A chip.

    The ABI for this tagger should be considered "not stable" at the moment.
    The DSA tag is always placed before the Ethernet header and therefore,
    we are using the long prefix for RX tags to avoid putting the DSA master
    port in promiscuous mode. Once there will be an API in DSA for drivers
    to request DSA masters to be in promiscuous mode unconditionally, we
    will switch to the "no prefix" extraction frame header, which will save
    16 padding bytes for each RX frame.

    Signed-off-by: Vladimir Oltean
    Reviewed-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Vladimir Oltean
     

25 Nov, 2019

1 commit

  • Pulling the following commits and some general changes from custom
    v3.10 kernel for supporting qcacld2.0 on kernel v4.9.11.
    1. cfg80211: Using new wiphy flag WIPHY_FLAG_DFS_OFFLOAD
    When flag WIPHY_FLAG_DFS_OFFLOAD is defined, the driver would handle
    all the DFS related operations. Therefore the kernel needs to ignore
    the DFS state that it uses to block the userspace calls to the driver
    through cfg80211 APIs. Also it should treat the userspace calls to
    start radar detection as a no-op.

    Please note that the changes in util.c are not picked up explicitly.
    Kernel v4.9.11 uses wrapper cfg80211_get_chans_dfs_required which takes
    care of this change.

    Change-Id: I9dd2076945581ca67e54dfc96dd3dbc526c6f0a2
    IRs-Fixed: 202686

    2. New db.txt from git/sforshee/wireless-regdb.git
    CONFIG_CFG80211_INTERNAL_REGDB is enabled in build. This causes
    kernel warn messages as db.txt is empty. A new db.txt is added
    from:
    git://git.kernel.org/pub/scm/linux/kernel/git/sforshee/wireless-regdb.git

    IRs-Fixed: 202686

    3. Picked up the declaration and definition of the function
    cfg80211_is_gratuitous_arp_unsolicited_na

    Change-Id: I1e4083a2327c121073226aa6b75bb6b5b97cec00
    CRs-fixed: 1079453

    Signed-off-by: Nakul Kachhwaha
    Signed-off-by: Fugang Duan
    (Vipul: Fixed merge conflicts)
    (TODO: checkpatch warnings)
    Signed-off-by: Vipul Kumar

    Sherry Sun
     

20 Nov, 2019

1 commit

  • Bring back tls_sw_sendpage_locked. sk_msg redirection into a socket
    with TLS_TX takes the following path:

    tcp_bpf_sendmsg_redir
    tcp_bpf_push_locked
    tcp_bpf_push
    kernel_sendpage_locked
    sock->ops->sendpage_locked

    Also update the flags test in tls_sw_sendpage_locked to allow flag
    MSG_NO_SHARED_FRAGS. bpf_tcp_sendmsg sets this.

    Link: https://lore.kernel.org/netdev/CA+FuTSdaAawmZ2N8nfDDKu3XLpXBbMtcCT0q4FntDD2gn8ASUw@mail.gmail.com/T/#t
    Link: https://github.com/wdebruij/kerneltools/commits/icept.2
    Fixes: 0608c69c9a80 ("bpf: sk_msg, sock{map|hash} redirect through ULP")
    Fixes: f3de19af0f5b ("Revert \"net/tls: remove unused function tls_sw_sendpage_locked\"")
    Signed-off-by: Willem de Bruijn
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Willem de Bruijn
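
The flags test mentioned in the commit above is a mask check: any flag outside an allowed set causes the call to be refused. A minimal Python sketch follows; the numeric MSG_* values and the exact set of other allowed flags are assumptions that should be checked against include/linux/socket.h and the TLS code, and `flags_accepted()` is a hypothetical helper.

```python
# Assumed numeric values of the socket flags.
MSG_DONTWAIT = 0x40
MSG_NOSIGNAL = 0x4000
MSG_MORE = 0x8000
MSG_SENDPAGE_NOPOLICY = 0x10000
MSG_SENDPAGE_NOTLAST = 0x20000
MSG_NO_SHARED_FRAGS = 0x80000

# Assumed allowed set before the change.
ALLOWED_BEFORE = (MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
                  MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY)
# The change additionally allows MSG_NO_SHARED_FRAGS.
ALLOWED_AFTER = ALLOWED_BEFORE | MSG_NO_SHARED_FRAGS


def flags_accepted(flags, allowed):
    """True if no flag outside the allowed mask is set."""
    return (flags & ~allowed) == 0


redirect_flags = MSG_DONTWAIT | MSG_NO_SHARED_FRAGS   # example redirect flags
before = flags_accepted(redirect_flags, ALLOWED_BEFORE)
after = flags_accepted(redirect_flags, ALLOWED_AFTER)
print(before, after)
```

Under these assumptions a redirected send carrying MSG_NO_SHARED_FRAGS is refused by the old mask and accepted by the extended one.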