30 Dec, 2020

6 commits

  • [ Upstream commit 35a6d396721e28ba161595b0fc9e8896c00399bb ]

    'snprintf' returns the number of characters which would have been written
    if enough space had been available, excluding the terminating null byte.
    Thus, a return value of 'sizeof(buf)' means that the last character
    has been dropped.

    Signed-off-by: Fedor Tokarev
    Fixes: 2f34b8bfae19 ("SUNRPC: add links for all client xprts to debugfs")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Sasha Levin

    Fedor Tokarev
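A minimal sketch of the truncation check this commit message describes. The helper name `format_link_name` and the "xprt-%u" format string are assumptions made for illustration and are not taken from the patch.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: returns 0 if the formatted name fits into buf
 * (including its terminating null byte), -1 if it was truncated. */
int format_link_name(char *buf, size_t size, unsigned int id)
{
    int len = snprintf(buf, size, "xprt-%u", id);

    /* snprintf() reports the length it wanted to write, so a result
     * equal to size already means the last character was dropped. */
    if (len < 0 || (size_t)len >= size)
        return -1;
    return 0;
}
```

The comparison uses `>=` rather than `>`, which is the off-by-one the message points at.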
     
  • [ Upstream commit d5aa6b22e2258f05317313ecc02efbb988ed6d38 ]

    According to RFC5666, the correct netid for an IPv6 addressed RDMA
    transport is "rdma6", which we've supported as a mount option since
    Linux-4.7. The problem is that loading the module "xprtrdma6" fails,
    since there is no module alias of that name.

    Fixes: 181342c5ebe8 ("xprtrdma: Add rdma6 option to support NFS/RDMA IPv6")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Sasha Levin

    Trond Myklebust
     
  • [ Upstream commit e4c72201b6ec3173dfe13fa2e2335a3ad78d4921 ]

    Currently, we wake up the tasks by priority queue ordering, which means
    that we ignore the batching that is supposed to help with QoS issues.

    Fixes: c049f8ea9a0d ("SUNRPC: Remove the bh-safe lock requirement on the rpc_wait_queue->lock")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Sasha Levin

    Trond Myklebust
     
  • [ Upstream commit 1fb17dfc258ff6208f7873cc7b8e40e27515d2d5 ]

    When a device is added to the white list, it is also added to the
    resolving list. It must be added only when the HCI_ENABLE_LL_PRIVACY flag
    is set, so that flag has to be tested before adding devices to or deleting
    them from the resolving list. The use_ll_privacy macro only checks whether
    the controller supports LL_Privacy.

    https://bugzilla.kernel.org/show_bug.cgi?id=209745

    Fixes: 0eee35bdfa3b ("Bluetooth: Update resolving list when updating whitelist")
    Signed-off-by: Sathish Narasimman
    Signed-off-by: Marcel Holtmann
    Signed-off-by: Sasha Levin

    Sathish Narasimman
     
  • [ Upstream commit 6dfccd13db2ff2b709ef60a50163925d477549aa ]

    AMP_MGR is dereferenced in hci_phy_link_complete_evt() when called from
    hci_event_packet(), and hcon->amp_mgr may be NULL when it is accessed
    after hcon has been initialized.

    - net/bluetooth/hci_event.c:4945
    The bug seems to get triggered in this line:

    bredr_hcon = hcon->amp_mgr->l2cap_conn->hcon;

    Fix it by adding a NULL check for the hcon->amp_mgr before checking the ev-status.

    Fixes: d5e911928bd8 ("Bluetooth: AMP: Process Physical Link Complete evt")
    Reported-and-tested-by: syzbot+0bef568258653cff272f@syzkaller.appspotmail.com
    Link: https://syzkaller.appspot.com/bug?extid=0bef568258653cff272f
    Signed-off-by: Anmol Karn
    Signed-off-by: Marcel Holtmann
    Signed-off-by: Sasha Levin

    Anmol Karn
     
  • [ Upstream commit ba5c25236bc3d399df82ebe923490ea8d2d35cf2 ]

    The for-loop iterates with a u8 loop counter and compares this
    with the loop upper limit of request->n_ssids which is an int type.
    There is a potential infinite loop if n_ssids is larger than the
    u8 loop counter, so fix this by making the loop counter an int.

    Addresses-Coverity: ("Infinite loop")
    Fixes: c8cb5b854b40 ("nl80211/cfg80211: support 6 GHz scanning")
    Signed-off-by: Colin Ian King
    Link: https://lore.kernel.org/r/20201029222407.390218-1-colin.king@canonical.com
    Signed-off-by: Johannes Berg
    Signed-off-by: Sasha Levin

    Colin Ian King
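A small sketch of why a u8 loop counter cannot reach an int limit above 255. Both function names and the `cap` parameter are hypothetical; the cap exists only so that the wrapping case terminates in this demonstration.

```c
#include <stdint.h>

/* Counts iterations of a loop driven by an 8-bit counter. The loop gives
 * up after `cap` steps so the wrapping case can be observed safely. */
unsigned long iterations_u8(int n, unsigned long cap)
{
    unsigned long steps = 0;
    uint8_t i;

    /* i wraps from 255 back to 0, so i < n stays true for any n > 255. */
    for (i = 0; i < n && steps < cap; i++)
        steps++;
    return steps;
}

/* Same loop with an int counter, which reaches n and stops. */
unsigned long iterations_int(int n, unsigned long cap)
{
    unsigned long steps = 0;
    int i;

    for (i = 0; i < n && steps < cap; i++)
        steps++;
    return steps;
}
```

With a limit of 300, the 8-bit variant only stops because of the cap, while the int variant stops after 300 steps.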
     

26 Dec, 2020

3 commits

  • commit 2d9463083ce92636a1bdd3e30d1236e3e95d859e upstream.

    syzbot discovered a bug in which an OOB access was being made because
    an unsuitable key_idx value was wrongly considered to be acceptable
    while deleting a key in nl80211_del_key().

    Since we don't know the cipher at the time of deletion, if
    cfg80211_validate_key_settings() were to be called directly in
    nl80211_del_key(), even valid keys would be wrongly determined invalid,
    and deletion wouldn't occur correctly.
    For this reason, a new function - cfg80211_valid_key_idx(), has been
    created, to determine if the key_idx value provided is valid or not.
    cfg80211_valid_key_idx() is directly called in 2 places -
    nl80211_del_key(), and cfg80211_validate_key_settings().

    Reported-by: syzbot+49d4cab497c2142ee170@syzkaller.appspotmail.com
    Tested-by: syzbot+49d4cab497c2142ee170@syzkaller.appspotmail.com
    Suggested-by: Johannes Berg
    Signed-off-by: Anant Thazhemadam
    Link: https://lore.kernel.org/r/20201204215825.129879-1-anant.thazhemadam@gmail.com
    Cc: stable@vger.kernel.org
    [also disallow IGTK key IDs if no IGTK cipher is supported]
    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

    Anant Thazhemadam
     
  • commit f7e0e8b2f1b0a09b527885babda3e912ba820798 upstream.

    `num_reports` is not being properly checked. A malformed event packet with
    a large `num_reports` number makes hci_le_direct_adv_report_evt() read out
    of bounds. Fix it.

    Cc: stable@vger.kernel.org
    Fixes: 2f010b55884e ("Bluetooth: Add support for handling LE Direct Advertising Report events")
    Reported-and-tested-by: syzbot+24ebd650e20bd263ca01@syzkaller.appspotmail.com
    Link: https://syzkaller.appspot.com/bug?extid=24ebd650e20bd263ca01
    Signed-off-by: Peilin Ye
    Signed-off-by: Marcel Holtmann
    Signed-off-by: Greg Kroah-Hartman

    Peilin Ye
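A sketch of the kind of length check that stops a malformed `num_reports` from causing an out-of-bounds read. The packet layout (one count byte followed by fixed-size reports), the `REPORT_SIZE` constant and the function name are assumptions for illustration, not the actual HCI event layout or the patch itself.

```c
#include <stddef.h>
#include <stdint.h>

#define REPORT_SIZE 16u  /* hypothetical fixed size of one report */

/* Returns the number of reports that can be parsed safely, or -1 if the
 * packet is malformed. data[0] is assumed to hold the report count. */
int checked_num_reports(const uint8_t *data, size_t len)
{
    size_t num_reports;

    if (len < 1)
        return -1;
    num_reports = data[0];
    /* Reject counts that claim more payload than the packet carries. */
    if (num_reports == 0 || num_reports * REPORT_SIZE > len - 1)
        return -1;
    return (int)num_reports;
}
```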
     
  • commit c9f64d1fc101c64ea2be1b2e562b4395127befc9 upstream.

    When dumping the name and NTP servers advertised by DHCP, a blank line
    is emitted if either of the lists is empty. This can lead to confusing
    issues such as the blank line getting flagged as a warning. This happens
    because the blank line is the result of pr_cont("\n") and that may see
    its level corrupted by some other driver concurrently writing to the
    console.

    Fix this by making sure that the terminating newline is only emitted
    if at least one entry in the lists was printed before.

    Reported-by: Jon Hunter
    Signed-off-by: Thierry Reding
    Link: https://lore.kernel.org/r/20201110073757.1284594-1-thierry.reding@gmail.com
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Thierry Reding
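A user-space sketch of "emit the terminating newline only if something was printed". It formats into a buffer instead of calling pr_cont(), and the function name and the convention that unset entries are NULL are assumptions for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes "a, b\n" for the non-NULL entries of items into out and returns
 * the number of characters written. An empty list produces no output at
 * all, not even the newline. */
size_t format_list(char *out, size_t size, const char *const *items, size_t n)
{
    size_t used = 0, i;
    int printed = 0;

    if (size == 0)
        return 0;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        int len;

        if (!items[i])
            continue;                /* unset entry, nothing to print */
        len = snprintf(out + used, size - used, "%s%s",
                       printed ? ", " : "", items[i]);
        if (len < 0 || (size_t)len >= size - used) {
            out[used] = '\0';        /* drop the partially written entry */
            break;
        }
        used += (size_t)len;
        printed = 1;
    }
    /* Terminate the line only if at least one entry was printed. */
    if (printed && used + 1 < size) {
        out[used++] = '\n';
        out[used] = '\0';
    }
    return used;
}
```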
     

11 Dec, 2020

1 commit

  • Alexei Starovoitov says:

    ====================
    pull-request: bpf 2020-12-10

    The following pull-request contains BPF updates for your *net* tree.

    We've added 21 non-merge commits during the last 12 day(s) which contain
    a total of 21 files changed, 163 insertions(+), 88 deletions(-).

    The main changes are:

    1) Fix propagation of 32-bit signed bounds from 64-bit bounds, from Alexei.

    2) Fix ring_buffer__poll() return value, from Andrii.

    3) Fix race in lwt_bpf, from Cong.

    4) Fix test_offload, from Toke.

    5) Various xsk fixes.

    Please consider pulling these changes from:

    git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

    Thanks a lot!

    Also thanks to reporters, reviewers and testers of commits in this pull-request:

    Cong Wang, Hulk Robot, Jakub Kicinski, Jean-Philippe Brucker, John
    Fastabend, Magnus Karlsson, Maxim Mikityanskiy, Yonghong Song
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

10 Dec, 2020

7 commits

  • TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL is a u32 attribute (MPLS label is
    20 bits long).

    Fixes the following bug:

    $ tc filter add dev ethX ingress protocol mpls_uc \
    flower mpls lse depth 2 label 256 \
    action drop

    $ tc filter show dev ethX ingress
    filter protocol mpls_uc pref 49152 flower chain 0
    filter protocol mpls_uc pref 49152 flower chain 0 handle 0x1
    eth_type 8847
    mpls
    lse depth 2 label 0

    Signed-off-by: David S. Miller

    Guillaume Nault
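The "label 256" request showing up as "label 0" is consistent with the 20-bit label passing through an 8-bit value somewhere on the way. That reading is an assumption, since the message does not say which width was used by mistake. The helper names and the mask constant below are hypothetical.

```c
#include <stdint.h>

#define LABEL_MASK 0xFFFFFu  /* 20-bit MPLS label; name chosen for this sketch */

/* Carrying the label in 8 bits keeps only its low byte. */
uint8_t label_as_u8(uint32_t label)
{
    return (uint8_t)(label & LABEL_MASK);
}

/* Carrying the label in 32 bits preserves all 20 bits. */
uint32_t label_as_u32(uint32_t label)
{
    return label & LABEL_MASK;
}
```

Labels up to 255 survive both paths, which may explain why such a bug could go unnoticed with small test values.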
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for net:

    1) Switch to RCU in x_tables to fix possible NULL pointer dereference,
    from Subash Abhinov Kasiviswanathan.

    2) Fix netlink dump of dynset timeouts later than 23 days.

    3) Add comment for the indirect serialization of the nft commit mutex
    with rtnl_mutex.

    4) Remove bogus check for confirmed conntrack when matching on the
    conntrack ID, from Brett Mastbergen.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When cwnd is not a multiple of the TSO skb size of N*MSS, we can get
    into persistent scenarios where we have the following sequence:

    (1) ACK for full-sized skb of N*MSS arrives
    -> tcp_write_xmit() transmit full-sized skb with N*MSS
    -> move pacing release time forward
    -> exit tcp_write_xmit() because pacing time is in the future

    (2) TSQ callback or TCP internal pacing timer fires
    -> try to transmit next skb, but TSO deferral finds remainder of
    available cwnd is not big enough to trigger an immediate send
    now, so we defer sending until the next ACK.

    (3) repeat...

    So we can get into a case where we never mark ourselves as
    cwnd-limited for many seconds at a time, even with
    bulk/infinite-backlog senders, because:

    o In case (1) above, every time in tcp_write_xmit() we have enough
    cwnd to send a full-sized skb, we are not fully using the cwnd
    (because cwnd is not a multiple of the TSO skb size). So every time we
    send data, we are not cwnd limited, and so in the cwnd-limited
    tracking code in tcp_cwnd_validate() we mark ourselves as not
    cwnd-limited.

    o In case (2) above, every time in tcp_write_xmit() that we try to
    transmit the "remainder" of the cwnd but defer, we set the local
    variable is_cwnd_limited to true, but we do not send any packets, so
    sent_pkts is zero, so we don't call the cwnd-limited logic to update
    tp->is_cwnd_limited.

    Fixes: ca8a22634381 ("tcp: make cwnd-limited checks measurement-based, and gentler")
    Reported-by: Ingemar Johansson
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Acked-by: Soheil Hassas Yeganeh
    Signed-off-by: Eric Dumazet
    Link: https://lore.kernel.org/r/20201209035759.1225145-1-ncardwell.kernel@gmail.com
    Signed-off-by: Jakub Kicinski

    Neal Cardwell
     
  • The offending commit introduces a cleanup callback that is invoked
    when the driver module is removed to clean up the tunnel device
    flow block. But it returns on the first iteration of the for loop.
    The remaining indirect flow blocks will never be freed.

    Fixes: 1fac52da5942 ("net: flow_offload: consolidate indirect flow_block infrastructure")
    CC: Pablo Neira Ayuso
    Signed-off-by: Chris Mi
    Reviewed-by: Roi Dayan

    Chris Mi
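A generic sketch of the "returns on the first iteration" pattern and its fix. The `struct block` type and both function names are hypothetical stand-ins, not the flow_offload data structures.

```c
#include <stddef.h>

struct block {
    int in_use;  /* hypothetical stand-in for an indirect flow block */
};

/* Buggy shape: the return inside the loop ends cleanup after one block. */
size_t cleanup_first_only(struct block *blocks, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        blocks[i].in_use = 0;
        return 1;
    }
    return 0;
}

/* Fixed shape: every block is released before the function returns. */
size_t cleanup_all(struct block *blocks, size_t n)
{
    size_t i, released = 0;

    for (i = 0; i < n; i++) {
        if (blocks[i].in_use) {
            blocks[i].in_use = 0;
            released++;
        }
    }
    return released;
}
```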
     
  • For DCTCP, we have to retain the ECT bits set by the congestion control
    algorithm on the socket when reflecting syn TOS in syn-ack, in order to
    make ECN work properly.

    Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket")
    Reported-by: Alexander Duyck
    Signed-off-by: Wei Wang
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Wei Wang
     
  • Syzbot reported a stack overflow in bitmap_from_arr32() called from
    ethnl_parse_bitset() when bitset from netlink message is longer than
    target bitmap length. While ethnl_compact_sanity_checks() makes sure that
    trailing part is all zeros (i.e. the request does not try to touch bits
    kernel does not recognize), we also need to cap change_bits to nbits so
    that we don't try to write past the prepared bitmaps.

    Fixes: 88db6d1e4f62 ("ethtool: add ethnl_parse_bitset() helper")
    Reported-by: syzbot+9d39fa49d4df294aab93@syzkaller.appspotmail.com
    Signed-off-by: Michal Kubecek
    Link: https://lore.kernel.org/r/3487ee3a98e14cd526f55b6caaa959d2dcbcad9f.1607465316.git.mkubecek@suse.cz
    Signed-off-by: Jakub Kicinski

    Michal Kubecek
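A sketch of capping the number of copied bits to the size of the target bitmap. It uses plain 32-bit words and a hypothetical `copy_bits_capped()` helper rather than the kernel bitmap API.

```c
#include <stdint.h>
#include <string.h>

#define BITS_PER_WORD 32u

/* Copies at most nbits bits from src (src_bits long) into dst, which is
 * assumed to have room for nbits bits rounded up to whole words.
 * Returns the number of bits actually copied. */
unsigned int copy_bits_capped(uint32_t *dst, unsigned int nbits,
                              const uint32_t *src, unsigned int src_bits)
{
    /* The cap: never handle more bits than the target can hold. */
    unsigned int bits = src_bits < nbits ? src_bits : nbits;
    unsigned int words = (bits + BITS_PER_WORD - 1) / BITS_PER_WORD;
    unsigned int dst_words = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;
    unsigned int rem = bits % BITS_PER_WORD;

    memset(dst, 0, dst_words * sizeof(*dst));
    memcpy(dst, src, words * sizeof(*dst));
    if (rem)
        dst[words - 1] &= (UINT32_C(1) << rem) - 1;  /* clear bits past the cap */
    return bits;
}
```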
     
  • The isotp socket can be widely configured in its behaviour regarding addressing
    types, fill-ups, receive pattern tests and link layer length. Usually all
    these settings need to be fixed before bind() and can not be changed
    afterwards.

    This patch adds a check to enforce the common usage pattern.

    Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
    Signed-off-by: Oliver Hartkopp
    Tested-by: Thomas Wagner
    Link: https://lore.kernel.org/r/20201203140604.25488-2-socketcan@hartkopp.net
    Signed-off-by: Marc Kleine-Budde
    Link: https://lore.kernel.org/r/20201204133508.742120-3-mkl@pengutronix.de
    Signed-off-by: Jakub Kicinski

    Oliver Hartkopp
     

09 Dec, 2020

6 commits

  • Since commit 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF
    programs in net_device"), the XDP program attachment info is now maintained
    in the core code. This interacts badly with the xdp_attachment_flags_ok()
    check that prevents unloading an XDP program with different load flags than
    it was loaded with. In practice, two kinds of failures are seen:

    - An XDP program loaded without specifying a mode (and which then ends up
    in driver mode) cannot be unloaded if the program mode is specified on
    unload.

    - The dev_xdp_uninstall() hook always calls the driver callback with the
    mode set to the type of the program but an empty flags argument, which
    means the flags_ok() check prevents the program from being removed,
    leading to bpf prog reference leaks.

    The original reason this check was added was to avoid ambiguity when
    multiple programs were loaded. With the way the checks are done in the core
    now, this is quite simple to enforce in the core code, so let's add a check
    there and get rid of the xdp_attachment_flags_ok() callback entirely.

    Fixes: 7f0a838254bd ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
    Signed-off-by: Toke Høiland-Jørgensen
    Signed-off-by: Daniel Borkmann
    Acked-by: Jakub Kicinski
    Link: https://lore.kernel.org/bpf/160752225751.110217.10267659521308669050.stgit@toke.dk

    Toke Høiland-Jørgensen
     
  • Since commit 656c8e9cc1ba ("netfilter: conntrack: Use consistent ct id
    hash calculation") the ct id will not change from initialization to
    confirmation. Removing the confirmation check allows for things like
    adding an element to a 'typeof ct id' set in prerouting upon reception
    of the first packet of a new connection, and then being able to
    reference that set consistently both before and after the connection
    is confirmed.

    Fixes: 656c8e9cc1ba ("netfilter: conntrack: Use consistent ct id hash calculation")
    Signed-off-by: Brett Mastbergen
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Brett Mastbergen
     
  • Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
    small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS.

    This is no longer the case, and Hazem Mohamed Abuelfotoh reported
    that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames.

    Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit
    of rcvq_space.space computation, while it can select later a smaller
    value for tp->rcv_ssthresh and tp->window_clamp.

    ss -temoi on receiver would show :

    skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596

    This means that TCP can not increase its window in tcp_grow_window(),
    and that DRS can never kick in.

    Fix this by making sure that rcvq_space.space is not bigger than number of bytes
    that can be held in TCP receive queue.

    People unable/unwilling to change their kernel can work around this issue by
    selecting a bigger tcp_rmem[1] value as in :

    echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem

    Based on an initial report and patch from Hazem Mohamed Abuelfotoh
    https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/

    Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
    Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner")
    Reported-by: Hazem Mohamed Abuelfotoh
    Signed-off-by: Eric Dumazet
    Acked-by: Soheil Hassas Yeganeh
    Signed-off-by: David S. Miller

    Eric Dumazet
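A sketch of the clamping idea using the numbers from the ss output in this message. Limiting the initial estimate to rcv_ssthresh is used here as a stand-in for "the number of bytes that can be held in the TCP receive queue"; that choice and the function name are assumptions, and the real patch may apply further limits.

```c
#include <stdint.h>

/* Hypothetical helper: the initial receive-space estimate must not exceed
 * what the receive queue is currently allowed to hold. */
uint32_t clamp_rcvq_space(uint32_t rcv_wnd, uint32_t rcv_ssthresh)
{
    return rcv_wnd < rcv_ssthresh ? rcv_wnd : rcv_ssthresh;
}
```

With rcv_wnd at 62496 and rcv_ssthresh at 56596, the estimate starts at 56596 instead of sitting above a threshold the window cannot grow past.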
     
  • `tipc_node_apply_property` does a null check on a `tipc_link_entry`
    pointer but also accesses the same pointer out of the null check block.

    This triggers a warning on Coverity Static Analyzer because we're
    implying that `e->link` can be null.

    Move the "Update MTU for node link entry" line into the if block to make
    sure that it never runs while `e->link` is null.

    Signed-off-by: Cengiz Can
    Signed-off-by: David S. Miller

    Cengiz Can
     
  • Add an explicit comment in the code to describe the indirect
    serialization of the holders of the commit_mutex with the rtnl_mutex.
    Commit 90d2723c6d4c ("netfilter: nf_tables: do not hold reference on
    netdevice from preparation phase") already describes this, but a comment
    in this case is better for reference.

    Reported-by: Vladimir Oltean
    Reviewed-by: Vladimir Oltean
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Use nf_msecs_to_jiffies64 and nf_jiffies64_to_msecs as provided by
    8e1102d5a159 ("netfilter: nf_tables: support timeouts larger than 23
    days"), otherwise ruleset listing breaks.

    Fixes: a8b1e36d0d1d ("netfilter: nft_dynset: fix element timeout for HZ != 1000")
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

08 Dec, 2020

6 commits

  • When running concurrent iptables rules replacement with data, the per CPU
    sequence count is checked after the assignment of the new information.
    The sequence count is used to synchronize with the packet path without the
    use of any explicit locking. If there are any packets in the packet path using
    the table information, the sequence count is incremented to an odd value and
    is incremented to an even value once packet processing completes.

    The new table value assignment is followed by a write memory barrier so every
    CPU should see the latest value. If the packet path has started with the old
    table information, the sequence counter will be odd and the iptables
    replacement will wait till the sequence count is even prior to freeing the
    old table info.

    However, this assumes that the new table information assignment and the memory
    barrier is actually executed prior to the counter check in the replacement
    thread. If the CPU decides to execute the assignment later, as there is no user
    of the table information prior to the sequence check, the packet path on another
    CPU may use the old table information. The replacement thread would then free
    the table information under it, leading to a use-after-free in the packet
    processing context:

    Unable to handle kernel NULL pointer dereference at virtual
    address 000000000000008e
    pc : ip6t_do_table+0x5d0/0x89c
    lr : ip6t_do_table+0x5b8/0x89c
    ip6t_do_table+0x5d0/0x89c
    ip6table_filter_hook+0x24/0x30
    nf_hook_slow+0x84/0x120
    ip6_input+0x74/0xe0
    ip6_rcv_finish+0x7c/0x128
    ipv6_rcv+0xac/0xe4
    __netif_receive_skb+0x84/0x17c
    process_backlog+0x15c/0x1b8
    napi_poll+0x88/0x284
    net_rx_action+0xbc/0x23c
    __do_softirq+0x20c/0x48c

    This could be fixed by forcing instruction order after the new table
    information assignment or by switching to RCU for the synchronization.

    Fixes: 80055dab5de0 ("netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore")
    Reported-by: Sean Tranchetti
    Reported-by: kernel test robot
    Suggested-by: Florian Westphal
    Signed-off-by: Subash Abhinov Kasiviswanathan
    Signed-off-by: Pablo Neira Ayuso

    Subash Abhinov Kasiviswanathan
     
  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2020-12-07

    1) Syzbot-reported fixes for the new 64/32 bit compat layer.
    From Dmitry Safonov.

    2) Fix a memory leak in xfrm_user_policy that was introduced
    by adding the 64/32 bit compat layer. From Yu Kuai.

    * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
    net: xfrm: fix memory leak in xfrm_user_policy()
    xfrm/compat: Don't allocate memory with __GFP_ZERO
    xfrm/compat: memset(0) 64-bit padding at right place
    xfrm/compat: Translate by copying XFRMA_UNSPEC attribute
    ====================

    Link: https://lore.kernel.org/r/20201207093937.2874932-1-steffen.klassert@secunet.com
    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     
  • When running cat /proc/net/netstat, the output is not terminated with a newline; it looks like this:
    [root@localhost ~]# cat /proc/net/netstat
    ...
    MPTcpExt: 0 0 0 0 0 0 0 0 0 0 0 0 0[root@localhost ~]#

    This is because in mptcp_seq_show(), if mptcp isn't in use, net->mib.mptcp_statistics is NULL,
    so it just prints all zeros after "MPTcpExt:" and returns without emitting the '\n'.

    After this patch:

    [root@localhost ~]# cat /proc/net/netstat
    ...
    MPTcpExt: 0 0 0 0 0 0 0 0 0 0 0 0 0
    [root@localhost ~]#

    Fixes: fc518953bc9c8d7d ("mptcp: add and use MIB counter infrastructure")
    Signed-off-by: Jianguo Wu
    Acked-by: Florian Westphal
    Link: https://lore.kernel.org/r/142e2fd9-58d9-bb13-fb75-951cccc2331e@163.com
    Signed-off-by: Jakub Kicinski

    Jianguo Wu
     
  • When enabling multicast snooping, bridge module deadlocks on multicast_lock
    if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2
    network.

    The deadlock was caused by the following sequence: While holding the lock,
    br_multicast_open calls br_multicast_join_snoopers, which eventually causes
    IP stack to (attempt to) send out a Listener Report (in igmp6_join_group).
    Since the destination Ethernet address is a multicast address, br_dev_xmit
    feeds the packet back to the bridge via br_multicast_rcv, which in turn
    calls br_multicast_add_group, which then deadlocks on multicast_lock.

    The fix is to move the call br_multicast_join_snoopers outside of the
    critical section. This works since br_multicast_join_snoopers only deals
    with IP and does not modify any multicast data structures of the bridge,
    so there's no need to hold the lock.

    Steps to reproduce:
    1. sysctl net.ipv6.conf.all.force_mld_version=1
    2. have another querier
    3. ip link set dev bridge type bridge mcast_snooping 0 && \
    ip link set dev bridge type bridge mcast_snooping 1 < deadlock >

    A typical call trace looks like the following:

    [ 936.251495] _raw_spin_lock+0x5c/0x68
    [ 936.255221] br_multicast_add_group+0x40/0x170 [bridge]
    [ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge]
    [ 936.265322] br_dev_xmit+0x140/0x368 [bridge]
    [ 936.269689] dev_hard_start_xmit+0x94/0x158
    [ 936.273876] __dev_queue_xmit+0x5ac/0x7f8
    [ 936.277890] dev_queue_xmit+0x10/0x18
    [ 936.281563] neigh_resolve_output+0xec/0x198
    [ 936.285845] ip6_finish_output2+0x240/0x710
    [ 936.290039] __ip6_finish_output+0x130/0x170
    [ 936.294318] ip6_output+0x6c/0x1c8
    [ 936.297731] NF_HOOK.constprop.0+0xd8/0xe8
    [ 936.301834] igmp6_send+0x358/0x558
    [ 936.305326] igmp6_join_group.part.0+0x30/0xf0
    [ 936.309774] igmp6_group_added+0xfc/0x110
    [ 936.313787] __ipv6_dev_mc_inc+0x1a4/0x290
    [ 936.317885] ipv6_dev_mc_inc+0x10/0x18
    [ 936.321677] br_multicast_open+0xbc/0x110 [bridge]
    [ 936.326506] br_multicast_toggle+0xec/0x140 [bridge]

    Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
    Signed-off-by: Joseph Huang
    Acked-by: Nikolay Aleksandrov
    Link: https://lore.kernel.org/r/20201204235628.50653-1-Joseph.Huang@garmin.com
    Signed-off-by: Jakub Kicinski

    Joseph Huang
     
  • On non-RT kernels, migrate_disable() is just a wrapper for
    preempt_disable(), so it is safe to use it as a replacement, and RT
    kernels will benefit.

    Note that migrate_disable() was introduced in Feb 2020.

    Suggested-by: Alexei Starovoitov
    Signed-off-by: Cong Wang
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20201205075946.497763-2-xiyou.wangcong@gmail.com

    Cong Wang
     
  • The per-cpu bpf_redirect_info is shared among all skb_do_redirect()
    and BPF redirect helpers. Callers on RX path are all in BH context,
    disabling preemption is not sufficient to prevent BH interruption.

    In production, we observed strange packet drops because of the race
    condition between LWT xmit and TC ingress, and we verified this issue
    is fixed after we disable BH.

    Although this bug was technically introduced from the beginning, that
    is commit 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure"),
    at that time call_rcu() had to be call_rcu_bh() to match the RCU context.
    So this patch may not work well before RCU flavor consolidation has been
    completed around v5.0.

    Update the comments above the code too, as call_rcu() is now BH friendly.

    Signed-off-by: Dongdong Wang
    Signed-off-by: Alexei Starovoitov
    Reviewed-by: Cong Wang
    Link: https://lore.kernel.org/bpf/20201205075946.497763-1-xiyou.wangcong@gmail.com

    Dongdong Wang
     

07 Dec, 2020

1 commit

  • Guillaume noticed that for segments, udp_queue_rcv_one_skb() returns the
    proto, and it should pass "ret" unmodified to ip_protocol_deliver_rcu().
    Otherwise, with a negative value passed, it will underflow inet_protos.

    This can be reproduced with IPIP FOU:

    # ip fou add port 5555 ipproto 4
    # ethtool -K eth1 rx-gro-list on

    Fixes: cf329aa42b66 ("udp: cope with UDP GRO packet misdirection")
    Reported-by: Guillaume Nault
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

05 Dec, 2020

5 commits

  • If tbl_mpp can not be allocated, we call mesh_table_free(tbl_path)
    while tbl_path rhashtable has not yet been initialized, which causes
    panics.

    Simply factor the rhashtable_init() call into mesh_table_alloc().

    WARNING: CPU: 1 PID: 8474 at kernel/workqueue.c:3040 __flush_work kernel/workqueue.c:3040 [inline]
    WARNING: CPU: 1 PID: 8474 at kernel/workqueue.c:3040 __cancel_work_timer+0x514/0x540 kernel/workqueue.c:3136
    Modules linked in:
    CPU: 1 PID: 8474 Comm: syz-executor663 Not tainted 5.10.0-rc6-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:__flush_work kernel/workqueue.c:3040 [inline]
    RIP: 0010:__cancel_work_timer+0x514/0x540 kernel/workqueue.c:3136
    Code: 5d c3 e8 bf ae 29 00 0f 0b e9 f0 fd ff ff e8 b3 ae 29 00 0f 0b 43 80 3c 3e 00 0f 85 31 ff ff ff e9 34 ff ff ff e8 9c ae 29 00 0b e9 dc fe ff ff 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 7d fd ff
    RSP: 0018:ffffc9000165f5a0 EFLAGS: 00010293
    RAX: ffffffff814b7064 RBX: 0000000000000001 RCX: ffff888021c80000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    RBP: ffff888024039ca0 R08: dffffc0000000000 R09: fffffbfff1dd3e64
    R10: fffffbfff1dd3e64 R11: 0000000000000000 R12: 1ffff920002cbebd
    R13: ffff888024039c88 R14: 1ffff11004807391 R15: dffffc0000000000
    FS: 0000000001347880(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000020000140 CR3: 000000002cc0a000 CR4: 00000000001506e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    rhashtable_free_and_destroy+0x25/0x9c0 lib/rhashtable.c:1137
    mesh_table_free net/mac80211/mesh_pathtbl.c:69 [inline]
    mesh_pathtbl_init+0x287/0x2e0 net/mac80211/mesh_pathtbl.c:785
    ieee80211_mesh_init_sdata+0x2ee/0x530 net/mac80211/mesh.c:1591
    ieee80211_setup_sdata+0x733/0xc40 net/mac80211/iface.c:1569
    ieee80211_if_add+0xd5c/0x1cd0 net/mac80211/iface.c:1987
    ieee80211_add_iface+0x59/0x130 net/mac80211/cfg.c:125
    rdev_add_virtual_intf net/wireless/rdev-ops.h:45 [inline]
    nl80211_new_interface+0x563/0xb40 net/wireless/nl80211.c:3855
    genl_family_rcv_msg_doit net/netlink/genetlink.c:739 [inline]
    genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
    genl_rcv_msg+0xe4e/0x1280 net/netlink/genetlink.c:800
    netlink_rcv_skb+0x190/0x3a0 net/netlink/af_netlink.c:2494
    genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
    netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
    netlink_unicast+0x780/0x930 net/netlink/af_netlink.c:1330
    netlink_sendmsg+0x9a8/0xd40 net/netlink/af_netlink.c:1919
    sock_sendmsg_nosec net/socket.c:651 [inline]
    sock_sendmsg net/socket.c:671 [inline]
    ____sys_sendmsg+0x519/0x800 net/socket.c:2353
    ___sys_sendmsg net/socket.c:2407 [inline]
    __sys_sendmsg+0x2b1/0x360 net/socket.c:2440
    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: 60854fd94573 ("mac80211: mesh: convert path table to rhashtable")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Reviewed-by: Johannes Berg
    Link: https://lore.kernel.org/r/20201204162428.2583119-1-eric.dumazet@gmail.com
    Signed-off-by: Jakub Kicinski

    Eric Dumazet
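A user-space sketch of why initializing inside the allocator makes the error path safe. `struct table`, the `simulate_failure` parameter and all function names are hypothetical; the `initialized` flag merely stands in for the state that rhashtable_init() sets up.

```c
#include <stdlib.h>

struct table {
    int initialized;  /* stands in for state set up by an init call */
};

/* Allocation and initialization happen together, so any table that
 * exists is also safe to tear down. */
struct table *table_alloc(int simulate_failure)
{
    struct table *t;

    if (simulate_failure)
        return NULL;
    t = malloc(sizeof(*t));
    if (!t)
        return NULL;
    t->initialized = 1;
    return t;
}

void table_free(struct table *t)
{
    if (t->initialized)
        t->initialized = 0;  /* teardown of initialized state goes here */
    free(t);
}

/* Two-table setup: if the second allocation fails, the first table is
 * freed, which is only safe because it is already initialized. */
int tables_init(int fail_second)
{
    struct table *path = table_alloc(0);
    struct table *mpp;

    if (!path)
        return -1;
    mpp = table_alloc(fail_second);
    if (!mpp) {
        table_free(path);
        return -1;
    }
    table_free(mpp);
    table_free(path);
    return 0;
}
```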
     
  • Fix to return a negative error code from the error handling
    case instead of 0, as done elsewhere in this function.

    Changing 'return start' to 'return action_start' can fix this bug.

    Fixes: 69929d4c49e1 ("net: openvswitch: fix TTL decrement action netlink message format")
    Reported-by: Hulk Robot
    Signed-off-by: Wang Hai
    Reviewed-by: Eelco Chaudron
    Link: https://lore.kernel.org/r/20201204114314.1596-1-wanghai38@huawei.com
    Signed-off-by: Jakub Kicinski

    Wang Hai
     
  • Fix to return a negative error code from the error handling
    case instead of 0, as done elsewhere in this function.

    Fixes: f8ed289fab84 ("bridge: vlan: use br_vlan_(get|put)_master to deal with refcounts")
    Reported-by: Hulk Robot
    Signed-off-by: Zhang Changzhong
    Acked-by: Nikolay Aleksandrov
    Link: https://lore.kernel.org/r/1607071737-33875-1-git-send-email-zhangchangzhong@huawei.com
    Signed-off-by: Jakub Kicinski

    Zhang Changzhong
     
  • Fix to return a negative error code from the error handling
    case instead of 0, as done elsewhere in this function.

    Fixes: d15662682db2 ("ipv4: Allow ipv6 gateway with ipv4 routes")
    Reported-by: Hulk Robot
    Signed-off-by: Zhang Changzhong
    Reviewed-by: David Ahern
    Link: https://lore.kernel.org/r/1607071695-33740-1-git-send-email-zhangchangzhong@huawei.com
    Signed-off-by: Jakub Kicinski

    Zhang Changzhong
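The error-handling fixes above share one pattern: a failure path that jumps to a common exit must set a negative error code first. A minimal sketch follows, with a hypothetical `parse_config()` that is unrelated to the patched functions.

```c
#include <errno.h>
#include <stddef.h>

/* Every failure path sets err before jumping to the shared exit, so the
 * caller never sees 0 for a call that failed. */
int parse_config(const int *cfg, size_t n)
{
    int err = 0;

    if (!cfg) {
        err = -EINVAL;
        goto out;
    }
    if (n == 0) {
        err = -ERANGE;  /* forgetting this assignment would return 0 */
        goto out;
    }
    /* normal processing would go here */
out:
    return err;
}
```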
     
  • With the following tdc testcase:

    83be: (qdisc, fq_pie) Create FQ-PIE with invalid number of flows

    as fq_pie_init() fails, fq_pie_destroy() is called to clean up. Since the
    timer is not yet initialized, it's possible to observe a splat like this:

    INFO: trying to register non-static key.
    the code is fine but needs lockdep annotation.
    turning off the locking correctness validator.
    CPU: 0 PID: 975 Comm: tc Not tainted 5.10.0-rc4+ #298
    Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
    Call Trace:
    dump_stack+0x99/0xcb
    register_lock_class+0x12dd/0x1750
    __lock_acquire+0xfe/0x3970
    lock_acquire+0x1c8/0x7f0
    del_timer_sync+0x49/0xd0
    fq_pie_destroy+0x3f/0x80 [sch_fq_pie]
    qdisc_create+0x916/0x1160
    tc_modify_qdisc+0x3c4/0x1630
    rtnetlink_rcv_msg+0x346/0x8e0
    netlink_unicast+0x439/0x630
    netlink_sendmsg+0x719/0xbf0
    sock_sendmsg+0xe2/0x110
    ____sys_sendmsg+0x5ba/0x890
    ___sys_sendmsg+0xe9/0x160
    __sys_sendmsg+0xd3/0x170
    do_syscall_64+0x33/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [...]
    ODEBUG: assert_init not available (active state 0) object type: timer_list hint: 0x0
    WARNING: CPU: 0 PID: 975 at lib/debugobjects.c:508 debug_print_object+0x162/0x210
    [...]
    Call Trace:
    debug_object_assert_init+0x268/0x380
    try_to_del_timer_sync+0x6a/0x100
    del_timer_sync+0x9e/0xd0
    fq_pie_destroy+0x3f/0x80 [sch_fq_pie]
    qdisc_create+0x916/0x1160
    tc_modify_qdisc+0x3c4/0x1630
    rtnetlink_rcv_msg+0x346/0x8e0
    netlink_rcv_skb+0x120/0x380
    netlink_unicast+0x439/0x630
    netlink_sendmsg+0x719/0xbf0
    sock_sendmsg+0xe2/0x110
    ____sys_sendmsg+0x5ba/0x890
    ___sys_sendmsg+0xe9/0x160
    __sys_sendmsg+0xd3/0x170
    do_syscall_64+0x33/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fix it by moving timer_setup() before any failure, like it was done on 'red'
    with former commit 608b4adab178 ("net_sched: initialize timer earlier in
    red_init()").

    Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler")
    Signed-off-by: Davide Caratti
    Reviewed-by: Cong Wang
    Link: https://lore.kernel.org/r/2e78e01c504c633ebdff18d041833cf2e079a3a4.1607020450.git.dcaratti@redhat.com
    Signed-off-by: Jakub Kicinski

    Davide Caratti
     

04 Dec, 2020

5 commits

  • If force_zc is set, we should exit out with an error, not fall back to
    copy mode.

    Fixes: 921b68692abb ("xsk: Enable sharing of dma mappings")
    Reported-by: Hulk Robot
    Signed-off-by: Zhang Changzhong
    Signed-off-by: Daniel Borkmann
    Acked-by: Magnus Karlsson
    Link: https://lore.kernel.org/bpf/1607077277-41995-1-git-send-email-zhangchangzhong@huawei.com

    Zhang Changzhong
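A sketch of the intended behaviour: falling back to copy mode is acceptable only when zero-copy was not forced. The enum, the function name and the -1 error value are assumptions for illustration.

```c
enum xmit_mode { MODE_COPY = 0, MODE_ZEROCOPY = 1 };

/* Returns the selected mode, or -1 when zero-copy was demanded but is
 * not available. */
int select_mode(int zc_supported, int force_zc)
{
    if (zc_supported)
        return MODE_ZEROCOPY;
    if (force_zc)
        return -1;      /* caller insisted on zero-copy: report the failure */
    return MODE_COPY;   /* otherwise falling back to copy mode is fine */
}
```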
     
  • During restart, mac80211 is supposed to reconfigure the driver.
    When there's a monitor interface, the interface is added and the
    channel context for it was created, but not assigned to it as it
    was not considered running during the restart.

    Fix this by setting SDATA_STATE_RUNNING while adding monitor
    interfaces.

    Signed-off-by: Borwankar, Antara
    Signed-off-by: Luca Coelho
    Link: https://lore.kernel.org/r/iwlwifi.20201129172929.e1df99693a4c.I494579f28018c2d0b9d4083a664cf872c28405ae@changeid
    [reword commit log]
    Signed-off-by: Johannes Berg

    Borwankar, Antara
     
  • In case we have an old supplicant, the akm field is uninitialized.

    Signed-off-by: Sara Sharon
    Signed-off-by: Luca Coelho
    Link: https://lore.kernel.org/r/iwlwifi.20201129172929.930f0ab7ebee.Ic546e384efab3f4a89f318eafddc3eb7d556aecb@changeid
    Signed-off-by: Johannes Berg

    Sara Sharon
     
  • ieee80211_chandef_he_6ghz_oper() needs to return true if it
    determined a valid 6 GHz chandef, fix that.

    Fixes: 1d00ce807efa ("mac80211: support S1G association")
    Signed-off-by: Wen Gong
    Link: https://lore.kernel.org/r/1606121152-3452-1-git-send-email-wgong@codeaurora.org
    [rewrite commit message]
    Signed-off-by: Johannes Berg

    Wen Gong
     
  • When 'act_mpls' is used to mangle the LSE, the current value is read from
    the packet dereferencing 4 bytes at mpls_hdr(): ensure that the label is
    contained in the skb "linear" area.

    Found by code inspection.

    v2:
    - use MPLS_HLEN instead of sizeof(new_lse), thanks to Jakub Kicinski

    Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC")
    Signed-off-by: Davide Caratti
    Acked-by: Guillaume Nault
    Link: https://lore.kernel.org/r/3243506cba43d14858f3bd21ee0994160e44d64a.1606987058.git.dcaratti@redhat.com
    Signed-off-by: Jakub Kicinski

    Davide Caratti
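A flat-buffer analogue of checking that the 4-byte label stack entry lies in readable memory before dereferencing it. The kernel works on skbs rather than plain arrays, so `read_mpls_label()` and its parameters are hypothetical; only the MPLS_HLEN name comes from the message above.

```c
#include <stddef.h>
#include <stdint.h>

#define MPLS_HLEN 4u  /* size of one label stack entry in bytes */

/* Stores the 20-bit label of the entry at offset into *label and returns
 * 0, or returns -1 if the entry is not fully inside the first linear_len
 * bytes of buf. */
int read_mpls_label(const uint8_t *buf, size_t linear_len, size_t offset,
                    uint32_t *label)
{
    uint32_t lse;

    if (offset > linear_len || linear_len - offset < MPLS_HLEN)
        return -1;  /* entry is not fully readable, do not touch it */

    /* Entries are in network byte order; the label is the top 20 bits. */
    lse = ((uint32_t)buf[offset] << 24) |
          ((uint32_t)buf[offset + 1] << 16) |
          ((uint32_t)buf[offset + 2] << 8) |
          (uint32_t)buf[offset + 3];
    *label = lse >> 12;
    return 0;
}
```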