28 Aug, 2020

1 commit

  • David Howells says:

    ====================
    rxrpc, afs: Fix probing issues

    Here are some fixes for rxrpc and afs to fix issues in the RTT measuring in
    rxrpc and thence the Volume Location server probing in afs:

    (1) Move the serial number of a received ACK into a local variable to
    simplify the next patch.

    (2) Fix the loss of RTT samples due to extra interposed ACKs causing
    baseline information to be discarded too early. This is a particular
    problem for afs when it sends a single very short call to probe a
    server it hasn't talked to recently.

    (3) Fix rxrpc_kernel_get_srtt() to indicate whether it actually has seen
    any valid samples or not.

    (4) Remove a field that's set/woken, but never read/waited on.

    (5) Expose the RTT and other probe information through procfs to make
    debugging of this stuff easier.

    (6) Fix VL rotation in afs to only use summary information from VL probing
    and not the probe running state (which gets clobbered when next a
    probe is issued).

    (7) Fix VL rotation to actually return the error aggregated from the probe
    errors.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Aug, 2020

1 commit

  • Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    changed ndisc_ifinfo_sysctl_change to take a kernel pointer. Adjust its
    prototype in net/ndisc.h as well to fix the following sparse warning:

    net/ipv6/ndisc.c:1838:5: error: symbol 'ndisc_ifinfo_sysctl_change' redeclared with different type (incompatible argument 3 (different address spaces)):
    net/ipv6/ndisc.c:1838:5: int extern [addressable] [signed] [toplevel] ndisc_ifinfo_sysctl_change( ... )
    net/ipv6/ndisc.c: note: in included file (through include/net/ipv6.h):
    ./include/net/ndisc.h:496:5: note: previously declared as:
    ./include/net/ndisc.h:496:5: int extern [addressable] [signed] [toplevel] ndisc_ifinfo_sysctl_change( ... )
    net/ipv6/ndisc.c: note: in included file (through include/net/ip6_route.h):

    Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    Cc: Christoph Hellwig
    Signed-off-by: Tobias Klauser
    Signed-off-by: David S. Miller

    Tobias Klauser
     

22 Aug, 2020

1 commit

  • Following bug was reported via irc:
    nft list ruleset
    set knock_candidates_ipv4 {
    type ipv4_addr . inet_service
    size 65535
    elements = { 127.0.0.1 . 123,
    127.0.0.1 . 123 }
    }
    ..
    udp dport 123 add @knock_candidates_ipv4 { ip saddr . 123 }
    udp dport 123 add @knock_candidates_ipv4 { ip saddr . udp dport }

    It should not have been possible to add a duplicate set entry.

    After some debugging it turned out that the problem is the immediate
    value (123) in the second-to-last rule.

    Concatenations use 32bit registers, i.e. the elements are 8 bytes each,
    not 6 and it turns out the kernel inserted

    inet firewall @knock_candidates_ipv4
    element 0100007f ffff7b00 : 0 [end]
    element 0100007f 00007b00 : 0 [end]

    Note the non-zero upper bits of the first element. It turns out that
    nft_immediate doesn't zero the destination register, but this is needed
    when the length isn't a multiple of 4.

    Furthermore, the zeroing in nft_payload is broken. We can't use
    [len / 4] = 0 -- if len is a multiple of 4, index is off by one.

    Skip zeroing in this case and use a conditional instead of (len -1) / 4.

    Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
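
The register padding problem described in the commit above can be reproduced with a small user-space model. This is only a sketch, not the kernel code: the helper names (`store_unpadded`, `store_zero_padded`, `dump`, `concat_key`) are hypothetical, the register file is modelled as a byte array pre-filled with stale 0xff bytes, and the dump assumes a little-endian host, which matches the element listing quoted in the commit.

```python
REG_SIZE = 4  # nf_tables registers are addressed as 32-bit words


def store_unpadded(regs: bytearray, dreg: int, data: bytes) -> None:
    """Copy data into the register file without clearing the padding bytes."""
    off = dreg * REG_SIZE
    regs[off:off + len(data)] = data


def store_zero_padded(regs: bytearray, dreg: int, data: bytes) -> None:
    """Clear the last, partially used register first, then copy the data."""
    off = dreg * REG_SIZE
    # Only clear when there is padding. For a length that is a multiple of 4,
    # index len // 4 would already point at the *next* register.
    if len(data) % REG_SIZE:
        last = off + (len(data) // REG_SIZE) * REG_SIZE
        regs[last:last + REG_SIZE] = bytes(REG_SIZE)
    regs[off:off + len(data)] = data


def dump(regs: bytearray, dreg: int, nregs: int) -> str:
    """Render registers as 32-bit hex words (little-endian host assumed)."""
    words = []
    for i in range(dreg, dreg + nregs):
        word = regs[i * REG_SIZE:(i + 1) * REG_SIZE]
        words.append("%08x" % int.from_bytes(word, "little"))
    return " ".join(words)


def concat_key(store) -> str:
    """Build the 'ip saddr . 123' concatenation with the given store helper."""
    regs = bytearray(b"\xff" * 16)            # stale data from earlier use
    store(regs, 0, bytes([127, 0, 0, 1]))     # ip saddr: 4 bytes, no padding
    store(regs, 1, (123).to_bytes(2, "big"))  # port 123: 2 bytes + 2 padding
    return dump(regs, 0, 2)


print(concat_key(store_unpadded))     # 0100007f ffff7b00
print(concat_key(store_zero_padded))  # 0100007f 00007b00
```

The two printed keys differ only in the padding bytes, which is why the set accepted what looked like a duplicate element.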
     

21 Aug, 2020

1 commit


19 Aug, 2020

1 commit

  • This patch is to do 3 things for ipv6_dev_find():

    As David A. noticed,

    - rt6_lookup() is not really needed. Different from __ip_dev_find(),
    ipv6_dev_find() doesn't have a compatibility problem, so remove it.

    As Hideaki suggested,

    - "valid" (non-tentative) check for the address is also needed.
    ipv6_chk_addr() calls ipv6_chk_addr_and_flags(), which will
    traverse the address hash list, but it's heavy to be called
    inside ipv6_dev_find(). This patch is to reuse the code of
    ipv6_chk_addr_and_flags() for ipv6_dev_find().

    - dev parameter is passed into ipv6_dev_find(), as link-local
    addresses from user space have sin6_scope_id set and the dev
    lookup needs it.

    Fixes: 81f6cb31222d ("ipv6: add ipv6_dev_find()")
    Suggested-by: YOSHIFUJI Hideaki
    Reported-by: David Ahern
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller

    Xin Long
     

14 Aug, 2020

1 commit

  • Pull networking fixes from David Miller:
    "Some merge window fallout, some longer term fixes:

    1) Handle headroom properly in lapbether and x25_asy drivers, from
    Xie He.

    2) Fetch MAC address from correct r8152 device node, from Thierry
    Reding.

    3) In the sw kTLS path we should allow MSG_CMSG_COMPAT in sendmsg,
    from Rouven Czerwinski.

    4) Correct fdputs in socket layer, from Miaohe Lin.

    5) Revert troublesome sockptr_t optimization, from Christoph Hellwig.

    6) Fix TCP TFO key reading on big endian, from Jason Baron.

    7) Missing CAP_NET_RAW check in nfc, from Qingyu Li.

    8) Fix inet fastreuse optimization with tproxy sockets, from Tim
    Froidcoeur.

    9) Fix 64-bit divide in new SFC driver, from Edward Cree.

    10) Add a tracepoint for prandom_u32 so that we can more easily
    perform usage analysis. From Eric Dumazet.

    11) Fix rwlock imbalance in AF_PACKET, from John Ogness"

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
    net: openvswitch: introduce common code for flushing flows
    af_packet: TPACKET_V3: fix fill status rwlock imbalance
    random32: add a tracepoint for prandom_u32()
    Revert "ipv4: tunnel: fix compilation on ARCH=um"
    net: accept an empty mask in /sys/class/net/*/queues/rx-*/rps_cpus
    net: ethernet: stmmac: Disable hardware multicast filter
    net: stmmac: dwmac1000: provide multicast filter fallback
    ipv4: tunnel: fix compilation on ARCH=um
    vsock: fix potential null pointer dereference in vsock_poll()
    sfc: fix ef100 design-param checking
    net: initialize fastreuse on inet_inherit_port
    net: refactor bind_bucket fastreuse into helper
    net: phy: marvell10g: fix null pointer dereference
    net: Fix potential memory leak in proto_register()
    net: qcom/emac: add missed clk_disable_unprepare in error path of emac_clks_phase1_init
    ionic_lif: Use devm_kcalloc() in ionic_qcq_alloc()
    net/nfc/rawsock.c: add CAP_NET_RAW check.
    hinic: fix strncpy output truncated compile warnings
    drivers/net/wan/x25_asy: Added needed_headroom and a skb->len check
    net/tls: Fix kmap usage
    ...

    Linus Torvalds
     

12 Aug, 2020

1 commit


11 Aug, 2020

2 commits

  • Pull locking updates from Thomas Gleixner:
    "A set of locking fixes and updates:

    - Untangle the header spaghetti which causes build failures in
    various situations caused by the lockdep additions to seqcount to
    validate that the write side critical sections are non-preemptible.

    - The seqcount associated lock debug addons which were blocked by the
    above fallout.

    seqcount writers contrary to seqlock writers must be externally
    serialized, which usually happens via locking - except for strict
    per CPU seqcounts. As the lock is not part of the seqcount, lockdep
    cannot validate that the lock is held.

    This new debug mechanism adds the concept of associated locks.
    The sequence count now has lock type variants and corresponding
    initializers which take a pointer to the associated lock used for
    writer serialization. If lockdep is enabled the pointer is stored
    and write_seqcount_begin() has a lockdep assertion to validate that
    the lock is held.

    Aside of the type and the initializer no other code changes are
    required at the seqcount usage sites. The rest of the seqcount API
    is unchanged and determines the type at compile time with the help
    of _Generic which is possible now that the minimal GCC version has
    been moved up.

    Adding this lockdep coverage unearthed a handful of seqcount bugs
    which have been addressed already independent of this.

    While generally useful this comes with a Trojan Horse twist: On RT
    kernels the write side critical section can become preemptible if
    the writers are serialized by an associated lock, which leads to
    the well known reader preempts writer livelock. RT prevents this by
    storing the associated lock pointer independent of lockdep in the
    seqcount and changing the reader side to block on the lock when a
    reader detects that a writer is in the write side critical section.

    - Conversion of seqcount usage sites to associated types and
    initializers"

    * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
    locking/seqlock, headers: Untangle the spaghetti monster
    locking, arch/ia64: Reduce header dependencies by moving XTP bits into the new header
    x86/headers: Remove APIC headers from
    seqcount: More consistent seqprop names
    seqcount: Compress SEQCNT_LOCKNAME_ZERO()
    seqlock: Fold seqcount_LOCKNAME_init() definition
    seqlock: Fold seqcount_LOCKNAME_t definition
    seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
    hrtimer: Use sequence counter with associated raw spinlock
    kvm/eventfd: Use sequence counter with associated spinlock
    userfaultfd: Use sequence counter with associated spinlock
    NFSv4: Use sequence counter with associated spinlock
    iocost: Use sequence counter with associated spinlock
    raid5: Use sequence counter with associated spinlock
    vfs: Use sequence counter with associated spinlock
    timekeeping: Use sequence counter with associated raw spinlock
    xfrm: policy: Use sequence counters with associated lock
    netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
    netfilter: conntrack: Use sequence counter with associated spinlock
    sched: tasks: Use sequence counter with associated spinlock
    ...

    Linus Torvalds
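
The associated-lock idea from the locking pull above can be illustrated with a user-space model. This is a rough sketch under stated assumptions, not the kernel API: the class `SeqCountWithLock` and its methods are hypothetical names, and `threading.Lock.locked()` only reports whether anyone holds the lock, which is weaker than a lockdep assertion about the current context.

```python
import threading


class SeqCountWithLock:
    """Sequence counter that remembers the lock serializing its writers."""

    def __init__(self, lock: threading.Lock) -> None:
        self.lock = lock      # associated lock, passed to the initializer
        self.sequence = 0     # even: stable, odd: write in progress

    def write_begin(self) -> None:
        # Stand-in for the lockdep assertion in write_seqcount_begin().
        if not self.lock.locked():
            raise RuntimeError("associated lock is not held")
        self.sequence += 1

    def write_end(self) -> None:
        self.sequence += 1

    def read_begin(self) -> int:
        return self.sequence

    def read_retry(self, start: int) -> bool:
        # Retry if a write was in progress or has completed since read_begin().
        return bool(start & 1) or self.sequence != start


lock = threading.Lock()
seq = SeqCountWithLock(lock)
value = 0

start = seq.read_begin()
with lock:                    # writers are serialized by the associated lock
    seq.write_begin()
    value = 42
    seq.write_end()
print(seq.read_retry(start))  # True: the reader must re-read value
```

Calling `write_begin()` without holding the lock raises immediately, which is the class of bug the new lockdep coverage is meant to catch.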
     
  • When TFO keys are read back on big endian systems either via the global
    sysctl interface or via getsockopt() using TCP_FASTOPEN_KEY, the values
    don't match what was written.

    For example, on s390x:

    # echo "1-2-3-4" > /proc/sys/net/ipv4/tcp_fastopen_key
    # cat /proc/sys/net/ipv4/tcp_fastopen_key
    02000000-01000000-04000000-03000000

    Instead of:

    # cat /proc/sys/net/ipv4/tcp_fastopen_key
    00000001-00000002-00000003-00000004

    Fix this by converting to the correct endianness on read. This was
    reported by Colin Ian King when running the 'tcp_fastopen_backup_key' net
    selftest on s390x, which depends on the read value matching what was
    written. I've confirmed that the test now passes on big and little endian
    systems.

    Signed-off-by: Jason Baron
    Fixes: 438ac88009bc ("net: fastopen: robustness and endianness fixes for SipHash")
    Cc: Ard Biesheuvel
    Cc: Eric Dumazet
    Reported-and-tested-by: Colin Ian King
    Signed-off-by: David S. Miller

    Jason Baron
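
The read-back mismatch in the commit above can be modelled in user space. This is a simplified sketch whose details are assumptions chosen to reproduce the quoted output, not a transcription of the kernel code: the key is assumed to be kept as two 64-bit values loaded in little-endian order, and the old read path is assumed to copy their native memory image and then apply a 32-bit little-endian conversion. The function names are hypothetical.

```python
import struct


def write_key(words):
    """Store four user-supplied 32-bit words as two 64-bit key values."""
    raw = struct.pack("<4I", *words)   # each word converted to little-endian
    return struct.unpack("<2Q", raw)   # key loaded as little-endian u64 values


def _fmt(words):
    return "-".join("%08x" % w for w in words)


def read_key_old(key, cpu):
    """Assumed old read path; cpu is '<' (little endian) or '>' (big endian)."""
    mem = struct.pack(cpu + "2Q", *key)        # native memory image of the u64s
    native = struct.unpack(cpu + "4I", mem)    # reinterpreted as native u32s
    swapped = struct.unpack(cpu + "4I", struct.pack("<4I", *native))
    return _fmt(swapped)


def read_key_fixed(key):
    """Serialize the u64s as little-endian bytes, then decode LE 32-bit words."""
    raw = struct.pack("<2Q", *key)
    return _fmt(struct.unpack("<4I", raw))


key = write_key([1, 2, 3, 4])
print(read_key_old(key, ">"))  # 02000000-01000000-04000000-03000000
print(read_key_old(key, "<"))  # 00000001-00000002-00000003-00000004
print(read_key_fixed(key))     # 00000001-00000002-00000003-00000004
```

The fixed read path never looks at the native memory layout, so its result does not depend on the host byte order.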
     

07 Aug, 2020

1 commit

  • Pull dlm updates from David Teigland:
    "This set includes some improvements to the dlm networking layer:
    improving the ability to trace dlm messages for debugging, and
    improved handling of bad messages or disrupted connections"

    * tag 'dlm-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
    fs: dlm: implement tcp graceful shutdown
    fs: dlm: change handling of reconnects
    fs: dlm: don't close socket on invalid message
    fs: dlm: set skb mark per peer socket
    fs: dlm: set skb mark for listen socket
    net: sock: add sock_set_mark
    dlm: Fix kobject memleak

    Linus Torvalds
     

06 Aug, 2020

3 commits

  • This patch adds a new socket helper function to set the mark value for a
    kernel socket.

    Signed-off-by: Alexander Aring
    Signed-off-by: David Teigland

    Alexander Aring
     
  • Pull networking updates from David Miller:

    1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.

    2) Support UDP segmentation in core TSO code, from Eric Dumazet.

    3) Allow flashing different flash images in cxgb4 driver, from Vishal
    Kulkarni.

    4) Add drop frames counter and flow status to tc flower offloading,
    from Po Liu.

    5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

    6) Various new indirect call avoidance, from Eric Dumazet and Brian
    Vazquez.

    7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
    Yonghong Song.

    8) Support querying and setting hardware address of a port function via
    devlink, use this in mlx5, from Parav Pandit.

    9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

    10) Switch qca8k driver over to phylink, from Jonathan McDowell.

    11) In bpftool, show list of processes holding BPF FD references to
    maps, programs, links, and btf objects. From Andrii Nakryiko.

    12) Several conversions over to generic power management, from Vaibhav
    Gupta.

    13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
    Yakunin.

    14) Various https url conversions, from Alexander A. Klimov.

    15) Timestamping and PHC support for mscc PHY driver, from Antoine
    Tenart.

    16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

    17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

    18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

    19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
    drivers. From Luc Van Oostenryck.

    20) XDP support for xen-netfront, from Denis Kirjanov.

    21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

    22) Support EF100 chip in sfc driver, from Edward Cree.

    23) Add XDP support to mvpp2 driver, from Matteo Croce.

    24) Support MPTCP in sock_diag, from Paolo Abeni.

    25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
    infrastructure, from Jakub Kicinski.

    26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

    27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

    28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

    29) Refactor a lot of networking socket option handling code in order to
    avoid set_fs() calls, from Christoph Hellwig.

    30) Add rfc4884 support to icmp code, from Willem de Bruijn.

    31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

    32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

    33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

    34) Support TCP syncookies in MPTCP, from Florian Westphal.

    35) Fix several tricky cases of PMTU handling wrt. bridging, from Stefano
    Brivio.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
    net: thunderx: initialize VF's mailbox mutex before first usage
    usb: hso: remove bogus check for EINPROGRESS
    usb: hso: no complaint about kmalloc failure
    hso: fix bailout in error case of probe
    ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
    selftests/net: relax cpu affinity requirement in msg_zerocopy test
    mptcp: be careful on subflow creation
    selftests: rtnetlink: make kci_test_encap() return sub-test result
    selftests: rtnetlink: correct the final return value for the test
    net: dsa: sja1105: use detected device id instead of DT one on mismatch
    tipc: set ub->ifindex for local ipv6 address
    ipv6: add ipv6_dev_find()
    net: openvswitch: silence suspicious RCU usage warning
    Revert "vxlan: fix tos value before xmit"
    ptp: only allow phase values lower than 1 period
    farsync: switch from 'pci_' to 'dma_' API
    wan: wanxl: switch from 'pci_' to 'dma_' API
    hv_netvsc: do not use VF device if link is down
    dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
    net: macb: Properly handle phylink on at91sam9x
    ...

    Linus Torvalds
     
  • This is to add an ip_dev_find like function for ipv6, used to find
    the dev by saddr.

    It will be used by TIPC protocol. So also export it.

    Signed-off-by: Xin Long
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller

    Xin Long
     

05 Aug, 2020

4 commits

  • Pull seccomp updates from Kees Cook:
    "There are a bunch of clean ups and selftest improvements along with
    two major updates to the SECCOMP_RET_USER_NOTIF filter return:
    EPOLLHUP support to more easily detect the death of a monitored
    process, and being able to inject fds when intercepting syscalls that
    expect an fd-opening side-effect (needed by both container folks and
    Chrome). The latter continued the refactoring of __scm_install_fd()
    started by Christoph, and in the process found and fixed a handful of
    bugs in various callers.

    - Improved selftest coverage, timeouts, and reporting

    - Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner)

    - Refactor __scm_install_fd() into __receive_fd() and fix buggy
    callers

    - Introduce 'addfd' command for SECCOMP_RET_USER_NOTIF (Sargun
    Dhillon)"

    * tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits)
    selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
    seccomp: Introduce addfd ioctl to seccomp user notifier
    fs: Expand __receive_fd() to accept existing fd
    pidfd: Replace open-coded receive_fd()
    fs: Add receive_fd() wrapper for __receive_fd()
    fs: Move __scm_install_fd() to __receive_fd()
    net/scm: Regularize compat handling of scm_detach_fds()
    pidfd: Add missing sock updates for pidfd_getfd()
    net/compat: Add missing sock updates for SCM_RIGHTS
    selftests/seccomp: Check ENOSYS under tracing
    selftests/seccomp: Refactor to use fixture variants
    selftests/harness: Clean up kern-doc for fixtures
    seccomp: Use -1 marker for end of mode 1 syscall list
    seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID
    selftests/seccomp: Rename user_trap_syscall() to user_notif_syscall()
    selftests/seccomp: Make kcmp() less required
    seccomp: Use pr_fmt
    selftests/seccomp: Improve calibration loop
    selftests/seccomp: use 90s as timeout
    selftests/seccomp: Expand benchmark to per-filter measurements
    ...

    Linus Torvalds
     
  • Pull uninitialized_var() macro removal from Kees Cook:
    "This is long overdue, and has hidden too many bugs over the years. The
    series has several "by hand" fixes, and then a trivial treewide
    replacement.

    - Clean up non-trivial uses of uninitialized_var()

    - Update documentation and checkpatch for uninitialized_var() removal

    - Treewide removal of uninitialized_var()"

    * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    compiler: Remove uninitialized_var() macro
    treewide: Remove uninitialized_var() usage
    checkpatch: Remove awareness of uninitialized_var() macro
    mm/debug_vm_pgtable: Remove uninitialized_var() usage
    f2fs: Eliminate usage of uninitialized_var() macro
    media: sur40: Remove uninitialized_var() usage
    KVM: PPC: Book3S PR: Remove uninitialized_var() usage
    clk: spear: Remove uninitialized_var() usage
    clk: st: Remove uninitialized_var() usage
    spi: davinci: Remove uninitialized_var() usage
    ide: Remove uninitialized_var() usage
    rtlwifi: rtl8192cu: Remove uninitialized_var() usage
    b43: Remove uninitialized_var() usage
    drbd: Remove uninitialized_var() usage
    x86/mm/numa: Remove uninitialized_var() usage
    docs: deprecated.rst: Add uninitialized_var()

    Linus Torvalds
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for net:

    1) Flush the cleanup xtables worker to make sure destructors
    have completed, from Florian Westphal.

    2) iifgroup is matching erroneously, also from Florian.

    3) Add selftest for meta interface matching, from Florian Westphal.

    4) Move nf_ct_offload_timeout() to header, from Roi Dayan.

    5) Call nf_ct_offload_timeout() from flow_offload_add() to
    make sure garbage collection does not evict offloaded flow,
    from Roi Dayan.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • It's currently possible to bridge Ethernet tunnels carrying IP
    packets directly to external interfaces without assigning them
    addresses and routes on the bridged network itself: this is the case
    for UDP tunnels bridged with a standard bridge or by Open vSwitch.

    PMTU discovery is currently broken with those configurations, because
    the encapsulation effectively decreases the MTU of the link, and
    while we are able to account for this using PMTU discovery on the
    lower layer, we don't have a way to relay ICMP or ICMPv6 messages
    needed by the sender, because we don't have valid routes to it.

    On the other hand, as a tunnel endpoint, we can't fragment packets
    as a general approach: this is for instance clearly forbidden for
    VXLAN by RFC 7348, section 4.3:

    VTEPs MUST NOT fragment VXLAN packets. Intermediate routers may
    fragment encapsulated VXLAN packets due to the larger frame size.
    The destination VTEP MAY silently discard such VXLAN fragments.

    The same paragraph recommends that the MTU over the physical network
    accommodates encapsulations, but this isn't a practical option for
    complex topologies, especially for typical Open vSwitch use cases.

    Further, it states that:

    Other techniques like Path MTU discovery (see [RFC1191] and
    [RFC1981]) MAY be used to address this requirement as well.

    Now, PMTU discovery already works for routed interfaces, we get
    route exceptions created by the encapsulation device as they receive
    ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
    we already rebuild those messages with the appropriate MTU and route
    them back to the sender.

    Add the missing bits for bridged cases:

    - checks in skb_tunnel_check_pmtu() to understand if it's appropriate
    to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
    RFC 4443 section 2.4 for ICMPv6. This function is already called by
    UDP tunnels

    - a new function generating those ICMP or ICMPv6 replies. We can't
    reuse icmp_send() and icmp6_send() as we don't see the sender as a
    valid destination. This doesn't need to be generic, as we don't
    cover any other type of ICMP errors given that we only provide an
    encapsulation function to the sender

    While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
    we might receive GSO buffers here, and the passed headroom already
    includes the inner MAC length, so we don't have to account for it
    a second time (that would imply three MAC headers on the wire, but
    there are just two).

    This issue became visible while bridging IPv6 packets with 4500 bytes
    of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
    bytes of encapsulation headroom, we would advertise MTU as 3950, and
    we would reject fragmented IPv6 datagrams of 3958 bytes size on the
    wire. We're exclusively dealing with network MTU here, though, so we
    could get Ethernet frames up to 3964 octets in that case.

    v2:
    - moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
    - split IPv4/IPv6 functions (David Ahern)

    Signed-off-by: Stefano Brivio
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Stefano Brivio
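
The figures in the GENEVE example of the commit above can be re-derived with a few lines of arithmetic. This is one reading that reproduces the quoted numbers, not the kernel's code: the 50-byte headroom is assumed to be an IPv4 header without options, a UDP header, a GENEVE base header without options, and the inner Ethernet header, and the function names are hypothetical.

```python
ETH_HLEN = 14        # Ethernet header
IPV4_HLEN = 20       # IPv4 header, no options (assumption)
UDP_HLEN = 8
GENEVE_HLEN = 8      # GENEVE base header, no options (assumption)
IPV6_HLEN = 40
IPV6_FRAG_HLEN = 8   # IPv6 fragment extension header


def tunnel_headroom() -> int:
    """Encapsulation overhead, including the inner MAC header."""
    return IPV4_HLEN + UDP_HLEN + GENEVE_HLEN + ETH_HLEN


def largest_ipv6_fragment(mtu: int) -> int:
    """Largest non-final IPv6 fragment: payload is a multiple of 8 bytes."""
    payload = (mtu - IPV6_HLEN - IPV6_FRAG_HLEN) // 8 * 8
    return IPV6_HLEN + IPV6_FRAG_HLEN + payload


pmtu = 4000
mtu = pmtu - tunnel_headroom()                    # MTU advertised to the sender
frame = largest_ipv6_fragment(mtu) + ETH_HLEN     # fragment as seen on the wire
max_frame = mtu + ETH_HLEN                        # network MTU plus MAC header

print(tunnel_headroom())   # 50
print(mtu)                 # 3950
print(frame)               # 3958
print(max_frame)           # 3964
print(frame <= mtu)        # False: MAC header counted twice, frame rejected
print(frame <= max_frame)  # True: accurate check accepts the fragment
```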
     

04 Aug, 2020

7 commits

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2020-08-04

    The following pull-request contains BPF updates for your *net-next* tree.

    We've added 73 non-merge commits during the last 9 day(s) which contain
    a total of 135 files changed, 4603 insertions(+), 1013 deletions(-).

    The main changes are:

    1) Implement bpf_link support for XDP. Also add LINK_DETACH operation for the BPF
    syscall allowing processes with BPF link FD to force-detach, from Andrii Nakryiko.

    2) Add BPF iterator for map elements and to iterate all BPF programs for efficient
    in-kernel inspection, from Yonghong Song and Alexei Starovoitov.

    3) Separate bpf_get_{stack,stackid}() helpers for perf events in BPF to avoid
    unwinder errors, from Song Liu.

    4) Allow cgroup local storage map to be shared between programs on the same
    cgroup. Also extend BPF selftests with coverage, from YiFei Zhu.

    5) Add BPF exception tables to ARM64 JIT in order to be able to JIT BPF_PROBE_MEM
    load instructions, from Jean-Philippe Brucker.

    6) Follow-up fixes on BPF socket lookup in combination with reuseport group
    handling. Also add related BPF selftests, from Jakub Sitnicki.

    7) Allow to use socket storage in BPF_PROG_TYPE_CGROUP_SOCK-typed programs for
    socket create/release as well as bind functions, from Stanislav Fomichev.

    8) Fix an info leak in xsk_getsockopt() when retrieving XDP stats via old struct
    xdp_statistics, from Peilin Ye.

    9) Fix PT_REGS_RC{,_CORE}() macros in libbpf for MIPS arch, from Jerry Crunchtime.

    10) Extend BPF kernel test infra with skb->family and skb->{local,remote}_ip{4,6}
    fields and allow user space to specify skb->dev via ifindex, from Dmitry Yakunin.

    11) Fix a bpftool segfault due to a missing program type name and make it more robust
    against such gaps in future, from Quentin Monnet.

    12) Consolidate cgroup helper functions across selftests and fix a v6 localhost
    resolver issue, from John Fastabend.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Saeed Mahameed says:

    ====================
    mlx5-updates-2020-08-03

    This patchset introduces some updates to mlx5 driver.

    1) Jakub converts mlx5 to use the new udp tunnel infrastructure.
    Starting with a hack to allow drivers to request a static configuration
    of the default vxlan port, and then a patch that converts mlx5.

    2) Parav implements change_carrier ndo for VF eswitch representors,
    to speed up link state control of representor netdevices.

    3) Alex Vesker makes a simple update to software steering to fix an issue
    with the push vlan action sequence.

    4) Leon removes a redundant dump stack on error flow.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • A later patch will refuse to set the action of certain traps in mlxsw
    and also to change the policer binding of certain groups. Pass extack so
    that failure could be communicated clearly to user space.

    Reviewed-by: Petr Machata
    Reviewed-by: Jiri Pirko
    Signed-off-by: Petr Machata
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • Add the packet trap that can report packets that were ECN marked due to RED
    AQM.

    Signed-off-by: Amit Cohen
    Signed-off-by: Petr Machata
    Reviewed-by: Jiri Pirko
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Amit Cohen
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    1) UAF in chain binding support from previous batch, from Dan Carpenter.

    2) Queue up delayed work to expire connections with no destination,
    from Andrew Sy Kim.

    3) Use fallthrough pseudo-keyword, from Gustavo A. R. Silva.

    4) Replace HTTP links with HTTPS, from Alexander A. Klimov.

    5) Remove superfluous null header checks in ip6tables, from
    Gaurav Singh.

    6) Add extended netlink error reporting for expression.

    7) Report EEXIST on overlapping chain, set elements and flowtable
    devices.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When openvswitch conntrack is offloaded with the act_ct action,
    fragmented packets are defragmented in the ingress tc act_ct action and
    can then miss the next chain. The packet is then passed to the openvswitch
    datapath without the mru, and the reassembled, over-MTU packet is dropped
    in the openvswitch output action:

    "kernel: net2: dropped over-mtu packet: 1528 > 1500"

    This patch adds the mru to tc_skb_ext for the case where a packet is
    defragmented and then misses the next chain, and also adds the mru to
    qdisc_skb_cb. act_ct sets the mru in qdisc_skb_cb when the packet is
    defragmented. When the chain misses, the mru is copied to tc_skb_ext,
    where the ovs datapath can retrieve it.

    Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
    Signed-off-by: wenxu
    Reviewed-by: Cong Wang
    Signed-off-by: David S. Miller

    wenxu
     
  • mlx5 has the IANA VXLAN port (4789) hard coded by the device,
    instead of being added dynamically when tunnels are created.

    To support this add a workaround flag to struct udp_tunnel_nic_info.
    Skipping updates for the port is fairly trivial, dumping the hard
    coded port via ethtool requires some code duplication. The port
    is not a part of any real table, we dump it in a special table
    which has no tunnel types supported and only one entry.

    This is the last known workaround / hack needed to convert
    all drivers to the new infra.

    Signed-off-by: Jakub Kicinski
    Signed-off-by: Saeed Mahameed

    Jakub Kicinski
     

03 Aug, 2020

1 commit


02 Aug, 2020

1 commit


01 Aug, 2020

8 commits

  • …inux/kernel/git/jberg/mac80211-next

    Johannes Berg says:

    ====================
    We have a number of changes
    * code cleanups and fixups as usual
    * AQL & internal TXQ improvements from Felix
    * some mesh 802.1X support bits
    * some injection improvements from Mathy of KRACK
    fame, so we'll see what this results in ;-)
    * some more initial S1G support bits, this time
    (some of?) the userspace APIs
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2020-07-31

    1) Fix policy matching with mark and mask on userspace interfaces.
    From Xin Long.

    2) Several fixes for the new ESP in TCP encapsulation.
    From Sabrina Dubroca.

    3) Fix crash when the hold queue is used. The assumption that
    xdst->path and dst->child are not a NULL pointer only if dst->xfrm
    is not a NULL pointer is true with the exception of using the
    hold queue. Fix this by checking for hold queue usage before
    dereferencing xdst->path or dst->child.

    4) Validate pfkey_dump parameter before sending them.
    From Mark Salyzyn.

    5) Fix the location of the transport header with ESP in UDPv6
    encapsulation. From Sabrina Dubroca.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • If SYN packet contains MP_CAPABLE option, keep it enabled.
    Syncookie validation and cookie-based socket creation is changed to
    instantiate an mptcp request socket if the ACK contains an MPTCP
    connection request.

    Rather than extend both cookie_v4/6_check, add a common helper to create
    the (mp)tcp request socket.

    Suggested-by: Paolo Abeni
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • Will be used to initialize the mptcp request socket when a MP_CAPABLE
    request was handled in syncookie mode, i.e. when a TCP ACK containing a
    MP_CAPABLE option is a valid syncookie value.

    Normally (non-cookie case), MPTCP will generate a unique 32 bit connection
    ID and store it in the MPTCP token storage to be able to retrieve the
    mptcp socket for subflow joining.

    In syncookie case, we do not want to store any state, so just generate the
    unique ID and use it in the reply.

    This means there is a small window where another connection could generate
    the same token.

    When the Cookie ACK comes back, we check that the token has not been
    registered in the meantime. If it was, the connection needs to fall back
    to TCP.
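
The check on the Cookie ACK can be pictured with a minimal sketch. It is not the kernel implementation: `token_table`, `token_register()` and `cookie_ack_accept_mptcp()` are hypothetical names, and a small fixed array stands in for the real token storage. The point it illustrates is that the token is registered only when the ACK arrives, so a failed registration is the signal to fall back to TCP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the MPTCP token storage: a small fixed table. */
#define TOKEN_SLOTS 8

struct token_table {
	uint32_t tokens[TOKEN_SLOTS];
	bool used[TOKEN_SLOTS];
};

/* Return true if the token is currently registered. */
static bool token_exists(const struct token_table *t, uint32_t token)
{
	assert(t != NULL);
	for (size_t i = 0; i < TOKEN_SLOTS; i++)
		if (t->used[i] && t->tokens[i] == token)
			return true;
	return false;
}

/* Register a token; fails if it is already present or the table is full. */
static bool token_register(struct token_table *t, uint32_t token)
{
	if (token_exists(t, token))
		return false;
	for (size_t i = 0; i < TOKEN_SLOTS; i++) {
		if (!t->used[i]) {
			t->tokens[i] = token;
			t->used[i] = true;
			return true;
		}
	}
	return false;
}

/*
 * Cookie ACK handling: nothing was stored when the SYN/ACK was sent, so the
 * token is registered only now. A false return means another connection
 * claimed the same token in the meantime and the caller must fall back to
 * plain TCP.
 */
static bool cookie_ack_accept_mptcp(struct token_table *t, uint32_t token)
{
	return token_register(t, token);
}
```

In this sketch a second ACK that presents an already registered token is rejected, which models the small collision window described above.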

    Changes in v2:
    - use req->syncookie instead of passing 'want_cookie' arg to ->init_req()
    (Eric Dumazet)

    Signed-off-by: Florian Westphal
    Reviewed-by: Mat Martineau
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • syncookie code path needs to create an mptcp request sock.

    Prepare for this and add mptcp prefix plus needed export of ops struct.

    Signed-off-by: Florian Westphal
    Reviewed-by: Mat Martineau
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • Nowadays output function has a 'synack_type' argument that tells us when
    the syn/ack is emitted via syncookies.

    The request already tells us when timestamps are supported, so check
    both to detect whether the special timestamp for TCP option encoding
    is needed.

    We could remove cookie_ts altogether, but a followup patch would
    otherwise need to adjust function signatures to pass 'want_cookie' to
    mptcp core.

    This way, the 'existing' bit can be used.
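
As a rough illustration of "check both", the decision reduces to a single predicate over the syn/ack type and the request's timestamp support. The enum and struct below are simplified, hypothetical stand-ins and do not reproduce the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's types. */
enum synack_type { SYNACK_NORMAL, SYNACK_FASTOPEN, SYNACK_COOKIE };

struct request {
	bool tstamp_ok;	/* peer negotiated TCP timestamps */
};

/*
 * The special cookie timestamp is only needed when the syn/ack is sent
 * via syncookies AND the request supports timestamps.
 */
static bool needs_cookie_timestamp(const struct request *req,
				   enum synack_type type)
{
	assert(req != NULL);
	return type == SYNACK_COOKIE && req->tstamp_ok;
}
```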

    Suggested-by: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • …etooth/bluetooth-next

    Johan Hedberg says:

    ====================
    pull request: bluetooth-next 2020-07-31

    Here's the main bluetooth-next pull request for 5.9:

    - Fix firmware filenames for Marvell chipsets
    - Several suspend-related fixes
    - Added mgmt commands for runtime configuration
    - Multiple fixes for Qualcomm-based controllers
    - Add new monitoring feature for mgmt
    - Fix handling of legacy cipher (E4) together with security level 4
    - Add support for Realtek 8822CE controller
    - Fix issues with Chinese controllers using fake VID/PID values
    - Multiple other smaller fixes & improvements
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     
  • Pablo Neira found that after recent update of xt_IDLETIMER the
    iptables-nft tests sometimes show an error.

    He tracked this down to the delayed cleanup used by nf_tables core:
    del rule (transaction A)
    add rule (transaction B)

    It's possible that by the time transaction B (both in same netns) runs,
    the xt target destructor has not been invoked yet.

    For native nft expressions this is no problem because all expressions
    that have such side effects make sure these are handled from the commit
    phase, rather than async cleanup.

    For nft_compat however this isn't true.

    Instead of forcing synchronous behaviour for nft_compat, keep track
    of the number of outstanding destructor calls.

    When we attempt to create a new expression, flush the cleanup worker
    to make sure destructors have completed.
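
A single-threaded sketch of the idea follows. All names are hypothetical, and a plain array models the asynchronous cleanup worker; the real code uses kernel work items and proper synchronisation. Destructors are queued rather than run immediately, a counter tracks how many are outstanding, and creating a new expression flushes the queue first.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16

typedef void (*destructor_fn)(void *);

/* Hypothetical model of the deferred cleanup worker. */
struct cleanup_queue {
	destructor_fn fn[MAX_PENDING];
	void *arg[MAX_PENDING];
	size_t pending;		/* number of outstanding destructor calls */
};

/* Defer a destructor call instead of running it right away. */
static int queue_destroy(struct cleanup_queue *q, destructor_fn fn, void *arg)
{
	assert(q != NULL && fn != NULL);
	if (q->pending == MAX_PENDING)
		return -1;
	q->fn[q->pending] = fn;
	q->arg[q->pending] = arg;
	q->pending++;
	return 0;
}

/* Run every outstanding destructor, in the order they were queued. */
static void flush_cleanup(struct cleanup_queue *q)
{
	for (size_t i = 0; i < q->pending; i++)
		q->fn[i](q->arg[i]);
	q->pending = 0;
}

/* Example destructor: drops one user of a shared resource. */
static void release_user(void *arg)
{
	*(int *)arg -= 1;
}

/*
 * Creating a new expression: make sure old destructors have completed
 * before taking a new reference on the shared resource.
 */
static void expr_create(struct cleanup_queue *q, int *users)
{
	if (q->pending > 0)
		flush_cleanup(q);
	*users += 1;
}
```

Flushing only when the counter is non-zero keeps the common path cheap, since most expression creations do not follow a pending destruction.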

    With lots of help from Pablo Neira.

    Reported-by: Pablo Neira Ayuso
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

31 Jul, 2020

6 commits

  • This is in fact 'disabled' in the spec, but there it's in a
    place where that actually makes sense. In our internal data
    structures, it doesn't really make sense, and in fact the
    previous commit just fixed a bug in that area.

    Make this safer by inverting the polarity from 'disabled' to
    'enabled'.

    Link: https://lore.kernel.org/r/20200730130051.5d8399545bd9.Ie62fdcd1a6cd9c969315bc124084a494ca6c8df3@changeid
    Signed-off-by: Johannes Berg

    Johannes Berg
     
  • This can be used to run mac80211 rx processing on a batch of frames in NAPI
    poll before passing them to the network stack in a large batch.
    This can improve icache footprint, or it can be used to pass frames via
    netif_receive_skb_list.

    Signed-off-by: Felix Fietkau
    Link: https://lore.kernel.org/r/20200726110611.46886-1-nbd@nbd.name
    Signed-off-by: Johannes Berg

    Felix Fietkau
     
  • Already parse the radiotap header in ieee80211_monitor_select_queue.
    In a subsequent commit this will allow us to add a radiotap flag that
    influences the queue on which injected packets will be sent.

    This also fixes the incomplete validation of the injected frame in
    ieee80211_monitor_select_queue: currently an out of bounds memory
    access may occur in the called function ieee80211_select_queue_80211
    if the 802.11 header is too small.

    Note that in ieee80211_monitor_start_xmit the radiotap header is parsed
    again, which is necessary because ieee80211_monitor_select_queue is not
    always called beforehand.
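
The kind of validation involved can be sketched as below. This is an illustrative userspace check, not the mac80211 code: `injected_frame_ok()` is a hypothetical helper, and the 24-byte minimum for the 802.11 header is an assumption for the sketch (the length of a three-address header), whereas real code derives the required length from the frame type. The radiotap header itself starts with a version byte, a pad byte, a little-endian 16-bit length and a 32-bit present bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RADIOTAP_MIN_LEN 8	/* version, pad, it_len, it_present */
#define DOT11_MIN_HDR_LEN 24	/* assumption: three-address 802.11 header */

/*
 * Check that an injected buffer holds a complete radiotap header followed
 * by enough bytes for an 802.11 header. On success, *hdr_off is set to the
 * offset of the 802.11 header.
 */
static bool injected_frame_ok(const uint8_t *buf, size_t len, size_t *hdr_off)
{
	assert(buf != NULL && hdr_off != NULL);

	if (len < RADIOTAP_MIN_LEN)
		return false;
	if (buf[0] != 0)		/* only radiotap version 0 exists */
		return false;

	/* it_len is little-endian and covers the whole radiotap header. */
	size_t rt_len = (size_t)buf[2] | ((size_t)buf[3] << 8);

	if (rt_len < RADIOTAP_MIN_LEN || rt_len > len)
		return false;
	if (len - rt_len < DOT11_MIN_HDR_LEN)
		return false;		/* 802.11 header would be truncated */

	*hdr_off = rt_len;
	return true;
}
```

Only after a check of this shape succeeds is it safe to read 802.11 header fields at the returned offset.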

    Signed-off-by: Mathy Vanhoef
    Link: https://lore.kernel.org/r/20200723100153.31631-6-Mathy.Vanhoef@kuleuven.be
    Signed-off-by: Johannes Berg

    Mathy Vanhoef
     
  • The radiotap specification contains a flag to indicate that the sequence
    number of an injected frame should not be overwritten. Parse this flag
    and define and set a corresponding Tx control flag.
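
A small sketch of the flag translation, under stated assumptions: the macro names and the `tx_frame` struct are hypothetical, and the bit values are chosen for illustration rather than taken from the kernel headers. The radiotap TX flag is mapped to an internal control flag at parse time, and the sequence number assignment later honours that control flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RT_TX_F_NOSEQNO  0x0010u	/* assumed radiotap TX-flags bit */
#define TX_CTRL_NO_SEQNO (1u << 0)	/* hypothetical internal control flag */

struct tx_frame {
	uint16_t seq;	/* sequence number carried by the frame */
	uint32_t ctrl;	/* internal Tx control flags */
};

/* Translate radiotap TX flags into internal control flags. */
static void parse_tx_flags(struct tx_frame *f, uint16_t rt_txflags)
{
	assert(f != NULL);
	if (rt_txflags & RT_TX_F_NOSEQNO)
		f->ctrl |= TX_CTRL_NO_SEQNO;
}

/* Assign the next sequence number unless the injector asked to keep its own. */
static void assign_seq(struct tx_frame *f, uint16_t next_seq)
{
	assert(f != NULL);
	if (f->ctrl & TX_CTRL_NO_SEQNO)
		return;
	f->seq = next_seq;
}
```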

    Signed-off-by: Mathy Vanhoef
    Link: https://lore.kernel.org/r/20200723100153.31631-2-Mathy.Vanhoef@kuleuven.be
    Signed-off-by: Johannes Berg

    Mathy Vanhoef
     
  • This avoids unnecessarily regenerating the skb flow hash.

    Signed-off-by: Felix Fietkau
    Link: https://lore.kernel.org/r/20200726130947.88145-1-nbd@nbd.name
    [small commit message fixup]
    Signed-off-by: Johannes Berg

    Felix Fietkau
     
  • This patch adds the necessary bits to later query the auth server
    flag for every peer from iw.

    Signed-off-by: Markus Theil
    Link: https://lore.kernel.org/r/20200611140238.427461-2-markus.theil@tu-ilmenau.de
    Signed-off-by: Johannes Berg

    Markus Theil