05 Apr, 2018

1 commit

  • Giving an integer to proc_doulongvec_minmax() is dangerous on 64bit arches,
    since linker might place next to it a non zero value preventing a change
    to ip6frag_low_thresh.

    ip6frag_low_thresh is not used anymore in the kernel, but we do not
    want to prematurely break user scripts wanting to change it.

    Since specifying a minimal value of 0 for proc_doulongvec_minmax()
    is moot, let's remove these zero values in all defrag units.

    Fixes: 6e00f7dd5e4e ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh")
    Signed-off-by: Eric Dumazet
    Reported-by: Maciej Żenczykowski
    Signed-off-by: David S. Miller

    Eric Dumazet
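
    The hazard described above can be sketched in userspace C. This is a minimal illustration, not kernel code: `struct neighbours` and `read_min_as_ulong()` are hypothetical names, and the struct only imitates a linker placing an unrelated variable right after a 4-byte zero. A handler that expects an unsigned long reads sizeof(unsigned long) bytes, so on a 64-bit arch it also picks up the neighbouring value.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout: a 4-byte lower bound followed by unrelated data,
 * imitating two variables that end up adjacent in memory. */
struct neighbours {
    int zero;       /* intended minimum, value 0 */
    int unrelated;  /* whatever happens to follow it */
};

/* Read the minimum the way a handler expecting unsigned long would:
 * sizeof(unsigned long) bytes starting at the address of the int. */
static unsigned long read_min_as_ulong(const struct neighbours *n)
{
    unsigned long min = 0;
    size_t len = sizeof(min) < sizeof(*n) ? sizeof(min) : sizeof(*n);

    assert(n != NULL);
    memcpy(&min, n, len);
    return min;
}
```

    Where unsigned long is wider than int, a non-zero `unrelated` field makes the apparent minimum non-zero, which is the situation that blocked writes to the sysctl.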
     

04 Apr, 2018

10 commits

  • To fetch UID info for socket diagnostics, we determine the
    namespace of user context using tipc socket instance. This
    may cause namespace violation, as the kernel will remap based
    on UID.

    We fix this by fetching namespace info using the calling userspace
    netlink socket.

    Fixes: c30b70deb5f4 (tipc: implement socket diagnostics for AF_TIPC)
    Reported-by: syzbot+326e587eff1074657718@syzkaller.appspotmail.com
    Acked-by: Jon Maloy
    Signed-off-by: GhantaKrishnamurthy MohanKrishna
    Signed-off-by: David S. Miller

    GhantaKrishnamurthy MohanKrishna
     
  • After commit 694aba690de0 ("ipv4: factorize sk_wmem_alloc updates
    done by __ip_append_data()") and commit 1f4c6eb24029 ("ipv6:
    factorize sk_wmem_alloc updates done by __ip6_append_data()"),
    when transmitting a sub-MTU datagram, an additional, unneeded atomic
    operation is performed in ip*_append_data() to update wmem_alloc:
    in the above condition the delta is 0.

    The above causes a small but measurable performance regression in the
    UDP xmit throughput test with packet sizes below the MTU.

    This change avoids such overhead by updating wmem_alloc only if
    wmem_alloc_delta is non-zero.

    The error path is left intentionally unmodified: it's a slow path
    and simplicity is preferred to performance.

    Fixes: 694aba690de0 ("ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()")
    Fixes: 1f4c6eb24029 ("ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()")
    Signed-off-by: Paolo Abeni
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Paolo Abeni
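
    The idea of skipping the atomic update when the delta is zero can be sketched with C11 atomics. This is only an illustration under assumptions: `struct fake_sock`, `charge_wmem()` and the `atomic_ops` counter are hypothetical stand-ins and not the kernel's data structures.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket's write-memory accounting. */
struct fake_sock {
    atomic_int wmem_alloc;
    int atomic_ops;  /* counts atomic updates, for illustration only */
};

/* Apply the accumulated delta, skipping the atomic operation
 * entirely when there is nothing to add. */
static void charge_wmem(struct fake_sock *sk, int wmem_alloc_delta)
{
    assert(sk != NULL);
    if (wmem_alloc_delta) {
        atomic_fetch_add(&sk->wmem_alloc, wmem_alloc_delta);
        sk->atomic_ops++;
    }
}
```

    With a zero delta the function returns without touching the shared counter, so the fast path pays no atomic cost.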
     
  • When an item of struct tipc_subscription is created, we fail to
    initialize the two lists aggregated into the struct. This has so far
    never been a problem, since the items are just added to a root
    object by list_add(), which does not require the addee list to be
    pre-initialized. However, syzbot is provoking situations where this
    addition fails, whereupon the attempted removal of the item from
    the list causes a crash.

    This problem seems to have always been around, even though the code
    for creating this object was rewritten in commit 242e82cc95f6 ("tipc:
    collapse subscription creation functions"), which is still in net-next.

    We fix this for that commit by initializing the two lists properly.

    Fixes: 242e82cc95f6 ("tipc: collapse subscription creation functions")
    Reported-by: syzbot+0bb443b74ce09197e970@syzkaller.appspotmail.com
    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Maloy
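
    Why initialization matters for removal can be shown with a small circular doubly linked list. This is a simplified userspace sketch, not the kernel's list implementation; `struct list_node` and the three helpers are hypothetical names. An initialized node points at itself, so unlinking it is harmless even if it was never added anywhere.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
    struct list_node *next, *prev;
};

/* An initialized node is a one-element ring pointing at itself. */
static void init_list_node(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

/* Insert n right after head; n itself need not be initialized. */
static void list_add_after(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink n. This dereferences n->next and n->prev, so it is only
 * safe if n was initialized or previously added to a list. */
static void list_del_node(struct list_node *n)
{
    assert(n->next != NULL && n->prev != NULL);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    init_list_node(n);
}
```

    If the add step is skipped on an error path, a removal on an initialized node only rewrites the node's own pointers, whereas an uninitialized node would send the removal through garbage pointers.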
     
  • A new RTF_CACHE route can be created between ip6_sk_dst_lookup_flow()
    and ip6_dst_store() calls in udpv6_sendmsg(), when datagram sending
    results in an ICMPV6_PKT_TOOBIG error:

    udp_v6_send_skb(), for example with vti6 tunnel:
    vti6_xmit(), get ICMPV6_PKT_TOOBIG error
    skb_dst_update_pmtu(), can create a RTF_CACHE clone
    icmpv6_send()
    ...
    udpv6_err()
    ip6_sk_update_pmtu()
    ip6_update_pmtu(), can create a RTF_CACHE clone
    ...
    ip6_datagram_dst_update()
    ip6_dst_store()

    And after commit 33c162a980fe ("ipv6: datagram: Update dst cache of
    a connected datagram sk during pmtu update"), the UDPv6 error handler
    can update the socket's dst cache, but it can happen before the update
    at the end of udpv6_sendmsg(), which prevents the next udpv6_sendmsg()
    calls from getting the new dst cache.

    In order to fix it, save dst in a connected socket only if the current
    socket's dst cache is invalid.

    The previous patch prepared ip6_sk_dst_lookup_flow() to do that with
    the new argument, and this patch enables it in udpv6_sendmsg().

    Fixes: 33c162a980fe ("ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update")
    Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
    Signed-off-by: Alexey Kodanev
    Signed-off-by: David S. Miller

    Alexey Kodanev
     
  • This should make it consistent with ip6_sk_dst_lookup_flow()
    that is accepting the new 'connected' parameter of type bool.

    Signed-off-by: Alexey Kodanev
    Signed-off-by: David S. Miller

    Alexey Kodanev
     
  • Add 'connected' parameter to ip6_sk_dst_lookup_flow() and update
    the cache only if ip6_sk_dst_check() returns NULL and a socket
    is connected.

    The function is used as before, the new behavior for UDP sockets
    in udpv6_sendmsg() will be enabled in the next patch.

    Signed-off-by: Alexey Kodanev
    Signed-off-by: David S. Miller

    Alexey Kodanev
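
    The control flow described above can be sketched as follows. All types and names here (`struct fake_dst`, `struct fake_sock6`, `cached_dst_check()`, `lookup_flow()`) are hypothetical stand-ins; the `fresh` argument takes the place of the result of a real route lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_dst { int id; };
struct fake_sock6 { struct fake_dst *dst_cache; };

/* Stand-in for the cache validity check: returns the cached entry
 * if there is one, or NULL if the cache is empty or invalid. */
static struct fake_dst *cached_dst_check(struct fake_sock6 *sk)
{
    return sk->dst_cache;
}

/* Return the cached dst when it is still valid. Otherwise use the
 * fresh lookup result and store it only for connected sockets. */
static struct fake_dst *lookup_flow(struct fake_sock6 *sk,
                                    struct fake_dst *fresh,
                                    bool connected)
{
    struct fake_dst *dst;

    assert(sk != NULL);
    dst = cached_dst_check(sk);
    if (dst)
        return dst;
    if (connected)
        sk->dst_cache = fresh;
    return fresh;
}
```

    A valid cache entry is never overwritten by the send path in this sketch, which is the property the series relies on when an error handler has already refreshed the cache.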
     
  • Move commonly used pattern of ip6_dst_store() usage to a separate
    function - ip6_sk_dst_store_flow(), which will check the addresses
    for equality using the flow information, before saving them.

    There are no functional changes in this patch. In addition, it will
    be used in the next patch, in ip6_sk_dst_lookup_flow().

    Signed-off-by: Alexey Kodanev
    Signed-off-by: David S. Miller

    Alexey Kodanev
     
  • After commit 581319c58600 ("net/socket: use per af lockdep classes for sk queues")
    sock queue locks now have per-af lockdep classes, including unix socket.
    It is no longer necessary to work around it.

    I noticed this while looking at a syzbot deadlock report; this patch
    itself doesn't fix it (this is why I don't add Reported-by).

    Fixes: 581319c58600 ("net/socket: use per af lockdep classes for sk queues")
    Cc: Paolo Abeni
    Signed-off-by: Cong Wang
    Acked-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Cong Wang
     
  • By analogy with other Rx implementations, RxRPC packet types 9, 10 and 11
    should just be discarded rather than being aborted like other undefined
    packet types.

    Reported-by: Jeffrey Altman
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
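
    The policy in this commit amounts to a three-way classification of incoming packet types. The sketch below is illustrative only: `enum verdict`, `is_defined_type()` and `classify_packet()` are hypothetical names, and the set of defined types used here (1 to 8 and 13) is an assumption made for the example, not something stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

enum verdict { VERDICT_PROCESS, VERDICT_DISCARD, VERDICT_ABORT };

/* Assumption for this sketch: types 1..8 and 13 are the defined ones. */
static bool is_defined_type(unsigned int type)
{
    return (type >= 1 && type <= 8) || type == 13;
}

static enum verdict classify_packet(unsigned int type)
{
    assert(type <= 0xff);  /* the type is treated as a one-byte field */

    switch (type) {
    case 9:
    case 10:
    case 11:
        /* Silently dropped, by analogy with other Rx implementations. */
        return VERDICT_DISCARD;
    default:
        return is_defined_type(type) ? VERDICT_PROCESS : VERDICT_ABORT;
    }
}
```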
     
  • Pull networking updates from David Miller:

    1) Support offloading wireless authentication to userspace via
    NL80211_CMD_EXTERNAL_AUTH, from Srinivas Dasari.

    2) A lot of work on network namespace setup/teardown from Kirill Tkhai.
    Setup and cleanup of namespaces now all run asynchronously and thus
    performance is significantly increased.

    3) Add rx/tx timestamping support to mv88e6xxx driver, from Brandon
    Streiff.

    4) Support zerocopy on RDS sockets, from Sowmini Varadhan.

    5) Use denser instruction encoding in x86 eBPF JIT, from Daniel
    Borkmann.

    6) Support hw offload of vlan filtering in mvpp2 driver, from Maxime
    Chevallier.

    7) Support grafting of child qdiscs in mlxsw driver, from Nogah
    Frankel.

    8) Add packet forwarding tests to selftests, from Ido Schimmel.

    9) Deal with sub-optimal GSO packets better in BBR congestion control,
    from Eric Dumazet.

    10) Support 5-tuple hashing in ipv6 multipath routing, from David Ahern.

    11) Add path MTU tests to selftests, from Stefano Brivio.

    12) Various bits of IPSEC offloading support for mlx5, from Aviad
    Yehezkel, Yossi Kuperman, and Saeed Mahameed.

    13) Support RSS spreading on ntuple filters in SFC driver, from Edward
    Cree.

    14) Lots of sockmap work from John Fastabend. Applications can use eBPF
    to filter sendmsg and sendpage operations.

    15) In-kernel receive TLS support, from Dave Watson.

    16) Add XDP support to ixgbevf, this is significant because it should
    allow optimized XDP usage in various cloud environments. From Tony
    Nguyen.

    17) Add new Intel E800 series "ice" ethernet driver, from Anirudh
    Venkataramanan et al.

    18) IP fragmentation match offload support in nfp driver, from Pieter
    Jansen van Vuuren.

    19) Support XDP redirect in i40e driver, from Björn Töpel.

    20) Add BPF_RAW_TRACEPOINT program type for accessing the arguments of
    tracepoints in their raw form, from Alexei Starovoitov.

    21) Lots of striding RQ improvements to mlx5 driver with many
    performance improvements, from Tariq Toukan.

    22) Use rhashtable for inet frag reassembly, from Eric Dumazet.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1678 commits)
    net: mvneta: improve suspend/resume
    net: mvneta: split rxq/txq init and txq deinit into SW and HW parts
    ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh
    net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
    net: bgmac: Correctly annotate register space
    route: check sysctl_fib_multipath_use_neigh earlier than hash
    fix typo in command value in drivers/net/phy/mdio-bitbang.
    sky2: Increase D3 delay to sky2 stops working after suspend
    net/mlx5e: Set EQE based as default TX interrupt moderation mode
    ibmvnic: Disable irqs before exiting reset from closed state
    net: sched: do not emit messages while holding spinlock
    vlan: also check phy_driver ts_info for vlan's real device
    Bluetooth: Mark expected switch fall-throughs
    Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME
    Bluetooth: btrsi: remove unused including
    Bluetooth: hci_bcm: Remove DMI quirk for the MINIX Z83-4
    sh_eth: kill useless check in __sh_eth_get_regs()
    sh_eth: add sh_eth_cpu_data::no_xdfar flag
    ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()
    ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()
    ...

    Linus Torvalds
     

03 Apr, 2018

23 commits

  • Pull removal of in-kernel calls to syscalls from Dominik Brodowski:
    "System calls are interaction points between userspace and the kernel.
    Therefore, system call functions such as sys_xyzzy() or
    compat_sys_xyzzy() should only be called from userspace via the
    syscall table, but not from elsewhere in the kernel.

    At least on 64-bit x86, it will likely be a hard requirement from
    v4.17 onwards to not call system call functions in the kernel: It is
    better to use a different calling convention for system calls
    there, where struct pt_regs is decoded on-the-fly in a syscall wrapper
    which then hands processing over to the actual syscall function. This
    means that only those parameters which are actually needed for a
    specific syscall are passed on during syscall entry, instead of
    filling in six CPU registers with random user space content all the
    time (which may cause serious trouble down the call chain). Those
    x86-specific patches will be pushed through the x86 tree in the near
    future.

    Moreover, rules on how data may be accessed may differ between kernel
    data and user data. This is another reason why calling sys_xyzzy() is
    generally a bad idea, and -- at most -- acceptable in arch-specific
    code.

    This patchset removes all in-kernel calls to syscall functions in the
    kernel with the exception of arch/. On top of this, it cleans up the
    three places where many syscalls are referenced or prototyped, namely
    kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h"

    * 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits)
    bpf: whitelist all syscalls for error injection
    kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions
    kernel/sys_ni: sort cond_syscall() entries
    syscalls/x86: auto-create compat_sys_*() prototypes
    syscalls: sort syscall prototypes in include/linux/compat.h
    net: remove compat_sys_*() prototypes from net/compat.h
    syscalls: sort syscall prototypes in include/linux/syscalls.h
    kexec: move sys_kexec_load() prototype to syscalls.h
    x86/sigreturn: use SYSCALL_DEFINE0
    x86: fix sys_sigreturn() return type to be long, not unsigned long
    x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
    mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
    mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
    mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
    fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
    fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
    fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
    fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
    kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid()
    kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()
    ...

    Linus Torvalds
     
  • Using the net-internal helpers __compat_sys_...msg() allows us to avoid
    the internal calls to the compat_sys_...msg() syscalls.
    compat_sys_recvmmsg() is handled in a different patch.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __compat_sys_recvmmsg() allows us to avoid
    the internal calls to the compat_sys_recvmmsg() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __compat_sys_getsockopt() allows us to avoid
    the internal calls to the compat_sys_getsockopt() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __compat_sys_setsockopt() allows us to avoid
    the internal calls to the compat_sys_setsockopt() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __compat_sys_recvfrom() allows us to avoid
    the internal calls to the compat_sys_recvfrom() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • sys_recv() merely expands the parameters to __sys_recvfrom() by NULL and
    NULL. Open-code this in the two places which used sys_recv() as a wrapper
    to __sys_recvfrom().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • sys_send() merely expands the parameters to __sys_sendto() by NULL and 0.
    Open-code this in the two places which used sys_send() as a wrapper to
    __sys_sendto().

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
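
    The relationship between the wrapper and the helper can be sketched in userspace C. The functions below are hypothetical stand-ins (`fake_sys_sendto()` and `fake_sys_send()`), with simplified signatures; the stand-in helper only records its arguments instead of sending anything.

```c
#include <assert.h>
#include <stddef.h>

struct sendto_args {
    int fd;
    const void *buf;
    size_t len;
    unsigned int flags;
    const void *addr;
    int addr_len;
};

static struct sendto_args last_call;

/* Stand-in for the full helper: records what it was called with. */
static long fake_sys_sendto(int fd, const void *buf, size_t len,
                            unsigned int flags, const void *addr,
                            int addr_len)
{
    assert(buf != NULL || len == 0);
    last_call = (struct sendto_args){ fd, buf, len, flags, addr, addr_len };
    return (long)len;
}

/* What the thin wrapper amounted to: the same call with no
 * destination address (NULL) and a zero address length. */
static long fake_sys_send(int fd, const void *buf, size_t len,
                          unsigned int flags)
{
    return fake_sys_sendto(fd, buf, len, flags, NULL, 0);
}
```

    Open-coding the wrapper means callers pass the two trailing arguments themselves, so no in-kernel caller needs to go through the syscall function.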
     
  • The non-compat codepaths for sys_...msg() verify that MSG_CMSG_COMPAT
    is not set. By moving this check to the __sys_...msg() functions
    (and making it dependent on a static flag passed to this function), we
    can call the __sys...msg() functions instead of the syscall functions
    in all cases. __sys_recvmmsg() does not need this trickery, as the
    check is handled within the do_sys_recvmmsg() function internal to
    net/socket.c.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
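
    The flag-dependent check described above can be sketched like this. The function `fake_sys_sendmsg()`, its reduced signature and the `FAKE_MSG_CMSG_COMPAT` value are assumptions made for illustration; the stand-in returns 0 where real code would go on to do the actual work.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag value used only by this sketch. */
#define FAKE_MSG_CMSG_COMPAT 0x80000000u

/* Shared helper: the native entry point passes true and rejects the
 * compat-only flag, the compat entry point passes false. */
static long fake_sys_sendmsg(int fd, unsigned int flags,
                             bool forbid_cmsg_compat)
{
    assert(fd >= 0);
    if (forbid_cmsg_compat && (flags & FAKE_MSG_CMSG_COMPAT))
        return -EINVAL;
    return 0;  /* placeholder for the real send path */
}
```

    Because each call site passes the flag as a constant, the native and compat paths can share one helper while keeping their different validation rules.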
     
  • Using the net-internal helper do_sys_recvmmsg() allows us to avoid the
    internal calls to the sys_recvmmsg() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __sys_getsockopt() allows us to avoid the
    internal calls to the sys_getsockopt() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __sys_setsockopt() allows us to avoid the
    internal calls to the sys_setsockopt() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __sys_shutdown() allows us to avoid the
    internal calls to the sys_shutdown() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __sys_socketpair() allows us to avoid the
    internal calls to the sys_socketpair() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __sys_getpeername() allows us to avoid the
    internal calls to the sys_getpeername() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __sys_getsockname() allows us to avoid the
    internal calls to the sys_getsockname() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __sys_listen() allows us to avoid the
    internal calls to the sys_listen() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __sys_connect() allows us to avoid the
    internal calls to the sys_connect() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __sys_bind() allows us to avoid the
    internal calls to the sys_bind() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __sys_socket() allows us to avoid the
    internal calls to the sys_socket() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __sys_accept4() allows us to avoid the
    internal calls to the sys_accept4() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __sys_sendto() allows us to avoid the
    internal calls to the sys_sendto() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     
  • Using the net-internal helper __sys_recvfrom() allows us to avoid the
    internal calls to the sys_recvfrom() syscall.

    This patch is part of a series which removes in-kernel calls to syscalls.
    On this basis, the syscall entry path can be streamlined. For details, see
    http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Dominik Brodowski

    Dominik Brodowski
     

02 Apr, 2018

6 commits