05 Dec, 2020

1 commit

  • Currently, the sock_from_file prototype takes an "err" pointer that is
    either not set or set to -ENOTSOCK IFF the returned socket is NULL. This
    makes the error redundant and it is ignored by a few callers.

    This patch simplifies the API by letting callers deduce the error based
    on whether the returned socket is NULL or not.
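
    A minimal user-space sketch of the resulting calling convention, not the
    kernel code itself. The struct definitions, the is_socket field, and the
    function names sock_from_file_sketch() and socket_type_of() are
    hypothetical stand-ins chosen for illustration:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel's struct socket and
 * struct file; only the fields needed for the illustration exist. */
struct socket {
    int type;
};

struct file {
    int is_socket;          /* assumed flag standing in for the real file-type check */
    void *private_data;
};

/* New-style lookup: a NULL return is the only failure signal. */
static struct socket *sock_from_file_sketch(struct file *file)
{
    if (file->is_socket)
        return file->private_data;
    return NULL;
}

/* A caller that still needs an errno derives it from the NULL return. */
static int socket_type_of(struct file *file)
{
    struct socket *sock = sock_from_file_sketch(file);

    if (!sock)
        return -ENOTSOCK;
    return sock->type;
}
```

    Callers that ignored the old "err" value simply test the pointer, and
    callers that need an error code produce -ENOTSOCK themselves.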

    Suggested-by: Al Viro
    Signed-off-by: Florent Revest
    Signed-off-by: Daniel Borkmann
    Reviewed-by: KP Singh
    Link: https://lore.kernel.org/bpf/20201204113609.1850150-1-revest@google.com

    Florent Revest
     

04 Dec, 2020

1 commit

  • Alexei Starovoitov says:

    ====================
    pull-request: bpf-next 2020-12-03

    The main changes are:

    1) Support BTF in kernel modules, from Andrii.

    2) Introduce preferred busy-polling, from Björn.

    3) bpf_ima_inode_hash() and bpf_bprm_opts_set() helpers, from KP Singh.

    4) Memcg-based memory accounting for bpf objects, from Roman.

    5) Allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooks, from Stanislav.

    * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (118 commits)
    selftests/bpf: Fix invalid use of strncat in test_sockmap
    libbpf: Use memcpy instead of strncpy to please GCC
    selftests/bpf: Add fentry/fexit/fmod_ret selftest for kernel module
    selftests/bpf: Add tp_btf CO-RE reloc test for modules
    libbpf: Support attachment of BPF tracing programs to kernel modules
    libbpf: Factor out low-level BPF program loading helper
    bpf: Allow to specify kernel module BTFs when attaching BPF programs
    bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier
    selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTF
    selftests/bpf: Add support for marking sub-tests as skipped
    selftests/bpf: Add bpf_testmod kernel module for testing
    libbpf: Add kernel module BTF support for CO-RE relocations
    libbpf: Refactor CO-RE relocs to not assume a single BTF object
    libbpf: Add internal helper to load BTF data by FD
    bpf: Keep module's btf_data_size intact after load
    bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address()
    selftests/bpf: Add Userspace tests for TCP_WINDOW_CLAMP
    bpf: Adds support for setting window clamp
    samples/bpf: Fix spelling mistake "recieving" -> "receiving"
    bpf: Fix cold build of test_progs-no_alu32
    ...
    ====================

    Link: https://lore.kernel.org/r/20201204021936.85653-1-alexei.starovoitov@gmail.com
    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     

01 Dec, 2020

3 commits

  • This allows invoking an additional callback under the
    socket spin lock.

    Will be used by the next patches to avoid additional
    spin lock contention.

    Acked-by: Florian Westphal
    Signed-off-by: Paolo Abeni
    Reviewed-by: Mat Martineau
    Signed-off-by: Jakub Kicinski

    Paolo Abeni
     
  • This option lets a user set a per-socket NAPI budget for
    busy-polling. If the option is not set, the default of 8 is used.
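
    A hedged user-space sketch of setting such a budget. It assumes the
    option is exposed as SO_BUSY_POLL_BUDGET, which the text above does not
    name, and the fallback value 70 is the asm-generic number, which may
    differ on some architectures. The helper name set_busy_poll_budget() is
    made up for this example:

```c
#include <sys/socket.h>

/* Fallback for older libc headers. 70 is the asm-generic value; this is an
 * assumption and some architectures use a different number. */
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

/* Request a per-socket NAPI busy-poll budget. Returns 0 on success, or -1
 * with errno set (for example on kernels that lack the option). */
static int set_busy_poll_budget(int fd, int budget)
{
    return setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET,
                      &budget, sizeof(budget));
}
```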

    Signed-off-by: Björn Töpel
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Jakub Kicinski
    Link: https://lore.kernel.org/bpf/20201130185205.196029-3-bjorn.topel@gmail.com

    Björn Töpel
     
  • The existing busy-polling mode, enabled by the SO_BUSY_POLL socket
    option or system-wide using the /proc/sys/net/core/busy_read knob, is
    opportunistic. That means that if the NAPI context is not
    scheduled, it will poll it. If, after busy-polling, the budget is
    exceeded, the busy-polling logic will schedule the NAPI onto the
    regular softirq handling.

    One implication of the behavior above is that a busy/heavily loaded NAPI
    context will never enter/allow for busy-polling. Some applications
    prefer that most NAPI processing be done by busy-polling.

    This series adds a new socket option, SO_PREFER_BUSY_POLL, that works
    in concert with the napi_defer_hard_irqs and gro_flush_timeout
    knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were
    introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral
    feature"), and allow a user to defer enabling interrupts and
    instead schedule the NAPI context from a watchdog timer. When a user
    enables SO_PREFER_BUSY_POLL, again with the other knobs enabled,
    and the NAPI context is being processed by a softirq, the softirq NAPI
    processing will exit early to allow the busy-polling to be performed.

    If the application stops performing busy-polling via a system call,
    the watchdog timer defined by gro_flush_timeout will time out, and
    regular softirq handling will resume.

    In summary: heavy-traffic applications that prefer busy-polling over
    softirq processing should use this option.

    Example usage:

    $ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
    $ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout

    Note that the timeout should be larger than the userspace processing
    window; otherwise the watchdog will time out and fall back to regular
    softirq processing.

    Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket.
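
    That last step could look roughly like the sketch below. The helper name
    enable_preferred_busy_poll() is hypothetical, and the fallback option
    numbers (46 and 69) are the asm-generic values, assumed here only for
    libc headers that predate the options:

```c
#include <sys/socket.h>

/* Fallbacks for older libc headers. These are the asm-generic numbers and
 * are an assumption; some architectures define different values. */
#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46
#endif
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69
#endif

/* Enable busy-polling on fd with the given timeout in microseconds, then
 * mark busy-polling as preferred. Returns 0 on success, or -1 with errno
 * set by the failing setsockopt() call. */
static int enable_preferred_busy_poll(int fd, int busy_poll_usecs)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
                   &busy_poll_usecs, sizeof(busy_poll_usecs)) < 0)
        return -1;

    return setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &on, sizeof(on));
}
```

    Either call may fail on kernels without the option or for callers
    lacking the required privileges, so the return value should be checked.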

    Signed-off-by: Björn Töpel
    Signed-off-by: Daniel Borkmann
    Reviewed-by: Jakub Kicinski
    Link: https://lore.kernel.org/bpf/20201130185205.196029-2-bjorn.topel@gmail.com

    Björn Töpel
     

21 Nov, 2020

1 commit

  • The static checker is fooled by the non-static locking scheme
    implemented by the mentioned helpers.
    Let's make its life easier by adding some unconditional annotations
    so that sparse now interprets the helpers as a plain spinlock.

    v1 -> v2:
    - add __releases() annotation to unlock_sock_fast()

    Signed-off-by: Paolo Abeni
    Link: https://lore.kernel.org/r/6ed7ae627d8271fb7f20e0a9c6750fbba1ac2635.1605634911.git.pabeni@redhat.com
    Signed-off-by: Jakub Kicinski

    Paolo Abeni
     

23 Oct, 2020

1 commit

  • In setsockopt(SO_MAX_PACING_RATE) on 64-bit systems, sk_max_pacing_rate,
    after being extended from 'u32' to 'unsigned long', takes an
    unintentionally inflated value whenever it is assigned from an 'int'
    value with MSB=1, due to sign extension when promoting s32 to u64,
    e.g. 0x80000000 becomes 0xFFFFFFFF80000000.

    The inflated sk_max_pacing_rate causes a subsequent getsockopt to return
    ~0U unexpectedly. It may also result in an increased pacing rate.

    Fix by explicitly casting the 'int' value to 'unsigned int' before
    assigning it to sk_max_pacing_rate, for zero extension to happen.
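
    The difference between the two conversions can be reproduced in plain C.
    This is an illustrative sketch using fixed-width types, not the kernel
    change itself, and the function names are hypothetical:

```c
#include <stdint.h>

/* Assigning a negative 32-bit int directly to a 64-bit unsigned value
 * copies the sign bit into the upper 32 bits (sign extension). */
static uint64_t widen_sign_extended(int32_t val)
{
    return (uint64_t)val;
}

/* Casting to an unsigned 32-bit type first fills the upper 32 bits with
 * zeros (zero extension), which is the behavior the fix relies on. */
static uint64_t widen_zero_extended(int32_t val)
{
    return (uint64_t)(uint32_t)val;
}
```

    For the input 0x80000000 (INT32_MIN), the first function yields
    0xFFFFFFFF80000000 and the second yields 0x80000000; for non-negative
    inputs the two agree.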

    Fixes: 76a9ebe811fb ("net: extend sk_pacing_rate to unsigned long")
    Signed-off-by: Ji Li
    Signed-off-by: Ke Li
    Reviewed-by: Eric Dumazet
    Link: https://lore.kernel.org/r/20201022064146.79873-1-keli@akamai.com
    Signed-off-by: Jakub Kicinski

    Ke Li
     

16 Oct, 2020

1 commit


14 Oct, 2020

2 commits

  • SOCK_TSTAMP_NEW (timespec64 instead of timespec) is also used for
    hardware time stamps (configured via SO_TIMESTAMPING_NEW).

    User space (ptp4l) first configures hardware time stamping via
    SO_TIMESTAMPING_NEW which sets SOCK_TSTAMP_NEW. In the next step, ptp4l
    disables SO_TIMESTAMPNS(_NEW) (software time stamps), but this must not
    switch hardware time stamps back to "32 bit mode".

    This problem happens on 32 bit platforms where the libc has already
    switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW
    socket options). ptp4l complains with "missing timestamp on transmitted
    peer delay request" because the wrong format is received (and
    discarded).

    Fixes: 887feae36aee ("socket: Add SO_TIMESTAMP[NS]_NEW")
    Fixes: 783da70e8396 ("net: add sock_enable_timestamps")
    Signed-off-by: Christian Eggers
    Acked-by: Willem de Bruijn
    Acked-by: Deepa Dinamani
    Signed-off-by: Jakub Kicinski

    Christian Eggers
     
  • The comparison of optname with SO_TIMESTAMPING_NEW is the wrong way
    around, so SOCK_TSTAMP_NEW will first be set and then reset again.
    Additionally, move it out of the test for SOF_TIMESTAMPING_RX_SOFTWARE
    as this seems unrelated.

    This problem happens on 32 bit platforms where the libc has already
    switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW
    socket options). ptp4l complains with "missing timestamp on transmitted
    peer delay request" because the wrong format is received (and
    discarded).

    Fixes: 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW")
    Signed-off-by: Christian Eggers
    Reviewed-by: Willem de Bruijn
    Reviewed-by: Deepa Dinamani
    Acked-by: Willem de Bruijn
    Acked-by: Deepa Dinamani
    Signed-off-by: Jakub Kicinski

    Christian Eggers
     

25 Sep, 2020

1 commit

  • This patch adds a new helper, sk_stop_timer_sync, which deactivates a
    timer like sk_stop_timer but waits for the handler to finish.

    Acked-by: Paolo Abeni
    Signed-off-by: Geliang Tang
    Reviewed-by: Mat Martineau
    Signed-off-by: David S. Miller

    Geliang Tang
     

05 Sep, 2020

1 commit

  • We got slightly different patches removing a double word
    in a comment in net/ipv4/raw.c - picked the version from net.

    Simple conflict in drivers/net/ethernet/ibm/ibmvnic.c. Use cached
    values instead of VNIC login response buffer (following what
    commit 507ebe6444a4 ("ibmvnic: Fix use-after-free of VNIC login
    response buffer") did).

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     

04 Sep, 2020

1 commit

  • Pull networking fixes from David Miller:

    1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi
    Kivilinna.

    2) Fix loss of RTT samples in rxrpc, from David Howells.

    3) Memory leak in hns_nic_dev_probe(), from Dinghao Liu.

    4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka.

    5) We disable BH for too long in sctp_get_port_local(), add a
    cond_resched() here as well, from Xin Long.

    6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu.

    7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from
    Yonghong Song.

    8) Missing of_node_put() in mt7530 DSA driver, from Sumera
    Priyadarsini.

    9) Fix crash in bnxt_fw_reset_task(), from Michael Chan.

    10) Fix geneve tunnel checksumming bug in hns3, from Yi Li.

    11) Memory leak in rxkad_verify_response, from Dinghao Liu.

    12) In tipc, don't use smp_processor_id() in preemptible context. From
    Tuong Lien.

    13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu.

    14) Missing clk_disable_unprepare() in gemini driver, from Dan Carpenter.

    15) Fix ABI mismatch between driver and firmware in nfp, from Louis
    Peens.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits)
    net/smc: fix sock refcounting in case of termination
    net/smc: reset sndbuf_desc if freed
    net/smc: set rx_off for SMCR explicitly
    net/smc: fix toleration of fake add_link messages
    tg3: Fix soft lockup when tg3_reset_task() fails.
    doc: net: dsa: Fix typo in config code sample
    net: dp83867: Fix WoL SecureOn password
    nfp: flower: fix ABI mismatch between driver and firmware
    tipc: fix shutdown() of connectionless socket
    ipv6: Fix sysctl max for fib_multipath_hash_policy
    drivers/net/wan/hdlc: Change the default of hard_header_len to 0
    net: gemini: Fix another missing clk_disable_unprepare() in probe
    net: bcmgenet: fix mask check in bcmgenet_validate_flow()
    amd-xgbe: Add support for new port mode
    net: usb: dm9601: Add USB ID of Keenetic Plus DSL
    vhost: fix typo in error message
    net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
    pktgen: fix error message with wrong function name
    net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode
    cxgb4: fix thermal zone device registration
    ...

    Linus Torvalds
     

27 Aug, 2020

1 commit


24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings where applicable.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
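
    A small user-space sketch of the pattern follows. The macro definition
    approximates the kernel's pseudo-keyword for compilers that support the
    attribute, and the function stages_from() is a hypothetical example, not
    code from the patch:

```c
/* User-space approximation of the fallthrough pseudo-keyword: use the
 * compiler attribute when available, otherwise an empty statement. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)  /* fallthrough */
#endif

/* Hypothetical example: count how many stages run when starting at the
 * given level and deliberately falling through to the lower ones. */
static int stages_from(int level)
{
    int stages = 0;

    switch (level) {
    case 2:
        stages++;
        fallthrough;
    case 1:
        stages++;
        fallthrough;
    case 0:
        stages++;
        break;
    default:
        break;
    }
    return stages;
}
```

    Unlike a comment, the macro is visible to the compiler, so an
    unintended fall-through elsewhere can still be flagged by
    -Wimplicit-fallthrough.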

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

20 Aug, 2020

1 commit


14 Aug, 2020

1 commit

  • Pull networking fixes from David Miller:
    "Some merge window fallout, some longer term fixes:

    1) Handle headroom properly in lapbether and x25_asy drivers, from
    Xie He.

    2) Fetch MAC address from correct r8152 device node, from Thierry
    Reding.

    3) In the sw kTLS path we should allow MSG_CMSG_COMPAT in sendmsg,
    from Rouven Czerwinski.

    4) Correct fdputs in socket layer, from Miaohe Lin.

    5) Revert troublesome sockptr_t optimization, from Christoph Hellwig.

    6) Fix TCP TFO key reading on big endian, from Jason Baron.

    7) Missing CAP_NET_RAW check in nfc, from Qingyu Li.

    8) Fix inet fastreuse optimization with tproxy sockets, from Tim
    Froidcoeur.

    9) Fix 64-bit divide in new SFC driver, from Edward Cree.

    10) Add a tracepoint for prandom_u32 so that we can more easily
    perform usage analysis. From Eric Dumazet.

    11) Fix rwlock imbalance in AF_PACKET, from John Ogness"

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
    net: openvswitch: introduce common code for flushing flows
    af_packet: TPACKET_V3: fix fill status rwlock imbalance
    random32: add a tracepoint for prandom_u32()
    Revert "ipv4: tunnel: fix compilation on ARCH=um"
    net: accept an empty mask in /sys/class/net/*/queues/rx-*/rps_cpus
    net: ethernet: stmmac: Disable hardware multicast filter
    net: stmmac: dwmac1000: provide multicast filter fallback
    ipv4: tunnel: fix compilation on ARCH=um
    vsock: fix potential null pointer dereference in vsock_poll()
    sfc: fix ef100 design-param checking
    net: initialize fastreuse on inet_inherit_port
    net: refactor bind_bucket fastreuse into helper
    net: phy: marvell10g: fix null pointer dereference
    net: Fix potential memory leak in proto_register()
    net: qcom/emac: add missed clk_disable_unprepare in error path of emac_clks_phase1_init
    ionic_lif: Use devm_kcalloc() in ionic_qcq_alloc()
    net/nfc/rawsock.c: add CAP_NET_RAW check.
    hinic: fix strncpy output truncated compile warnings
    drivers/net/wan/x25_asy: Added needed_headroom and a skb->len check
    net/tls: Fix kmap usage
    ...

    Linus Torvalds
     

12 Aug, 2020

1 commit

  • If we failed to assign proto idx, we free the twsk_slab_name but forget to
    free the twsk_slab. Add a helper function tw_prot_cleanup() to free these
    together and also use this helper function in proto_unregister().

    Fixes: b45ce32135d1 ("sock: fix potential memory leak in proto_register()")
    Signed-off-by: Miaohe Lin
    Signed-off-by: David S. Miller

    Miaohe Lin
     

08 Aug, 2020

2 commits

  • Merge misc updates from Andrew Morton:

    - a few MM hotfixes

    - kthread, tools, scripts, ntfs and ocfs2

    - some of MM

    Subsystems affected by this patch series: kthread, tools, scripts, ntfs,
    ocfs2 and mm (hotfixes, pagealloc, slab-generic, slab, slub, kcsan,
    debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, mincore,
    sparsemem, vmalloc, kasan, pagealloc, hugetlb and vmscan).

    * emailed patches from Andrew Morton : (162 commits)
    mm: vmscan: consistent update to pgrefill
    mm/vmscan.c: fix typo
    khugepaged: khugepaged_test_exit() check mmget_still_valid()
    khugepaged: retract_page_tables() remember to test exit
    khugepaged: collapse_pte_mapped_thp() protect the pmd lock
    khugepaged: collapse_pte_mapped_thp() flush the right range
    mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
    mm: thp: replace HTTP links with HTTPS ones
    mm/page_alloc: fix memalloc_nocma_{save/restore} APIs
    mm/page_alloc.c: skip setting nodemask when we are in interrupt
    mm/page_alloc: fallbacks at most has 3 elements
    mm/page_alloc: silence a KASAN false positive
    mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask()
    mm/page_alloc.c: simplify pageblock bitmap access
    mm/page_alloc.c: extract the common part in pfn_to_bitidx()
    mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits
    mm/shuffle: remove dynamic reconfiguration
    mm/memory_hotplug: document why shuffle_zone() is relevant
    mm/page_alloc: remove nr_free_pagecache_pages()
    mm: remove vm_total_pages
    ...

    Linus Torvalds
     
  • As said by Linus:

    A symmetric naming is only helpful if it implies symmetries in use.
    Otherwise it's actively misleading.

    In "kzalloc()", the z is meaningful and an important part of what the
    caller wants.

    In "kzfree()", the z is actively detrimental, because maybe in the
    future we really _might_ want to use that "memfill(0xdeadbeef)" or
    something. The "zero" part of the interface isn't even _relevant_.

    The main reason that kzfree() exists is to clear sensitive information
    that should not be leaked to other future users of the same memory
    objects.

    Rename kzfree() to kfree_sensitive() to follow the example of the recently
    added kvfree_sensitive() and make the intention of the API more explicit.
    In addition, memzero_explicit() is used to clear the memory to make sure
    that it won't get optimized away by the compiler.
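
    The idea can be sketched in user space as follows. The names
    wipe_explicit() and free_sensitive() are hypothetical, and the volatile
    store loop is only a rough analogue of memzero_explicit(). The helper
    also takes an explicit length, which is an assumption of this sketch
    because malloc() does not expose the allocation size:

```c
#include <stddef.h>
#include <stdlib.h>

/* Zero len bytes through a volatile pointer so the compiler cannot drop
 * the stores as dead writes. */
static void wipe_explicit(void *buf, size_t len)
{
    volatile unsigned char *p = buf;

    while (len--)
        *p++ = 0;
}

/* Hypothetical user-space counterpart of kfree_sensitive(): clear the
 * buffer, then release it. A NULL buffer is accepted and ignored. */
static void free_sensitive(void *buf, size_t len)
{
    if (!buf)
        return;
    wipe_explicit(buf, len);
    free(buf);
}
```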

    The renaming is done by using the command sequence:

    git grep -w --name-only kzfree |\
    xargs sed -i 's/kzfree/kfree_sensitive/'

    followed by some editing of the kfree_sensitive() kerneldoc and adding
    a kzfree backward compatibility macro in slab.h.

    [akpm@linux-foundation.org: fs/crypto/inline_crypt.c needs linux/slab.h]
    [akpm@linux-foundation.org: fix fs/crypto/inline_crypt.c some more]

    Suggested-by: Joe Perches
    Signed-off-by: Waiman Long
    Signed-off-by: Andrew Morton
    Acked-by: David Howells
    Acked-by: Michal Hocko
    Acked-by: Johannes Weiner
    Cc: Jarkko Sakkinen
    Cc: James Morris
    Cc: "Serge E. Hallyn"
    Cc: Joe Perches
    Cc: Matthew Wilcox
    Cc: David Rientjes
    Cc: Dan Carpenter
    Cc: "Jason A . Donenfeld"
    Link: http://lkml.kernel.org/r/20200616154311.12314-3-longman@redhat.com
    Signed-off-by: Linus Torvalds

    Waiman Long
     

07 Aug, 2020

1 commit

  • Pull dlm updates from David Teigland:
    "This set includes some improvements to the dlm networking layer:
    improving the ability to trace dlm messages for debugging, and
    improved handling of bad messages or disrupted connections"

    * tag 'dlm-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
    fs: dlm: implement tcp graceful shutdown
    fs: dlm: change handling of reconnects
    fs: dlm: don't close socket on invalid message
    fs: dlm: set skb mark per peer socket
    fs: dlm: set skb mark for listen socket
    net: sock: add sock_set_mark
    dlm: Fix kobject memleak

    Linus Torvalds
     

06 Aug, 2020

2 commits

  • This patch adds a new socket helper function to set the mark value for a
    kernel socket.

    Signed-off-by: Alexander Aring
    Signed-off-by: David Teigland

    Alexander Aring
     
  • Pull networking updates from David Miller:

    1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.

    2) Support UDP segmentation in core TSO code, from Eric Dumazet.

    3) Allow flashing different flash images in cxgb4 driver, from Vishal
    Kulkarni.

    4) Add drop frames counter and flow status to tc flower offloading,
    from Po Liu.

    5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

    6) Various new indirect call avoidance, from Eric Dumazet and Brian
    Vazquez.

    7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
    Yonghong Song.

    8) Support querying and setting hardware address of a port function via
    devlink, use this in mlx5, from Parav Pandit.

    9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

    10) Switch qca8k driver over to phylink, from Jonathan McDowell.

    11) In bpftool, show list of processes holding BPF FD references to
    maps, programs, links, and btf objects. From Andrii Nakryiko.

    12) Several conversions over to generic power management, from Vaibhav
    Gupta.

    13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
    Yakunin.

    14) Various https url conversions, from Alexander A. Klimov.

    15) Timestamping and PHC support for mscc PHY driver, from Antoine
    Tenart.

    16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

    17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

    18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

    19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
    drivers. From Luc Van Oostenryck.

    20) XDP support for xen-netfront, from Denis Kirjanov.

    21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

    22) Support EF100 chip in sfc driver, from Edward Cree.

    23) Add XDP support to mvpp2 driver, from Matteo Croce.

    24) Support MPTCP in sock_diag, from Paolo Abeni.

    25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
    infrastructure, from Jakub Kicinski.

    26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

    27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

    28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

    29) Refactor a lot of networking socket option handling code in order to
    avoid set_fs() calls, from Christoph Hellwig.

    30) Add rfc4884 support to icmp code, from Willem de Bruijn.

    31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

    32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

    33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

    34) Support TCP syncookies in MPTCP, from Florian Westphal.

    35) Fix several tricky cases of PMTU handling wrt. bridging, from Stefano
    Brivio.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
    net: thunderx: initialize VF's mailbox mutex before first usage
    usb: hso: remove bogus check for EINPROGRESS
    usb: hso: no complaint about kmalloc failure
    hso: fix bailout in error case of probe
    ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
    selftests/net: relax cpu affinity requirement in msg_zerocopy test
    mptcp: be careful on subflow creation
    selftests: rtnetlink: make kci_test_encap() return sub-test result
    selftests: rtnetlink: correct the final return value for the test
    net: dsa: sja1105: use detected device id instead of DT one on mismatch
    tipc: set ub->ifindex for local ipv6 address
    ipv6: add ipv6_dev_find()
    net: openvswitch: silence suspicious RCU usage warning
    Revert "vxlan: fix tos value before xmit"
    ptp: only allow phase values lower than 1 period
    farsync: switch from 'pci_' to 'dma_' API
    wan: wanxl: switch from 'pci_' to 'dma_' API
    hv_netvsc: do not use VF device if link is down
    dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
    net: macb: Properly handle phylink on at91sam9x
    ...

    Linus Torvalds
     

05 Aug, 2020

1 commit

  • Pull seccomp updates from Kees Cook:
    "There are a bunch of clean ups and selftest improvements along with
    two major updates to the SECCOMP_RET_USER_NOTIF filter return:
    EPOLLHUP support to more easily detect the death of a monitored
    process, and being able to inject fds when intercepting syscalls that
    expect an fd-opening side-effect (needed by both container folks and
    Chrome). The latter continued the refactoring of __scm_install_fd()
    started by Christoph, and in the process found and fixed a handful of
    bugs in various callers.

    - Improved selftest coverage, timeouts, and reporting

    - Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner)

    - Refactor __scm_install_fd() into __receive_fd() and fix buggy
    callers

    - Introduce 'addfd' command for SECCOMP_RET_USER_NOTIF (Sargun
    Dhillon)"

    * tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits)
    selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
    seccomp: Introduce addfd ioctl to seccomp user notifier
    fs: Expand __receive_fd() to accept existing fd
    pidfd: Replace open-coded receive_fd()
    fs: Add receive_fd() wrapper for __receive_fd()
    fs: Move __scm_install_fd() to __receive_fd()
    net/scm: Regularize compat handling of scm_detach_fds()
    pidfd: Add missing sock updates for pidfd_getfd()
    net/compat: Add missing sock updates for SCM_RIGHTS
    selftests/seccomp: Check ENOSYS under tracing
    selftests/seccomp: Refactor to use fixture variants
    selftests/harness: Clean up kern-doc for fixtures
    seccomp: Use -1 marker for end of mode 1 syscall list
    seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID
    selftests/seccomp: Rename user_trap_syscall() to user_notif_syscall()
    selftests/seccomp: Make kcmp() less required
    seccomp: Use pr_fmt
    selftests/seccomp: Improve calibration loop
    selftests/seccomp: use 90s as timeout
    selftests/seccomp: Expand benchmark to per-filter measurements
    ...

    Linus Torvalds
     

31 Jul, 2020

1 commit


25 Jul, 2020

5 commits


23 Jul, 2020

1 commit


20 Jul, 2020

4 commits


14 Jul, 2020

1 commit

  • Add missed sock updates to compat path via a new helper, which will be
    used more in coming patches. (The net/core/scm.c code is left as-is here
    to assist with -stable backports for the compat path.)

    Cc: Christoph Hellwig
    Cc: Sargun Dhillon
    Cc: Jakub Kicinski
    Cc: stable@vger.kernel.org
    Fixes: 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
    Fixes: d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
    Acked-by: Christian Brauner
    Signed-off-by: Kees Cook

    Kees Cook
     

11 Jul, 2020

1 commit


10 Jul, 2020

1 commit

  • After commit bf9765145b85 ("sock: Make sk_protocol a 16-bit value")
    the current size of 'sdiag_protocol' is not sufficient to represent
    the possible protocol values.

    This change introduces a new inet diag request attribute to let
    user space specify the relevant protocol number using u32 values.

    The attribute is parsed by inet diag core on get/dump command
    and the extended protocol value, if available, is preferred to
    'sdiag_protocol' to lookup the diag handler.
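
    The selection rule can be sketched as below. The struct
    diag_req_sketch and the function diag_protocol() are hypothetical
    simplifications; the real request carries the extended value as a
    netlink attribute rather than a pointer field:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request: the legacy 8-bit field plus an optional 32-bit
 * extended value (NULL when user space did not supply one). */
struct diag_req_sketch {
    uint8_t sdiag_protocol;
    const uint32_t *ext_protocol;
};

/* Prefer the extended value when present, else fall back to the u8 field. */
static uint32_t diag_protocol(const struct diag_req_sketch *req)
{
    if (req->ext_protocol)
        return *req->ext_protocol;
    return req->sdiag_protocol;
}
```

    A protocol number above 255, such as 262, cannot be stored in the
    8-bit field and is only reachable through the extended value.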

    The parsed attributes are exposed to all the diag handlers via
    the cb->data.

    Note that inet_diag_dump_one_icsk() is left unmodified, as it
    will not be used by protocols using the extended attribute.

    Suggested-by: David S. Miller
    Co-developed-by: Christoph Paasch
    Signed-off-by: Christoph Paasch
    Acked-by: Mat Martineau
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

08 Jul, 2020

1 commit

  • When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
    copied, so the cgroup refcnt must be taken too. And, unlike the
    sk_alloc() path, sock_update_netprioidx() is not called here.
    Therefore, it is safe and necessary to grab the cgroup refcnt
    even when cgroup_sk_alloc is disabled.

    sk_clone_lock() is in BH context anyway, the in_interrupt()
    would terminate this function if called there. And for sk_alloc()
    skcd->val is always zero. So it's safe to factor out the code
    to make it more readable.

    The global variable 'cgroup_sk_alloc_disabled' is used to determine
    whether to take these reference counts. It is impossible to make
    the reference counting correct unless we save this bit of information
    in skcd->val. So, add a new bit there to record whether the socket
    has already taken the reference counts. This obviously relies on
    kmalloc() to align cgroup pointers to at least 4 bytes,
    ARCH_KMALLOC_MINALIGN is certainly larger than that.
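
    Storing a flag in the low bits of an aligned pointer can be sketched
    generically as follows. This illustrates the technique only; the bit
    position, the macro names, and the helper functions are hypothetical
    and do not reproduce the actual layout of skcd->val:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_REFCNT_BIT ((uintptr_t)1 << 1)  /* hypothetical flag position */
#define FLAG_MASK     ((uintptr_t)3)       /* low two bits, free when aligned to 4 */

/* Pack a pointer and a flag into one word. The pointer must be aligned to
 * at least 4 bytes so that its low two bits are zero. */
static uintptr_t pack(void *ptr, bool no_refcnt)
{
    uintptr_t val = (uintptr_t)ptr;

    assert((val & FLAG_MASK) == 0);
    return no_refcnt ? (val | NO_REFCNT_BIT) : val;
}

/* Recover the original pointer by masking off the flag bits. */
static void *unpack_ptr(uintptr_t val)
{
    return (void *)(val & ~FLAG_MASK);
}

/* Report whether the flag was recorded alongside the pointer. */
static bool has_no_refcnt(uintptr_t val)
{
    return (val & NO_REFCNT_BIT) != 0;
}
```

    Because the flag travels with the value itself, a cloned copy carries
    the same information about whether references were taken.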

    This bug seems to have been present since the beginning; commit
    d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
    tried to fix it, but not completely. It seems not to have been easy to
    trigger until the recent commit 090e28b229af
    ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

    Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
    Reported-by: Cameron Berkenpas
    Reported-by: Peter Geis
    Reported-by: Lu Fengqi
    Reported-by: Daniël Sonck
    Reported-by: Zhang Qiang
    Tested-by: Cameron Berkenpas
    Tested-by: Peter Geis
    Tested-by: Thomas Lamprecht
    Cc: Daniel Borkmann
    Cc: Zefan Li
    Cc: Tejun Heo
    Cc: Roman Gushchin
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     

05 Jul, 2020

1 commit

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2020-07-04

    The following pull-request contains BPF updates for your *net-next* tree.

    We've added 73 non-merge commits during the last 17 day(s) which contain
    a total of 106 files changed, 5233 insertions(+), 1283 deletions(-).

    The main changes are:

    1) bpftool ability to show PIDs of processes having open file descriptors
    for BPF map/program/link/BTF objects, relying on BPF iterator progs
    to extract this info efficiently, from Andrii Nakryiko.

    2) Addition of BPF iterator progs for dumping TCP and UDP sockets to
    seq_files, from Yonghong Song.

    3) Support access to BPF map fields in struct bpf_map from programs
    through BTF struct access, from Andrey Ignatov.

    4) Add a bpf_get_task_stack() helper to be able to dump /proc/*/stack
    via seq_file from BPF iterator progs, from Song Liu.

    5) Make SO_KEEPALIVE and related options available to bpf_setsockopt()
    helper, from Dmitry Yakunin.

    6) Optimize BPF sk_storage selection of its caching index, from Martin
    KaFai Lau.

    7) Removal of redundant synchronize_rcu()s from BPF map destruction which
    has been a historic leftover, from Alexei Starovoitov.

    8) Several improvements to test_progs to make it easier to create a shell
    loop that invokes each test individually which is useful for some CIs,
    from Jesper Dangaard Brouer.

    9) Fix bpftool prog dump segfault when compiled without skeleton code on
    older clang versions, from John Fastabend.

    10) Bunch of cleanups and minor improvements, from various others.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller