23 Oct, 2020

1 commit

  • In setsockopt(SO_MAX_PACING_RATE) on 64bit systems, sk_max_pacing_rate,
    after being extended from 'u32' to 'unsigned long', takes an
    unintentionally inflated value whenever it is assigned from an 'int'
    value with MSB=1, due to sign extension when promoting s32 to u64,
    e.g. 0x80000000 becomes 0xFFFFFFFF80000000.

    The inflated sk_max_pacing_rate causes a subsequent getsockopt to return
    ~0U unexpectedly. It may also result in an increased pacing rate.

    Fix by explicitly casting the 'int' value to 'unsigned int' before
    assigning it to sk_max_pacing_rate, for zero extension to happen.

    Fixes: 76a9ebe811fb ("net: extend sk_pacing_rate to unsigned long")
    Signed-off-by: Ji Li
    Signed-off-by: Ke Li
    Reviewed-by: Eric Dumazet
    Link: https://lore.kernel.org/r/20201022064146.79873-1-keli@akamai.com
    Signed-off-by: Jakub Kicinski

    Ke Li
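
    The sign-extension effect described in this commit can be reproduced in
    plain userspace C. The following is a minimal sketch, not the kernel
    code: the helper names are hypothetical, and uint64_t stands in for
    'unsigned long' on a 64bit system.

```c
#include <stdint.h>

/* Buggy pattern: converting a signed 32-bit value straight to a 64-bit
 * unsigned type sign-extends, so a set MSB fills the upper 32 bits. */
static uint64_t widen_sign_extended(int32_t val)
{
	return (uint64_t)val;
}

/* Fixed pattern: cast to unsigned 32-bit first, so the widening
 * conversion zero-extends and the upper 32 bits stay clear. */
static uint64_t widen_zero_extended(int32_t val)
{
	return (uint64_t)(uint32_t)val;
}
```

    With the bit pattern 0x80000000 (INT32_MIN) as input, the first helper
    yields 0xFFFFFFFF80000000 and the second yields 0x80000000, which matches
    the example given in the commit message.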
     

16 Oct, 2020

1 commit


14 Oct, 2020

2 commits

  • SOCK_TSTAMP_NEW (timespec64 instead of timespec) is also used for
    hardware time stamps (configured via SO_TIMESTAMPING_NEW).

    User space (ptp4l) first configures hardware time stamping via
    SO_TIMESTAMPING_NEW which sets SOCK_TSTAMP_NEW. In the next step, ptp4l
    disables SO_TIMESTAMPNS(_NEW) (software time stamps), but this must not
    switch hardware time stamps back to "32 bit mode".

    This problem happens on 32 bit platforms where the libc has already
    switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW
    socket options). ptp4l complains with "missing timestamp on transmitted
    peer delay request" because the wrong format is received (and
    discarded).

    Fixes: 887feae36aee ("socket: Add SO_TIMESTAMP[NS]_NEW")
    Fixes: 783da70e8396 ("net: add sock_enable_timestamps")
    Signed-off-by: Christian Eggers
    Acked-by: Willem de Bruijn
    Acked-by: Deepa Dinamani
    Signed-off-by: Jakub Kicinski

    Christian Eggers
     
  • The comparison of optname with SO_TIMESTAMPING_NEW is the wrong way
    around, so SOCK_TSTAMP_NEW will first be set and then reset again.
    Additionally
    move it out of the test for SOF_TIMESTAMPING_RX_SOFTWARE as this seems
    unrelated.

    This problem happens on 32 bit platforms where the libc has already
    switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW
    socket options). ptp4l complains with "missing timestamp on transmitted
    peer delay request" because the wrong format is received (and
    discarded).

    Fixes: 9718475e6908 ("socket: Add SO_TIMESTAMPING_NEW")
    Signed-off-by: Christian Eggers
    Reviewed-by: Willem de Bruijn
    Reviewed-by: Deepa Dinamani
    Acked-by: Willem de Bruijn
    Acked-by: Deepa Dinamani
    Signed-off-by: Jakub Kicinski

    Christian Eggers
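
    The intended behaviour after the two timestamping fixes above can be
    modelled in a few lines of userspace C. This is a simplified sketch
    under stated assumptions, not the code in net/core/sock.c: the struct,
    the enum and the function names are all hypothetical, and only the
    "new format" flag is modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the SO_TIMESTAMPING_OLD/_NEW option names. */
enum ts_opt { TS_OPT_TIMESTAMPING_OLD, TS_OPT_TIMESTAMPING_NEW };

/* Hypothetical, heavily reduced socket state. */
struct ts_sock {
	bool tstamp_new;     /* models SOCK_TSTAMP_NEW (timespec64 format) */
	bool rcv_tstamp_ns;  /* models software SO_TIMESTAMPNS being enabled */
	unsigned int tsflags;
};

/* Configuring timestamping: the format flag follows the option name only,
 * independent of which SOF_TIMESTAMPING_* bits are requested. */
static void set_timestamping(struct ts_sock *sk, enum ts_opt opt,
			     unsigned int flags)
{
	sk->tsflags = flags;
	sk->tstamp_new = (opt == TS_OPT_TIMESTAMPING_NEW);
}

/* Disabling software timestamps: the format flag is deliberately left
 * untouched, because hardware timestamps may still depend on it. */
static void disable_timestampns(struct ts_sock *sk)
{
	sk->rcv_tstamp_ns = false;
}
```

    In this model, the ptp4l sequence from the commit message (configure
    hardware time stamping with the _NEW option, then disable software time
    stamps) leaves tstamp_new set.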
     

25 Sep, 2020

1 commit

  • This patch adds a new helper, sk_stop_timer_sync(). It deactivates a
    timer like sk_stop_timer(), but waits for the handler to finish.

    Acked-by: Paolo Abeni
    Signed-off-by: Geliang Tang
    Reviewed-by: Mat Martineau
    Signed-off-by: David S. Miller

    Geliang Tang
     

05 Sep, 2020

1 commit

  • We got slightly different patches removing a double word
    in a comment in net/ipv4/raw.c - picked the version from net.

    Simple conflict in drivers/net/ethernet/ibm/ibmvnic.c. Use cached
    values instead of VNIC login response buffer (following what
    commit 507ebe6444a4 ("ibmvnic: Fix use-after-free of VNIC login
    response buffer") did).

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     

04 Sep, 2020

1 commit

  • Pull networking fixes from David Miller:

    1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi
    Kivilinna.

    2) Fix loss of RTT samples in rxrpc, from David Howells.

    3) Memory leak in hns_nic_dev_probe(), from Dinghao Liu.

    4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka.

    5) We disable BH for too long in sctp_get_port_local(), add a
    cond_resched() here as well, from Xin Long.

    6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu.

    7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from
    Yonghong Song.

    8) Missing of_node_put() in mt7530 DSA driver, from Sumera
    Priyadarsini.

    9) Fix crash in bnxt_fw_reset_task(), from Michael Chan.

    10) Fix geneve tunnel checksumming bug in hns3, from Yi Li.

    11) Memory leak in rxkad_verify_response, from Dinghao Liu.

    12) In tipc, don't use smp_processor_id() in preemptible context. From
    Tuong Lien.

    13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu.

    14) Missing clk_disable_unprepare() in gemini driver, from Dan Carpenter.

    15) Fix ABI mismatch between driver and firmware in nfp, from Louis
    Peens.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits)
    net/smc: fix sock refcounting in case of termination
    net/smc: reset sndbuf_desc if freed
    net/smc: set rx_off for SMCR explicitly
    net/smc: fix toleration of fake add_link messages
    tg3: Fix soft lockup when tg3_reset_task() fails.
    doc: net: dsa: Fix typo in config code sample
    net: dp83867: Fix WoL SecureOn password
    nfp: flower: fix ABI mismatch between driver and firmware
    tipc: fix shutdown() of connectionless socket
    ipv6: Fix sysctl max for fib_multipath_hash_policy
    drivers/net/wan/hdlc: Change the default of hard_header_len to 0
    net: gemini: Fix another missing clk_disable_unprepare() in probe
    net: bcmgenet: fix mask check in bcmgenet_validate_flow()
    amd-xgbe: Add support for new port mode
    net: usb: dm9601: Add USB ID of Keenetic Plus DSL
    vhost: fix typo in error message
    net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
    pktgen: fix error message with wrong function name
    net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode
    cxgb4: fix thermal zone device registration
    ...

    Linus Torvalds
     

27 Aug, 2020

1 commit


24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings where applicable.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
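
    For readers outside the kernel tree, the pseudo-keyword can be
    approximated in userspace as shown below. This is a sketch assuming a
    compiler that supports the __fallthrough__ statement attribute (recent
    GCC or Clang), with an empty-statement fallback otherwise. The
    stages_to_run() function is a hypothetical example, not kernel code.

```c
/* Userspace stand-in for the kernel's 'fallthrough' pseudo-keyword. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: each case intentionally continues into the next,
 * so entering at stage N runs N increments. */
static int stages_to_run(int stage)
{
	int n = 0;

	switch (stage) {
	case 3:
		n++;
		fallthrough;
	case 2:
		n++;
		fallthrough;
	case 1:
		n++;
		break;
	default:
		break;
	}
	return n;
}
```

    Unlike a comment, the macro is visible to the compiler, so an
    unannotated fall-through can be flagged by -Wimplicit-fallthrough.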
     

20 Aug, 2020

1 commit


14 Aug, 2020

1 commit

  • Pull networking fixes from David Miller:
    "Some merge window fallout, some longer term fixes:

    1) Handle headroom properly in lapbether and x25_asy drivers, from
    Xie He.

    2) Fetch MAC address from correct r8152 device node, from Thierry
    Reding.

    3) In the sw kTLS path we should allow MSG_CMSG_COMPAT in sendmsg,
    from Rouven Czerwinski.

    4) Correct fdputs in socket layer, from Miaohe Lin.

    5) Revert troublesome sockptr_t optimization, from Christoph Hellwig.

    6) Fix TCP TFO key reading on big endian, from Jason Baron.

    7) Missing CAP_NET_RAW check in nfc, from Qingyu Li.

    8) Fix inet fastreuse optimization with tproxy sockets, from Tim
    Froidcoeur.

    9) Fix 64-bit divide in new SFC driver, from Edward Cree.

    10) Add a tracepoint for prandom_u32 so that we can more easily
    perform usage analysis. From Eric Dumazet.

    11) Fix rwlock imbalance in AF_PACKET, from John Ogness"

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
    net: openvswitch: introduce common code for flushing flows
    af_packet: TPACKET_V3: fix fill status rwlock imbalance
    random32: add a tracepoint for prandom_u32()
    Revert "ipv4: tunnel: fix compilation on ARCH=um"
    net: accept an empty mask in /sys/class/net/*/queues/rx-*/rps_cpus
    net: ethernet: stmmac: Disable hardware multicast filter
    net: stmmac: dwmac1000: provide multicast filter fallback
    ipv4: tunnel: fix compilation on ARCH=um
    vsock: fix potential null pointer dereference in vsock_poll()
    sfc: fix ef100 design-param checking
    net: initialize fastreuse on inet_inherit_port
    net: refactor bind_bucket fastreuse into helper
    net: phy: marvell10g: fix null pointer dereference
    net: Fix potential memory leak in proto_register()
    net: qcom/emac: add missed clk_disable_unprepare in error path of emac_clks_phase1_init
    ionic_lif: Use devm_kcalloc() in ionic_qcq_alloc()
    net/nfc/rawsock.c: add CAP_NET_RAW check.
    hinic: fix strncpy output truncated compile warnings
    drivers/net/wan/x25_asy: Added needed_headroom and a skb->len check
    net/tls: Fix kmap usage
    ...

    Linus Torvalds
     

12 Aug, 2020

1 commit

  • If we failed to assign proto idx, we free the twsk_slab_name but forget to
    free the twsk_slab. Add a helper function tw_prot_cleanup() to free these
    together and also use this helper function in proto_unregister().

    Fixes: b45ce32135d1 ("sock: fix potential memory leak in proto_register()")
    Signed-off-by: Miaohe Lin
    Signed-off-by: David S. Miller

    Miaohe Lin
     

08 Aug, 2020

2 commits

  • Merge misc updates from Andrew Morton:

    - a few MM hotfixes

    - kthread, tools, scripts, ntfs and ocfs2

    - some of MM

    Subsystems affected by this patch series: kthread, tools, scripts, ntfs,
    ocfs2 and mm (hotfixes, pagealloc, slab-generic, slab, slub, kcsan,
    debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, mincore,
    sparsemem, vmalloc, kasan, pagealloc, hugetlb and vmscan).

    * emailed patches from Andrew Morton: (162 commits)
    mm: vmscan: consistent update to pgrefill
    mm/vmscan.c: fix typo
    khugepaged: khugepaged_test_exit() check mmget_still_valid()
    khugepaged: retract_page_tables() remember to test exit
    khugepaged: collapse_pte_mapped_thp() protect the pmd lock
    khugepaged: collapse_pte_mapped_thp() flush the right range
    mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
    mm: thp: replace HTTP links with HTTPS ones
    mm/page_alloc: fix memalloc_nocma_{save/restore} APIs
    mm/page_alloc.c: skip setting nodemask when we are in interrupt
    mm/page_alloc: fallbacks at most has 3 elements
    mm/page_alloc: silence a KASAN false positive
    mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask()
    mm/page_alloc.c: simplify pageblock bitmap access
    mm/page_alloc.c: extract the common part in pfn_to_bitidx()
    mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits
    mm/shuffle: remove dynamic reconfiguration
    mm/memory_hotplug: document why shuffle_zone() is relevant
    mm/page_alloc: remove nr_free_pagecache_pages()
    mm: remove vm_total_pages
    ...

    Linus Torvalds
     
  • As said by Linus:

    A symmetric naming is only helpful if it implies symmetries in use.
    Otherwise it's actively misleading.

    In "kzalloc()", the z is meaningful and an important part of what the
    caller wants.

    In "kzfree()", the z is actively detrimental, because maybe in the
    future we really _might_ want to use that "memfill(0xdeadbeef)" or
    something. The "zero" part of the interface isn't even _relevant_.

    The main reason that kzfree() exists is to clear sensitive information
    that should not be leaked to other future users of the same memory
    objects.

    Rename kzfree() to kfree_sensitive() to follow the example of the recently
    added kvfree_sensitive() and make the intention of the API more explicit.
    In addition, memzero_explicit() is used to clear the memory to make sure
    that it won't get optimized away by the compiler.

    The renaming is done by using the command sequence:

    git grep -w --name-only kzfree |\
    xargs sed -i 's/kzfree/kfree_sensitive/'

    followed by some editing of the kfree_sensitive() kerneldoc and adding
    a kzfree backward compatibility macro in slab.h.

    [akpm@linux-foundation.org: fs/crypto/inline_crypt.c needs linux/slab.h]
    [akpm@linux-foundation.org: fix fs/crypto/inline_crypt.c some more]

    Suggested-by: Joe Perches
    Signed-off-by: Waiman Long
    Signed-off-by: Andrew Morton
    Acked-by: David Howells
    Acked-by: Michal Hocko
    Acked-by: Johannes Weiner
    Cc: Jarkko Sakkinen
    Cc: James Morris
    Cc: "Serge E. Hallyn"
    Cc: Joe Perches
    Cc: Matthew Wilcox
    Cc: David Rientjes
    Cc: Dan Carpenter
    Cc: "Jason A . Donenfeld"
    Link: http://lkml.kernel.org/r/20200616154311.12314-3-longman@redhat.com
    Signed-off-by: Linus Torvalds

    Waiman Long
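
    The idea behind kfree_sensitive() (clear the object in a way the
    compiler cannot drop, then free it) can be sketched in userspace C. The
    names below are hypothetical. Unlike the kernel helper, this sketch
    takes the object size as an explicit argument, and the volatile store
    loop is only one common way to keep the clearing from being optimized
    away.

```c
#include <stddef.h>
#include <stdlib.h>

/* Clear a buffer through a volatile pointer so the stores are kept even
 * if the buffer is never read again. */
static void zero_explicit(void *p, size_t len)
{
	volatile unsigned char *vp = p;

	while (len--)
		*vp++ = 0;
}

/* Hypothetical userspace analogue of a "free sensitive data" helper:
 * wipe the object first, then release it. NULL is accepted, like free(). */
static void free_sensitive(void *p, size_t len)
{
	if (!p)
		return;
	zero_explicit(p, len);
	free(p);
}
```

    The explicit name states the intent (the contents must not leak to the
    next user of the memory) rather than the mechanism, which is the point
    made in the quoted rationale.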
     

07 Aug, 2020

1 commit

  • Pull dlm updates from David Teigland:
    "This set includes some improvements to the dlm networking layer:
    improving the ability to trace dlm messages for debugging, and
    improved handling of bad messages or disrupted connections"

    * tag 'dlm-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
    fs: dlm: implement tcp graceful shutdown
    fs: dlm: change handling of reconnects
    fs: dlm: don't close socket on invalid message
    fs: dlm: set skb mark per peer socket
    fs: dlm: set skb mark for listen socket
    net: sock: add sock_set_mark
    dlm: Fix kobject memleak

    Linus Torvalds
     

06 Aug, 2020

2 commits

  • This patch adds a new socket helper function to set the mark value for a
    kernel socket.

    Signed-off-by: Alexander Aring
    Signed-off-by: David Teigland

    Alexander Aring
     
  • Pull networking updates from David Miller:

    1) Support 6GHz band in ath11k driver, from Rajkumar Manoharan.

    2) Support UDP segmentation in core TSO code, from Eric Dumazet.

    3) Allow flashing different flash images in cxgb4 driver, from Vishal
    Kulkarni.

    4) Add drop frames counter and flow status to tc flower offloading,
    from Po Liu.

    5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

    6) Various new indirect call avoidance, from Eric Dumazet and Brian
    Vazquez.

    7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
    Yonghong Song.

    8) Support querying and setting hardware address of a port function via
    devlink, use this in mlx5, from Parav Pandit.

    9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

    10) Switch qca8k driver over to phylink, from Jonathan McDowell.

    11) In bpftool, show list of processes holding BPF FD references to
    maps, programs, links, and btf objects. From Andrii Nakryiko.

    12) Several conversions over to generic power management, from Vaibhav
    Gupta.

    13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
    Yakunin.

    14) Various https url conversions, from Alexander A. Klimov.

    15) Timestamping and PHC support for mscc PHY driver, from Antoine
    Tenart.

    16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

    17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

    18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

    19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
    drivers. From Luc Van Oostenryck.

    20) XDP support for xen-netfront, from Denis Kirjanov.

    21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

    22) Support EF100 chip in sfc driver, from Edward Cree.

    23) Add XDP support to mvpp2 driver, from Matteo Croce.

    24) Support MPTCP in sock_diag, from Paolo Abeni.

    25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
    infrastructure, from Jakub Kicinski.

    26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

    27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

    28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

    29) Refactor a lot of networking socket option handling code in order to
    avoid set_fs() calls, from Christoph Hellwig.

    30) Add rfc4884 support to icmp code, from Willem de Bruijn.

    31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

    32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

    33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

    34) Support TCP syncookies in MPTCP, from Florian Westphal.

    35) Fix several tricky cases of PMTU handling wrt. bridging, from Stefano
    Brivio.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
    net: thunderx: initialize VF's mailbox mutex before first usage
    usb: hso: remove bogus check for EINPROGRESS
    usb: hso: no complaint about kmalloc failure
    hso: fix bailout in error case of probe
    ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
    selftests/net: relax cpu affinity requirement in msg_zerocopy test
    mptcp: be careful on subflow creation
    selftests: rtnetlink: make kci_test_encap() return sub-test result
    selftests: rtnetlink: correct the final return value for the test
    net: dsa: sja1105: use detected device id instead of DT one on mismatch
    tipc: set ub->ifindex for local ipv6 address
    ipv6: add ipv6_dev_find()
    net: openvswitch: silence suspicious RCU usage warning
    Revert "vxlan: fix tos value before xmit"
    ptp: only allow phase values lower than 1 period
    farsync: switch from 'pci_' to 'dma_' API
    wan: wanxl: switch from 'pci_' to 'dma_' API
    hv_netvsc: do not use VF device if link is down
    dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
    net: macb: Properly handle phylink on at91sam9x
    ...

    Linus Torvalds
     

05 Aug, 2020

1 commit

  • Pull seccomp updates from Kees Cook:
    "There are a bunch of clean ups and selftest improvements along with
    two major updates to the SECCOMP_RET_USER_NOTIF filter return:
    EPOLLHUP support to more easily detect the death of a monitored
    process, and being able to inject fds when intercepting syscalls that
    expect an fd-opening side-effect (needed by both container folks and
    Chrome). The latter continued the refactoring of __scm_install_fd()
    started by Christoph, and in the process found and fixed a handful of
    bugs in various callers.

    - Improved selftest coverage, timeouts, and reporting

    - Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner)

    - Refactor __scm_install_fd() into __receive_fd() and fix buggy
    callers

    - Introduce 'addfd' command for SECCOMP_RET_USER_NOTIF (Sargun
    Dhillon)"

    * tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits)
    selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
    seccomp: Introduce addfd ioctl to seccomp user notifier
    fs: Expand __receive_fd() to accept existing fd
    pidfd: Replace open-coded receive_fd()
    fs: Add receive_fd() wrapper for __receive_fd()
    fs: Move __scm_install_fd() to __receive_fd()
    net/scm: Regularize compat handling of scm_detach_fds()
    pidfd: Add missing sock updates for pidfd_getfd()
    net/compat: Add missing sock updates for SCM_RIGHTS
    selftests/seccomp: Check ENOSYS under tracing
    selftests/seccomp: Refactor to use fixture variants
    selftests/harness: Clean up kern-doc for fixtures
    seccomp: Use -1 marker for end of mode 1 syscall list
    seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID
    selftests/seccomp: Rename user_trap_syscall() to user_notif_syscall()
    selftests/seccomp: Make kcmp() less required
    seccomp: Use pr_fmt
    selftests/seccomp: Improve calibration loop
    selftests/seccomp: use 90s as timeout
    selftests/seccomp: Expand benchmark to per-filter measurements
    ...

    Linus Torvalds
     

31 Jul, 2020

1 commit


25 Jul, 2020

5 commits


23 Jul, 2020

1 commit


20 Jul, 2020

4 commits


14 Jul, 2020

1 commit

  • Add missed sock updates to compat path via a new helper, which will be
    used more in coming patches. (The net/core/scm.c code is left as-is here
    to assist with -stable backports for the compat path.)

    Cc: Christoph Hellwig
    Cc: Sargun Dhillon
    Cc: Jakub Kicinski
    Cc: stable@vger.kernel.org
    Fixes: 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
    Fixes: d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
    Acked-by: Christian Brauner
    Signed-off-by: Kees Cook

    Kees Cook
     

11 Jul, 2020

1 commit


10 Jul, 2020

1 commit

  • After commit bf9765145b85 ("sock: Make sk_protocol a 16-bit value")
    the current size of 'sdiag_protocol' is not sufficient to represent
    the possible protocol values.

    This change introduces a new inet diag request attribute to let
    user space specify the relevant protocol number using u32 values.

    The attribute is parsed by inet diag core on get/dump command
    and the extended protocol value, if available, is preferred to
    'sdiag_protocol' to lookup the diag handler.

    The parsed attributes are exposed to all the diag handlers via
    the cb->data.

    Note that inet_diag_dump_one_icsk() is left unmodified, as it
    will not be used by protocol using the extended attribute.

    Suggested-by: David S. Miller
    Co-developed-by: Christoph Paasch
    Signed-off-by: Christoph Paasch
    Acked-by: Mat Martineau
    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
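
    The selection rule described in this commit (prefer the extended
    protocol value when present, otherwise fall back to 'sdiag_protocol')
    can be sketched as follows. This is an illustrative model only: the
    struct, the "absent" marker and the function name are hypothetical and
    do not reproduce the kernel's netlink attribute parsing.

```c
#include <stdint.h>

/* Hypothetical marker meaning "the extended attribute was not supplied". */
#define PROTO_EXT_UNSET (-1)

/* Hypothetical, simplified request: the legacy field is only 8 bits wide,
 * so it cannot represent protocol numbers above 255. */
struct diag_req_sketch {
	uint8_t sdiag_protocol;
	int ext_protocol;  /* extended value, or PROTO_EXT_UNSET */
};

/* Prefer the extended value if the caller supplied one. */
static int diag_effective_protocol(const struct diag_req_sketch *req)
{
	if (req->ext_protocol != PROTO_EXT_UNSET)
		return req->ext_protocol;
	return req->sdiag_protocol;
}
```

    A protocol number such as 262 (the value Linux uses for IPPROTO_MPTCP)
    does not fit in the 8-bit field, which is why a separate attribute is
    needed.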
     

08 Jul, 2020

1 commit

  • When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
    copied, so the cgroup refcnt must be taken too. And, unlike the
    sk_alloc() path, sock_update_netprioidx() is not called here.
    Therefore, it is safe and necessary to grab the cgroup refcnt
    even when cgroup_sk_alloc is disabled.

    sk_clone_lock() is in BH context anyway, the in_interrupt()
    would terminate this function if called there. And for sk_alloc()
    skcd->val is always zero. So it's safe to factor out the code
    to make it more readable.

    The global variable 'cgroup_sk_alloc_disabled' is used to determine
    whether to take these reference counts. It is impossible to make
    the reference counting correct unless we save this bit of information
    in skcd->val. So, add a new bit there to record whether the socket
    has already taken the reference counts. This obviously relies on
    kmalloc() to align cgroup pointers to at least 4 bytes,
    ARCH_KMALLOC_MINALIGN is certainly larger than that.

    This bug seems to have been present since the beginning. Commit
    d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
    tried to fix it, but not completely. It was apparently hard to trigger
    until the recent commit 090e28b229af
    ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

    Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
    Reported-by: Cameron Berkenpas
    Reported-by: Peter Geis
    Reported-by: Lu Fengqi
    Reported-by: Daniël Sonck
    Reported-by: Zhang Qiang
    Tested-by: Cameron Berkenpas
    Tested-by: Peter Geis
    Tested-by: Thomas Lamprecht
    Cc: Daniel Borkmann
    Cc: Zefan Li
    Cc: Tejun Heo
    Cc: Roman Gushchin
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
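
    Storing a flag bit inside a pointer-sized value, as this commit does
    with skcd->val, relies on the pointer's alignment leaving its low bits
    zero. The sketch below shows the general technique in userspace C. The
    macro and function names are hypothetical and the bit layout is an
    assumption for illustration, not the kernel's actual layout.

```c
#include <stdint.h>

/* Hypothetical layout: with at least 4-byte alignment, the two low bits
 * of a pointer are always zero and can carry flags. */
#define SKCD_REF_TAKEN 0x1UL
#define SKCD_FLAG_MASK 0x3UL

/* Pack a pointer and a "reference already taken" flag into one word.
 * The caller must guarantee that cgrp is at least 4-byte aligned. */
static uintptr_t skcd_pack(void *cgrp, int ref_taken)
{
	uintptr_t val = (uintptr_t)cgrp;

	return val | (ref_taken ? SKCD_REF_TAKEN : 0);
}

/* Recover the original pointer by masking off the flag bits. */
static void *skcd_ptr(uintptr_t val)
{
	return (void *)(val & ~(uintptr_t)SKCD_FLAG_MASK);
}

/* Test whether the reference-taken flag is recorded in the word. */
static int skcd_ref_taken(uintptr_t val)
{
	return (val & SKCD_REF_TAKEN) != 0;
}
```

    Because the flag travels with the value itself, a cloned copy of the
    word also carries the information about whether references were taken,
    independent of any global setting at the time of the copy.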
     

05 Jul, 2020

1 commit

  • Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2020-07-04

    The following pull-request contains BPF updates for your *net-next* tree.

    We've added 73 non-merge commits during the last 17 day(s) which contain
    a total of 106 files changed, 5233 insertions(+), 1283 deletions(-).

    The main changes are:

    1) bpftool ability to show PIDs of processes having open file descriptors
    for BPF map/program/link/BTF objects, relying on BPF iterator progs
    to extract this info efficiently, from Andrii Nakryiko.

    2) Addition of BPF iterator progs for dumping TCP and UDP sockets to
    seq_files, from Yonghong Song.

    3) Support access to BPF map fields in struct bpf_map from programs
    through BTF struct access, from Andrey Ignatov.

    4) Add a bpf_get_task_stack() helper to be able to dump /proc/*/stack
    via seq_file from BPF iterator progs, from Song Liu.

    5) Make SO_KEEPALIVE and related options available to bpf_setsockopt()
    helper, from Dmitry Yakunin.

    6) Optimize BPF sk_storage selection of its caching index, from Martin
    KaFai Lau.

    7) Removal of redundant synchronize_rcu()s from BPF map destruction which
    has been a historic leftover, from Alexei Starovoitov.

    8) Several improvements to test_progs to make it easier to create a shell
    loop that invokes each test individually which is useful for some CIs,
    from Jesper Dangaard Brouer.

    9) Fix bpftool prog dump segfault when compiled without skeleton code on
    older clang versions, from John Fastabend.

    10) Bunch of cleanups and minor improvements, from various others.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

30 Jun, 2020

1 commit


25 Jun, 2020

1 commit


24 Jun, 2020

1 commit

  • Clearing the sock TX queue in sk_set_socket() might cause unexpected
    out-of-order transmit when called from sock_orphan(), as outstanding
    packets can pick a different TX queue and bypass the ones already queued.

    This is undesired in general. More specifically, it breaks the in-order
    scheduling property guarantee for device-offloaded TLS sockets.

    Remove the call to sk_tx_queue_clear() in sk_set_socket(), and add it
    explicitly only where needed.

    Fixes: e022f0b4a03f ("net: Introduce sk_tx_queue_mapping")
    Signed-off-by: Tariq Toukan
    Reviewed-by: Boris Pismenny
    Signed-off-by: David S. Miller

    Tariq Toukan
     

19 Jun, 2020

1 commit

  • Back in commit f60e5990d9c1 ("ipv6: protect skb->sk accesses
    from recursive dereference inside the stack") Hannes added code
    so that IPv6 stack would not trust skb->sk for typical cases
    where packet goes through 'standard' xmit path (__dev_queue_xmit())

    Alas af_packet had a dev_direct_xmit() path that was not
    dealing yet with xmit_recursion level.

    Also change sk_mc_loop() to dump a stack once only.

    Without this patch, syzbot was able to trigger :

    [1]
    [ 153.567378] WARNING: CPU: 7 PID: 11273 at net/core/sock.c:721 sk_mc_loop+0x51/0x70
    [ 153.567378] Modules linked in: nfnetlink ip6table_raw ip6table_filter iptable_raw iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 iptable_filter macsec macvtap tap macvlan 8021q hsr wireguard libblake2s blake2s_x86_64 libblake2s_generic udp_tunnel ip6_udp_tunnel libchacha20poly1305 poly1305_x86_64 chacha_x86_64 libchacha curve25519_x86_64 libcurve25519_generic netdevsim batman_adv dummy team bridge stp llc w1_therm wire i2c_mux_pca954x i2c_mux cdc_acm ehci_pci ehci_hcd mlx4_en mlx4_ib ib_uverbs ib_core mlx4_core
    [ 153.567386] CPU: 7 PID: 11273 Comm: b159172088 Not tainted 5.8.0-smp-DEV #273
    [ 153.567387] RIP: 0010:sk_mc_loop+0x51/0x70
    [ 153.567388] Code: 66 83 f8 0a 75 24 0f b6 4f 12 b8 01 00 00 00 31 d2 d3 e0 a9 bf ef ff ff 74 07 48 8b 97 f0 02 00 00 0f b6 42 3a 83 e0 01 5d c3 0b b8 01 00 00 00 5d c3 0f b6 87 18 03 00 00 5d c0 e8 04 83 e0
    [ 153.567388] RSP: 0018:ffff95c69bb93990 EFLAGS: 00010212
    [ 153.567388] RAX: 0000000000000011 RBX: ffff95c6e0ee3e00 RCX: 0000000000000007
    [ 153.567389] RDX: ffff95c69ae50000 RSI: ffff95c6c30c3000 RDI: ffff95c6c30c3000
    [ 153.567389] RBP: ffff95c69bb93990 R08: ffff95c69a77f000 R09: 0000000000000008
    [ 153.567389] R10: 0000000000000040 R11: 00003e0e00026128 R12: ffff95c6c30c3000
    [ 153.567390] R13: ffff95c6cc4fd500 R14: ffff95c6f84500c0 R15: ffff95c69aa13c00
    [ 153.567390] FS: 00007fdc3a283700(0000) GS:ffff95c6ff9c0000(0000) knlGS:0000000000000000
    [ 153.567390] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 153.567391] CR2: 00007ffee758e890 CR3: 0000001f9ba20003 CR4: 00000000001606e0
    [ 153.567391] Call Trace:
    [ 153.567391] ip6_finish_output2+0x34e/0x550
    [ 153.567391] __ip6_finish_output+0xe7/0x110
    [ 153.567391] ip6_finish_output+0x2d/0xb0
    [ 153.567392] ip6_output+0x77/0x120
    [ 153.567392] ? __ip6_finish_output+0x110/0x110
    [ 153.567392] ip6_local_out+0x3d/0x50
    [ 153.567392] ipvlan_queue_xmit+0x56c/0x5e0
    [ 153.567393] ? ksize+0x19/0x30
    [ 153.567393] ipvlan_start_xmit+0x18/0x50
    [ 153.567393] dev_direct_xmit+0xf3/0x1c0
    [ 153.567393] packet_direct_xmit+0x69/0xa0
    [ 153.567394] packet_sendmsg+0xbf0/0x19b0
    [ 153.567394] ? plist_del+0x62/0xb0
    [ 153.567394] sock_sendmsg+0x65/0x70
    [ 153.567394] sock_write_iter+0x93/0xf0
    [ 153.567394] new_sync_write+0x18e/0x1a0
    [ 153.567395] __vfs_write+0x29/0x40
    [ 153.567395] vfs_write+0xb9/0x1b0
    [ 153.567395] ksys_write+0xb1/0xe0
    [ 153.567395] __x64_sys_write+0x1a/0x20
    [ 153.567395] do_syscall_64+0x43/0x70
    [ 153.567396] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 153.567396] RIP: 0033:0x453549
    [ 153.567396] Code: Bad RIP value.
    [ 153.567396] RSP: 002b:00007fdc3a282cc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    [ 153.567397] RAX: ffffffffffffffda RBX: 00000000004d32d0 RCX: 0000000000453549
    [ 153.567397] RDX: 0000000000000020 RSI: 0000000020000300 RDI: 0000000000000003
    [ 153.567398] RBP: 00000000004d32d8 R08: 0000000000000000 R09: 0000000000000000
    [ 153.567398] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004d32dc
    [ 153.567398] R13: 00007ffee742260f R14: 00007fdc3a282dc0 R15: 00007fdc3a283700
    [ 153.567399] ---[ end trace c1d5ae2b1059ec62 ]---

    Fixes: f60e5990d9c1 ("ipv6: protect skb->sk accesses from recursive dereference inside the stack")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Jun, 2020

1 commit

  • sock_bindtoindex is intended for kernel-wide use, but it locks the
    socket regardless of the context. This change makes the locking
    optional: the socket is locked only when sock_bindtoindex is called
    with lock_sk = true.

    All users of sock_bindtoindex are updated accordingly.

    Signed-off-by: Ferenc Fejes
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/bee6355da40d9e991b2f2d12b67d55ebb5f5b207.1590871065.git.fejes@inf.elte.hu

    Ferenc Fejes
     

30 May, 2020

1 commit

  • The SCTP protocol allows binding multiple addresses to a socket. That
    feature is currently only exposed as a socket option. Add a bind_add
    method to struct proto that allows binding additional addresses, and
    switch the dlm code to use the method instead of going through the
    socket option from kernel space.

    Signed-off-by: Christoph Hellwig
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Christoph Hellwig
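
    The pattern of an optional per-protocol method can be sketched in
    userspace C as shown below. This is a simplified model, assuming that a
    protocol without the method reports "operation not supported". The
    struct, the dispatcher and the toy implementation are hypothetical names
    for illustration, not the kernel definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a protocol operations table. */
struct proto_sketch {
	/* Optional: NULL when the protocol cannot bind extra addresses. */
	int (*bind_add)(void *sk, const void *addr, int addr_len);
};

/* Generic entry point: dispatch to the protocol's method if it has one. */
static int bind_add_sketch(const struct proto_sketch *prot, void *sk,
			   const void *addr, int addr_len)
{
	if (!prot->bind_add)
		return -EOPNOTSUPP;
	return prot->bind_add(sk, addr, addr_len);
}

/* Toy implementation: sk points at an int counting bound addresses. */
static int toy_bind_add(void *sk, const void *addr, int addr_len)
{
	if (!addr || addr_len <= 0)
		return -EINVAL;
	++*(int *)sk;
	return 0;
}
```

    A direct method call like this lets in-kernel callers skip the socket
    option path, which is designed around buffers supplied by user space.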