07 Feb, 2020

1 commit

  • The caller of XDP_SETUP_PROG has already incremented refcnt in
    __bpf_prog_get(), so drivers should only increment refcnt by
    num_queues - 1.

    To fix the issue, update netvsc_xdp_set() to add the correct number
    to refcnt.

    Hold a refcnt in netvsc_xdp_set()’s other caller, netvsc_attach().

    Do the same in netvsc_vf_setxdp(). Otherwise, every time the VF is
    removed and re-added from the host side, the refcnt is decreased by one,
    which may cause a page fault when unloading the XDP program.

    Fixes: 351e1581395f ("hv_netvsc: Add XDP support")
    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
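
    The refcount rule in the commit above can be modeled in user space. This
    is only a sketch: struct fake_prog and the three helpers are hypothetical
    stand-ins, not driver code. It assumes one reference is needed per queue
    and that the caller already holds the first one.

```c
#include <assert.h>

/* Hypothetical stand-in for a BPF program's reference counter. */
struct fake_prog {
    int refcnt;
};

/* Models the caller of XDP_SETUP_PROG, which takes one reference itself. */
void caller_get_prog(struct fake_prog *prog)
{
    prog->refcnt += 1;
}

/* Models the driver attaching the program to every queue. One reference
 * per queue is needed, and the caller's reference already covers one. */
void driver_attach(struct fake_prog *prog, int num_queues)
{
    assert(num_queues >= 1);
    prog->refcnt += num_queues - 1;
}

/* Models teardown, where each queue drops its own reference. */
void driver_detach(struct fake_prog *prog, int num_queues)
{
    prog->refcnt -= num_queues;
}
```

    With four queues, the count after attach equals the number of queues, so
    detach brings it back to zero rather than leaving a stray reference.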
     

25 Jan, 2020

1 commit

  • This patch adds support of XDP in native mode for hv_netvsc driver, and
    transparently sets the XDP program on the associated VF NIC as well.

    Setting or unsetting an XDP program on the synthetic NIC (netvsc)
    propagates to the VF NIC automatically. Setting or unsetting an XDP
    program on the VF NIC directly is not recommended: it is not propagated
    to the synthetic NIC and may be overwritten by a later setting on the
    synthetic NIC.

    The Azure/Hyper-V synthetic NIC receive buffer doesn't provide headroom
    for XDP. We considered reusing the RNDIS header space, but it is too
    small, so we decided to copy the packets to a page buffer for XDP. Most
    of our VMs on Azure have Accelerated Networking (SR-IOV) enabled, so
    most packets run on the VF NIC and the synthetic NIC is only a
    fallback data path. The data copy on netvsc therefore won't impact
    performance significantly.

    An XDP program cannot run with LRO (RSC) enabled, so you need to
    disable LRO before running XDP:
        ethtool -K eth0 lro off

    XDP actions not yet supported:
    XDP_REDIRECT

    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
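
    The copy-to-page approach described in the commit above can be sketched
    in user space. The function name and the SKETCH_* constants are
    assumptions made for illustration; the headroom value is chosen to match
    XDP_PACKET_HEADROOM (256 bytes), and the real driver may reserve
    additional space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */
#define SKETCH_HEADROOM   256u /* assumed headroom in front of the packet */

/* Copy a received packet into a page-sized buffer, leaving headroom in
 * front of it so a program can grow the packet toward lower addresses.
 * Returns a pointer to the packet start inside the page, or NULL if the
 * packet does not fit behind the headroom. */
uint8_t *copy_to_xdp_page(uint8_t *page, const uint8_t *pkt, size_t len)
{
    if (len > SKETCH_PAGE_SIZE - SKETCH_HEADROOM)
        return NULL;
    memcpy(page + SKETCH_HEADROOM, pkt, len);
    return page + SKETCH_HEADROOM;
}
```

    The cost is one memcpy per packet, which is the trade-off the commit
    message accepts for the fallback data path.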
     

16 Jan, 2020

1 commit

  • kmemleak detects the following memory leak when hot removing
    a network device:

    unreferenced object 0xffff888083f63600 (size 256):
    comm "kworker/0:1", pid 12, jiffies 4294831717 (age 1113.676s)
    hex dump (first 32 bytes):
    00 40 c7 33 80 88 ff ff 00 00 00 00 10 00 00 00 .@.3............
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N..........
    backtrace:
    [] rndis_filter_device_add+0x117/0x11c0 [hv_netvsc]
    [] netvsc_probe+0x5e7/0xbf0 [hv_netvsc]
    [] vmbus_probe+0x74/0x170 [hv_vmbus]
    [] really_probe+0x22f/0xb50
    [] driver_probe_device+0x25e/0x370
    [] bus_for_each_drv+0x11f/0x1b0
    [] __device_attach+0x1c6/0x2f0
    [] bus_probe_device+0x1a6/0x260
    [] device_add+0x10a3/0x18e0
    [] vmbus_device_register+0xe7/0x1e0 [hv_vmbus]
    [] vmbus_add_channel_work+0x8ab/0x1770 [hv_vmbus]
    [] process_one_work+0x919/0x17d0
    [] worker_thread+0x87/0xb40
    [] kthread+0x333/0x3f0
    [] ret_from_fork+0x3a/0x50

    rndis_filter_device_add() allocates an instance of struct rndis_device
    that never gets deallocated, because rndis_filter_device_remove() sets
    net_device->extension, which points to the rndis_device struct, to NULL
    and so drops the only reference to it.

    Since net_device->extension is eventually freed in free_netvsc_device(),
    refrain from setting it to NULL inside rndis_filter_device_remove().

    Signed-off-by: Mohammed Gamal
    Reviewed-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Mohammed Gamal
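
    The leak pattern in the commit above can be reproduced in a few lines of
    user-space C. All struct and function names below are hypothetical
    stand-ins for the driver's types; the sketch only shows why clearing the
    pointer before the final free loses the allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures. */
struct fake_rndis_device { int state; };
struct fake_netvsc_device { struct fake_rndis_device *extension; };

/* Buggy shape: clearing the pointer here loses the only reference. */
void remove_buggy(struct fake_netvsc_device *nvdev)
{
    nvdev->extension = NULL;
}

/* Fixed shape: leave the pointer alone so the later free can find it. */
void remove_fixed(struct fake_netvsc_device *nvdev)
{
    (void)nvdev;
}

/* Final teardown frees whatever extension still points to.
 * Returns 1 if an object was freed, 0 if there was nothing to free. */
int free_device(struct fake_netvsc_device *nvdev)
{
    int freed = nvdev->extension != NULL;

    free(nvdev->extension);
    nvdev->extension = NULL;
    return freed;
}
```

    After remove_buggy(), free_device() finds nothing to free and the
    allocation is leaked; after remove_fixed(), it releases the object.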
     

23 Dec, 2019

1 commit

  • Pull networking fixes from David Miller:

    1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso,
    including adding a missing ipv6 match description.

    2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi
    Bhat.

    3) Fix uninit value in bond_neigh_init(), from Eric Dumazet.

    4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold.

    5) Fix use after free in tipc_disc_rcv(), from Tuong Lien.

    6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul
    Chaignon.

    7) Multicast MAC limit test is off by one in qede, from Manish Chopra.

    8) Fix established socket lookup race when socket goes from
    TCP_ESTABLISHED to TCP_LISTEN, because an intervening RCU grace
    period is lacking. From Eric Dumazet.

    9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet.

    10) Fix active backup transition after link failure in bonding, from
    Mahesh Bandewar.

    11) Avoid zero sized hash table in gtp driver, from Taehee Yoo.

    12) Fix wrong interface passed to ->mac_link_up(), from Russell King.

    13) Fix DSA egress flooding settings in b53, from Florian Fainelli.

    14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost.

    15) Fix double free in dpaa2-ptp code, from Ioana Ciornei.

    16) Reject invalid MTU values in stmmac, from Jose Abreu.

    17) Fix refcount leak in error path of u32 classifier, from Davide
    Caratti.

    18) Fix regression causing iwlwifi firmware crashes on boot, from Anders
    Kaseorg.

    19) Fix inverted return value logic in llc2 code, from Chan Shu Tak.

    20) Disable hardware GRO when XDP is attached to qede, from Manish
    Chopra.

    21) Since we encode state in the low pointer bits, dst metrics must be
    at least 4 byte aligned, which is not necessarily true on m68k. Add
    annotations to fix this, from Geert Uytterhoeven.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits)
    sfc: Include XDP packet headroom in buffer step size.
    sfc: fix channel allocation with brute force
    net: dst: Force 4-byte alignment of dst_metrics
    selftests: pmtu: fix init mtu value in description
    hv_netvsc: Fix unwanted rx_table reset
    net: phy: ensure that phy IDs are correctly typed
    mod_devicetable: fix PHY module format
    qede: Disable hardware gro when xdp prog is installed
    net: ena: fix issues in setting interrupt moderation params in ethtool
    net: ena: fix default tx interrupt moderation interval
    net/smc: unregister ib devices in reboot_event
    net: stmmac: platform: Fix MDIO init for platforms without PHY
    llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
    net: hisilicon: Fix a BUG trigered by wrong bytes_compl
    net: dsa: ksz: use common define for tag len
    s390/qeth: don't return -ENOTSUPP to userspace
    s390/qeth: fix promiscuous mode after reset
    s390/qeth: handle error due to unsupported transport mode
    cxgb4: fix refcount init for TC-MQPRIO offload
    tc-testing: initial tdc selftests for cls_u32
    ...

    Linus Torvalds
     

21 Dec, 2019

1 commit

  • In the existing code, the receive indirection table, rx_table, lives in
    struct rndis_device, which is reset when changing MTU, ringparam,
    etc. User-configured receive indirection table values are then lost.

    To fix this, move rx_table to struct net_device_context and check
    netif_is_rxfh_configured(), so rx_table is set to the default only
    if there is no user-configured value.

    Fixes: ff4a44199012 ("netvsc: allow get/set of RSS indirection table")
    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
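
    The "default only if not user configured" rule from the commit above can
    be sketched as follows. The struct, the table size, and the boolean flag
    are assumptions for illustration; the flag stands in for what
    netif_is_rxfh_configured() reports, and the default spreads entries
    round-robin across channels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ITAB_NUM 128 /* assumed indirection table size */

/* Hypothetical stand-in for the per-device context that owns rx_table. */
struct fake_ndev_ctx {
    bool rxfh_configured; /* models netif_is_rxfh_configured() */
    uint16_t rx_table[SKETCH_ITAB_NUM];
};

/* Fill rx_table with round-robin defaults unless the user has already
 * configured it. Called on every re-attach (MTU change, ringparam, ...). */
void init_rx_table(struct fake_ndev_ctx *ctx, unsigned int num_chn)
{
    unsigned int i;

    if (ctx->rxfh_configured || num_chn == 0)
        return;
    for (i = 0; i < SKETCH_ITAB_NUM; i++)
        ctx->rx_table[i] = (uint16_t)(i % num_chn);
}
```

    Because the table lives in a context that survives re-attach, a value
    set by the user is still there when init_rx_table() runs again.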
     

15 Dec, 2019

1 commit

  • The host can provide send indirection table messages at any time after
    RSS is enabled by calling rndis_filter_set_rss_param(), so the
    host-provided table values may be overwritten by the initialization in
    rndis_set_subchannel().

    To prevent this problem, move the tx_table initialization before the
    call to rndis_filter_set_rss_param().

    Fixes: a6fb6aa3cfa9 ("hv_netvsc: Set tx_table to equal weight after subchannels open")
    Signed-off-by: Haiyang Zhang
    Signed-off-by: Jakub Kicinski

    Haiyang Zhang
     

10 Dec, 2019

1 commit

  • Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
    at places where these are defined. Later patches will remove the unused
    definition of FIELD_SIZEOF().

    This patch is generated using the following script:

    EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

    git grep -l -e "\bFIELD_SIZEOF\b" | while read -r file;
    do
        # Skip the headers that define the macros.
        if [[ "$file" =~ $EXCLUDE_FILES ]]; then
            continue
        fi
        sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' "$file";
    done

    Signed-off-by: Pankaj Bharadiya
    Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
    Co-developed-by: Kees Cook
    Signed-off-by: Kees Cook
    Acked-by: David Miller # for net

    Pankaj Bharadiya
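
    For readers unfamiliar with the macro named in the commit above, the
    following sketch shows what sizeof_field() computes. The macro body is
    written to mirror the kernel's definition as I understand it; struct
    sketch_hdr is a hypothetical example type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of a struct member, computed from a null pointer that is never
 * dereferenced, because sizeof does not evaluate its operand. */
#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))

/* Hypothetical struct used only for illustration. */
struct sketch_hdr {
    uint8_t  type;
    uint32_t len;
    uint8_t  mac[6];
};
```

    A call such as sizeof_field(struct sketch_hdr, mac) yields the size of
    the member without needing an instance of the struct.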
     

01 Dec, 2019

1 commit

  • Pull Hyper-V updates from Sasha Levin:

    - support for new VMBus protocols (Andrea Parri)

    - hibernation support (Dexuan Cui)

    - latency testing framework (Branden Bonaby)

    - decoupling Hyper-V page size from guest page size (Himadri Pandya)

    * tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (22 commits)
    Drivers: hv: vmbus: Fix crash handler reset of Hyper-V synic
    drivers/hv: Replace binary semaphore with mutex
    drivers: iommu: hyperv: Make HYPERV_IOMMU only available on x86
    HID: hyperv: Add the support of hibernation
    hv_balloon: Add the support of hibernation
    x86/hyperv: Implement hv_is_hibernation_supported()
    Drivers: hv: balloon: Remove dependencies on guest page size
    Drivers: hv: vmbus: Remove dependencies on guest page size
    x86: hv: Add function to allocate zeroed page for Hyper-V
    Drivers: hv: util: Specify ring buffer size using Hyper-V page size
    Drivers: hv: Specify receive buffer size using Hyper-V page size
    tools: hv: add vmbus testing tool
    drivers: hv: vmbus: Introduce latency testing
    video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
    video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host
    hv_netvsc: Add the support of hibernation
    hv_sock: Add the support of hibernation
    video: hyperv_fb: Add the support of hibernation
    scsi: storvsc: Add the support of hibernation
    Drivers: hv: vmbus: Add module parameter to cap the VMBus version
    ...

    Linus Torvalds
     

24 Nov, 2019

1 commit


23 Nov, 2019

1 commit


22 Nov, 2019

3 commits

  • If negotiated NVSP version
    Signed-off-by: David S. Miller

    Haiyang Zhang
     
  • To reach the data region, the existing code adds the offset in struct
    nvsp_5_send_indirect_table to the beginning of this struct. But the
    offset should be based on the beginning of its container,
    struct nvsp_message. This bug causes the first table entry to be missed
    and an extra zero to be read from the zero padding after the data region,
    which can put extra burden on channel 0.

    So, correct the offset usage. Also add a boundary check to ensure
    we do not read beyond the data region.

    Fixes: 5b54dac856cb ("hyperv: Add support for virtual Receive Side Scaling (vRSS)")
    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
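
    The offset fix and boundary check from the commit above can be sketched
    in user space. The function name and parameters are hypothetical; the
    sketch assumes the offset is measured from the start of the enclosing
    message and that entries are 32-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a table of `count` 32-bit entries that starts `offset` bytes from
 * the beginning of the enclosing message of `msglen` bytes into `out`.
 * Returns 0 on success, or -1 if the table would extend past the end of
 * the message. */
int read_indirect_table(const uint8_t *msg, size_t msglen,
                        uint32_t offset, uint32_t count, uint32_t *out)
{
    if (offset > msglen)
        return -1;
    /* Division avoids overflow in count * sizeof(uint32_t). */
    if (count > (msglen - offset) / sizeof(uint32_t))
        return -1;
    memcpy(out, msg + offset, (size_t)count * sizeof(uint32_t));
    return 0;
}
```

    Measuring the offset from the inner struct instead of the message start
    would shift every read by the size of the preceding header, which is
    the kind of misalignment the commit describes.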
     
  • The existing netvsc_detach() and netvsc_attach() APIs make it easy to
    implement the suspend/resume callbacks.

    Signed-off-by: Dexuan Cui
    Reviewed-by: Haiyang Zhang
    Signed-off-by: Sasha Levin

    Dexuan Cui
     

06 Nov, 2019

2 commits


31 Oct, 2019

2 commits

  • If rndis_filter_open() fails, we need to remove the rndis device created
    in earlier steps, before returning an error code. Otherwise, the retry of
    netvsc_attach() from its callers will fail and hang.

    Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
     
  • When an error is returned by rndis_filter_set_offload_params(), we should
    still assign the unaffected features to ndev->features. Otherwise, these
    features will be missing.

    Fixes: d6792a5a0747 ("hv_netvsc: Add handler for LRO setting change")
    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
     

25 Oct, 2019

1 commit

  • Some interface types can be nested
    (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc.).
    These interface types should set a lockdep class because, without a
    lockdep class key, lockdep always warns about nonexistent circular locking.

    In the current code, these interfaces have their own lockdep class keys
    and manage them themselves, so there is a lot of duplicate code around
    /drivers/net and /net/.
    This patch adds new generic lockdep keys and some helper functions for them.

    This patch makes the following changes:
    a) Add lockdep class keys in struct net_device
       - qdisc_running, xmit, addr_list, qdisc_busylock
       - these keys are used as dynamic lockdep keys.
    b) When a net_device is allocated, lockdep keys are registered.
       - alloc_netdev_mqs()
    c) When a net_device is freed, lockdep keys are unregistered.
       - free_netdev()
    d) Add generic lockdep key helper functions
       - netdev_register_lockdep_key()
       - netdev_unregister_lockdep_key()
       - netdev_update_lockdep_key()
    e) Remove unnecessary generic lockdep macros and functions
    f) Remove unnecessary lockdep code from each interface.

    After this patch, interface modules no longer need to maintain
    their own lockdep keys.

    Signed-off-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Taehee Yoo
     

07 Sep, 2019

2 commits

  • The VF NIC may go down and then come up during host servicing events.
    This causes the VF NIC offloading feature settings to roll back to the
    defaults. This patch synchronizes features from the synthetic NIC to
    the VF NIC during ndo_set_features (ethtool -K) and in
    netvsc_register_vf when the VF comes back after host events.

    Signed-off-by: Haiyang Zhang
    Cc: Mark Bloch
    Signed-off-by: David S. Miller

    Haiyang Zhang
     
  • In a previous patch, NETIF_F_SG went missing after the code changes.
    That caused the SG feature to be "fixed". This patch includes it in
    hw_features, so it is tunable again.

    Fixes: 23312a3be999 ("netvsc: negotiate checksum and segmentation parameters")
    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
     

20 Aug, 2019

1 commit


10 Aug, 2019

1 commit


31 Jul, 2019

1 commit


22 Jul, 2019

1 commit


15 Jun, 2019

1 commit

  • For better consistency of synthetic NIC names, we set the probe mode to
    PROBE_FORCE_SYNCHRONOUS. So the names can be aligned with the vmbus
    channel offer sequence.

    Fixes: af0a5646cb8d ("use the new async probing feature for the hyperv drivers")
    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
     

31 May, 2019

3 commits

  • Pull yet more SPDX updates from Greg KH:
    "Here is another set of reviewed patches that adds SPDX tags to
    different kernel files, based on a set of rules that are being used to
    parse the comments to try to determine that the license of the file is
    "GPL-2.0-or-later" or "GPL-2.0-only". Only the "obvious" versions of
    these matches are included here, a number of "non-obvious" variants of
    text have been found but those have been postponed for later review
    and analysis.

    There is also a patch in here to add the proper SPDX header to a bunch
    of Kbuild files that we have missed in the past due to new files being
    added and forgetting that Kbuild uses two different file names for
    Makefiles. This issue was reported by the Kbuild maintainer.

    These patches have been out for review on the linux-spdx@vger mailing
    list, and while they were created by automatic tools, they were
    hand-verified by a bunch of different people, all whom names are on
    the patches are reviewers"

    * tag 'spdx-5.2-rc3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (82 commits)
    treewide: Add SPDX license identifier - Kbuild
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 225
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 224
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 223
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 222
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 221
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 220
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 218
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 217
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 216
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 215
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 214
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 213
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 211
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 210
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 209
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 207
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 203
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 201
    ...

    Linus Torvalds
     
  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms and conditions of the gnu general public license
    version 2 as published by the free software foundation this program
    is distributed in the hope it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details you should have received a copy of the gnu general
    public license along with this program if not see http www gnu org
    licenses

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 228 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Steve Winslow
    Reviewed-by: Richard Fontana
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190528171438.107155473@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • The netvsc VF skb handler should make sure that skb is not
    shared. Similar logic already exists in bonding and team device
    drivers.

    This is not an issue in practice because the VF device
    does not send up shared skb's. But the netvsc driver
    should do the right thing if it did.

    Fixes: 0c195567a8f6 ("netvsc: transparent VF management")
    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

21 May, 2019

1 commit


08 May, 2019

1 commit


04 May, 2019

1 commit

  • When the ring buffer is almost full due to RX completion messages, a
    TX packet may reach the "low watermark" and cause the queue to be stopped.
    If the TX completion arrives before the queue is stopped, the wakeup
    may be missed.

    This patch moves the check for the last pending packet to cover both
    the EAGAIN and success cases, so the queue is reliably woken up when
    necessary.

    Reported-and-tested-by: Stephan Klein
    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
     

06 Apr, 2019

1 commit


02 Apr, 2019

1 commit

  • There are two reasons for this.

    First, the xmit_more flag conceptually doesn't fit into the skb, as
    xmit_more is not a property related to the skb.
    It's only a hint to the driver that the stack is about to transmit another
    packet immediately.

    Second, it was only done this way to not have to pass another argument
    to ndo_start_xmit().

    We can place xmit_more in the softnet data, next to the device recursion.
    The recursion counter is already written to on each transmit. The "more"
    indicator is placed right next to it.

    Drivers can use the netdev_xmit_more() helper instead of skb->xmit_more
    to check the "more packets coming" hint.

    skb->xmit_more is retained (but always 0) to not cause build breakage.

    This change takes care of the simple s/skb->xmit_more/netdev_xmit_more()/
    conversions. Remaining drivers are converted in the next patches.

    Suggested-by: Eric Dumazet
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

30 Mar, 2019

1 commit

  • After a queue is stopped, the wakeup mechanism may wake it up again
    when ring buffer usage drops below a threshold. This may cause a
    NULL pointer panic in the send path when we have stopped all tx queues
    in netvsc_detach and started removing the netvsc device.

    This patch fixes it by adding a tx_disable flag to prevent unwanted
    queue wakeup.

    Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
    Reported-by: Mohammed Gamal
    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
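
    The guard described in the commit above can be modeled as a small
    decision function. The struct, the percentage parameters, and the
    function name are hypothetical; only the tx_disable flag name comes from
    the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a transmit queue and its owning device state. */
struct fake_txq {
    bool stopped;    /* queue is currently stopped */
    bool tx_disable; /* set while the device is being detached */
};

/* Called from the completion path. Wakes the queue only when transmit is
 * not disabled, the queue is stopped, and enough ring space is available.
 * Returns true if the queue was woken. */
bool maybe_wake_queue(struct fake_txq *q, unsigned int ring_avail_pct,
                      unsigned int wake_threshold_pct)
{
    if (q->tx_disable)
        return false;
    if (!q->stopped || ring_avail_pct <= wake_threshold_pct)
        return false;
    q->stopped = false;
    return true;
}
```

    With tx_disable set, a late completion can no longer restart a queue
    whose device is in the middle of being torn down.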
     

21 Mar, 2019

1 commit

  • After the previous patch, all the callers of ndo_select_queue()
    pass netdev_pick_tx as the 'fallback' argument.
    The only exceptions are nested calls to ndo_select_queue(),
    which pass down the 'fallback' available in the current scope
    - still netdev_pick_tx.

    We can drop this argument and replace the fallback() invocation with
    netdev_pick_tx(). This avoids an indirect call per xmit packet
    in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
    with device drivers implementing this ndo. It also cleans up the code
    a bit.

    Tested with ixgbe and CONFIG_FCOE=m

    With pktgen using queue xmit:
    threads    vanilla    patched
               (kpps)     (kpps)
    1          2334       2428
    2          4166       4278
    4          7895       8100

    v1 -> v2:
    - rebased after helper's name change

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

27 Feb, 2019

1 commit

  • Incoming packets may have their IP header checksum verified by the host,
    but may not have the IP header checksum computed after coalescing.
    This patch re-computes the checksum when necessary; otherwise the
    packets may be dropped, because the Linux network stack always checks it.

    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
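
    For reference, the IPv4 header checksum that the commit above
    re-computes is the standard 16-bit ones'-complement sum. The function
    below is a plain user-space sketch with a hypothetical name, not the
    kernel helper the driver uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement checksum over an IPv4 header of `len` bytes. The
 * checksum field (bytes 10 and 11) must be zero on input, and `len` must
 * be even, which always holds for IPv4 headers. The result is the value
 * to store in the checksum field, most significant byte first. */
uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    /* Add the header as a sequence of big-endian 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

    A receiver that sums the whole header including the stored checksum
    gets 0xffff for a valid header, which is the check a stale checksum
    would fail.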
     

24 Jan, 2019

4 commits

  • Fix all typos in hyperv netvsc code comments.

    Signed-off-by: Adrian Vladu

    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Cc: Sasha Levin
    Cc: "David S. Miller"
    Cc: "Alessandro Pilotti"
    Signed-off-by: Sasha Levin

    Adrian Vladu
     
  • The ops that change MTU, channels, or buffer sizes call
    netvsc_attach() and rndis_set_subchannel(), which always reset the hash
    key to the default value. That overrides a hash key set previously. This
    patch fixes the problem by saving the hash key and restoring it when we
    re-add the netvsc device.

    Fixes: ff4a44199012 ("netvsc: allow get/set of RSS indirection table")
    Signed-off-by: Haiyang Zhang
    Reviewed-by: Michael Kelley
    [sl: fix up subject line]
    Signed-off-by: Sasha Levin

    Haiyang Zhang
     
  • These assignments occur in multiple places. The patch refactors them
    into a function for simplicity. It also moves the struct to the heap
    for future expansion.

    Signed-off-by: Haiyang Zhang
    Reviewed-by: Michael Kelley
    [sl: fix up subject line]
    Signed-off-by: Sasha Levin

    Haiyang Zhang
     
  • Hyper-V hosts require us to disable RSS before changing the RSS key;
    otherwise the change request fails. This patch fixes the
    coding error.

    Fixes: ff4a44199012 ("netvsc: allow get/set of RSS indirection table")
    Reported-by: Wei Hu
    Signed-off-by: Haiyang Zhang
    Reviewed-by: Michael Kelley
    [sl: fix up subject line]
    Signed-off-by: Sasha Levin

    Haiyang Zhang