15 Oct, 2020

1 commit

  • Pull Hyper-V updates from Wei Liu:

    - a series from Boqun Feng to support page size larger than 4K

    - a few miscellaneous clean-ups

    * tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
    hv: clocksource: Add notrace attribute to read_hv_sched_clock_*() functions
    x86/hyperv: Remove aliases with X64 in their name
    PCI: hv: Document missing hv_pci_protocol_negotiation() parameter
    scsi: storvsc: Support PAGE_SIZE larger than 4K
    Driver: hv: util: Use VMBUS_RING_SIZE() for ringbuffer sizes
    HID: hyperv: Use VMBUS_RING_SIZE() for ringbuffer sizes
    Input: hyperv-keyboard: Use VMBUS_RING_SIZE() for ringbuffer sizes
    hv_netvsc: Use HV_HYP_PAGE_SIZE for Hyper-V communication
    hv: hyperv.h: Introduce some hvpfn helper functions
    Drivers: hv: vmbus: Move virt_to_hvpfn() to hyperv header
    Drivers: hv: Use HV_HYP_PAGE in hv_synic_enable_regs()
    Drivers: hv: vmbus: Introduce types of GPADL
    Drivers: hv: vmbus: Move __vmbus_open()
    Drivers: hv: vmbus: Always use HV_HYP_PAGE_SIZE for gpadl
    drivers: hv: remove cast from hyperv_die_event

    Linus Torvalds
     

28 Sep, 2020

1 commit

  • When communicating with Hyper-V, HV_HYP_PAGE_SIZE should be used since
    that's the page size used by Hyper-V, and Hyper-V expects all
    page-related data to use HV_HYP_PAGE_SIZE as the unit. For example, the
    "pfn" in hv_page_buffer is actually the HV_HYP_PAGE (i.e. the Hyper-V
    page) number.

    In order to support guests whose page size is not 4K, we need to make
    hv_netvsc always use HV_HYP_PAGE_SIZE for Hyper-V communication.
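    A minimal userspace sketch of the unit conversion, assuming a guest built
    with 64K pages. HV_HYP_PAGE_SHIFT is 12 as in the kernel; the helper names
    are hypothetical stand-ins, not the kernel's hvpfn helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hyper-V always uses 4K pages. */
#define HV_HYP_PAGE_SHIFT 12
#define HV_HYP_PAGE_SIZE  (1UL << HV_HYP_PAGE_SHIFT)

/* Assumption for illustration: the guest uses 64K pages. */
#define GUEST_PAGE_SHIFT 16
#define GUEST_PAGE_SIZE  (1UL << GUEST_PAGE_SHIFT)

/* Hyper-V pages covered by one guest page (16 for a 64K guest). */
#define HV_PAGES_PER_GUEST_PAGE (GUEST_PAGE_SIZE / HV_HYP_PAGE_SIZE)

/* Hypothetical: first Hyper-V pfn backing a guest pfn. */
static uint64_t guest_pfn_to_hvpfn(uint64_t guest_pfn)
{
	return guest_pfn << (GUEST_PAGE_SHIFT - HV_HYP_PAGE_SHIFT);
}

/* Hypothetical: Hyper-V pfn containing a guest physical address. */
static uint64_t paddr_to_hvpfn(uint64_t paddr)
{
	return paddr >> HV_HYP_PAGE_SHIFT;
}

/* Hypothetical: number of Hyper-V pages spanned by [paddr, paddr + len). */
static uint64_t hv_pages_spanned(uint64_t paddr, uint64_t len)
{
	assert(len > 0);
	return paddr_to_hvpfn(paddr + len - 1) - paddr_to_hvpfn(paddr) + 1;
}
```

    A buffer described in guest pages would undercount the entries Hyper-V
    expects by a factor of HV_PAGES_PER_GUEST_PAGE, which is why every pfn
    handed to the host is expressed in Hyper-V pages.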

    Signed-off-by: Boqun Feng
    Reviewed-by: Michael Kelley
    Link: https://lore.kernel.org/r/20200916034817.30282-8-boqun.feng@gmail.com
    Signed-off-by: Wei Liu

    Boqun Feng
     

18 Sep, 2020

1 commit

  • For additional robustness in the face of Hyper-V errors or malicious
    behavior, validate all values that originate from packets that Hyper-V
    has sent to the guest in the host-to-guest ring buffer. Ensure that
    invalid values cannot cause indexing off the end of an array, or
    subvert an existing validation via integer overflow. Ensure that
    outgoing packets do not have any leftover guest memory that has not
    been zeroed out.
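    The integer-overflow point can be illustrated with a range check on an
    (offset, len) pair read from a host packet. This is a sketch of the general
    technique; the function names are hypothetical, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * The naive test "offset + len > buf_len" can wrap around when offset and
 * len are large, letting a bogus range pass. Comparing len against the
 * remaining space avoids the addition entirely.
 */
static bool range_is_valid(uint32_t offset, uint32_t len, uint32_t buf_len)
{
	if (offset > buf_len)
		return false;
	return len <= buf_len - offset;
}

/* An index taken from a host packet must be strictly below the array size. */
static bool index_is_valid(uint32_t idx, uint32_t nr_entries)
{
	return idx < nr_entries;
}
```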

    Signed-off-by: Andres Beltran
    Co-developed-by: Andrea Parri (Microsoft)
    Signed-off-by: Andrea Parri (Microsoft)
    Cc: "David S. Miller"
    Cc: Jakub Kicinski
    Cc: netdev@vger.kernel.org
    Reviewed-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Andres Beltran
     

11 Sep, 2020

2 commits

  • The previous change "hv_netvsc: Switch the data path at the right time
    during hibernation" added a call to netvsc_vf_changed() upon
    NETDEV_CHANGE, so it's necessary to avoid a duplicate call and message
    when the VF is brought UP or DOWN.

    Signed-off-by: Dexuan Cui
    Signed-off-by: David S. Miller

    Dexuan Cui
     
  • When netvsc_resume() is called, the mlx5 VF NIC has not been resumed yet,
    so in the future the host might silently fail the call netvsc_vf_changed()
    -> netvsc_switch_datapath() made there, even if the call works now.

    Call netvsc_vf_changed() in the NETDEV_CHANGE event handler: at that time
    the mlx5 VF NIC has been resumed.

    Fixes: 19162fd4063a ("hv_netvsc: Fix hibernation for mlx5 VF driver")
    Signed-off-by: Dexuan Cui
    Signed-off-by: David S. Miller

    Dexuan Cui
     

08 Sep, 2020

1 commit

  • mlx5_suspend()/resume() keep the network interface, so during hibernation
    netvsc_unregister_vf() and netvsc_register_vf() are not called, and hence
    netvsc_resume() should call netvsc_vf_changed() to switch the data path
    back to the VF after hibernation. Note: after we close and re-open the
    vmbus channel of the netvsc NIC in netvsc_suspend() and netvsc_resume(),
    the data path is implicitly switched to the netvsc NIC. Similarly,
    netvsc_suspend() should not call netvsc_unregister_vf(), otherwise the VF
    can no longer be used after hibernation.

    For mlx4, since the VF network interface is explicitly destroyed and
    re-created during hibernation (see mlx4_suspend()/resume()), hv_netvsc
    already switches the data path to and from the VF automatically
    via netvsc_register_vf() and netvsc_unregister_vf(), so mlx4 doesn't need
    this fix. Note: mlx4 still works with the fix because in
    netvsc_suspend()/resume() ndev_ctx->vf_netdev is NULL for mlx4.

    Fixes: 0efeea5fb153 ("hv_netvsc: Add the support of hibernation")
    Signed-off-by: Dexuan Cui
    Signed-off-by: Jakub Kicinski

    Dexuan Cui
     

21 Aug, 2020

2 commits

  • netvsc_vf_xmit() / dev_queue_xmit() will call the VF NIC’s ndo_select_queue
    or netdev_pick_tx() again. They use skb_get_rx_queue() to get the
    queue number, so “skb->queue_mapping - 1” will be used. This may
    leave the last queue of the VF unused.

    Use skb_record_rx_queue() here, so that the skb_get_rx_queue() called
    later will get the correct queue number, and VF will be able to use
    all queues.
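    The off-by-one comes from the "+1" encoding of a recorded RX queue. The
    sketch below mirrors the behavior of skb_set_queue_mapping(),
    skb_record_rx_queue(), skb_rx_queue_recorded() and skb_get_rx_queue() on a
    hypothetical one-field stand-in for struct sk_buff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for the one sk_buff field involved. */
struct mock_skb {
	uint16_t queue_mapping;
};

/* Like skb_set_queue_mapping(): stores the queue number as-is. */
static void set_queue_mapping(struct mock_skb *skb, uint16_t q)
{
	skb->queue_mapping = q;
}

/* Like skb_record_rx_queue(): stores queue + 1 so 0 means "not recorded". */
static void record_rx_queue(struct mock_skb *skb, uint16_t q)
{
	skb->queue_mapping = (uint16_t)(q + 1);
}

static bool rx_queue_recorded(const struct mock_skb *skb)
{
	return skb->queue_mapping != 0;
}

/* Like skb_get_rx_queue(): undoes the +1 applied when recording. */
static uint16_t get_rx_queue(const struct mock_skb *skb)
{
	return (uint16_t)(skb->queue_mapping - 1);
}
```

    Setting queue 7 directly and then reading it back as an RX queue yields 6,
    so the highest queue index is never reached; recording it yields 7.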

    Fixes: b3bf5666a510 ("hv_netvsc: defer queue selection to VF")
    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
     
  • When using vf_ops->ndo_select_queue, the VF usually has more queues
    than the synthetic NIC, so this condition may happen often.
    Remove "unlikely" from the comparison of ndev->real_num_tx_queues.
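    A sketch of folding a VF-selected queue index into the synthetic NIC's
    range, assuming a loop of this shape compares against
    ndev->real_num_tx_queues. The function name is hypothetical. Since the VF
    usually has more queues, the loop body runs often, so no unlikely() hint
    (the kernel's wrapper around __builtin_expect()) is applied.

```c
#include <assert.h>

static unsigned int fold_txq(unsigned int txq, unsigned int real_num_tx_queues)
{
	assert(real_num_tx_queues > 0);
	/* Common case when the VF has more queues than the synthetic NIC. */
	while (txq >= real_num_tx_queues)
		txq -= real_num_tx_queues;
	return txq;
}
```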

    Fixes: b3bf5666a510 ("hv_netvsc: defer queue selection to VF")
    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
     

05 Aug, 2020

1 commit

  • If the accelerated networking SRIOV VF device has lost carrier,
    use the synthetic network device, which is available as a backup
    path. This is a rare case, since if the VF link goes down, normally
    the VMBus device will lose external connectivity as well.
    But if the communication is between two VMs on the same host,
    the VMBus device will still work.
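    The path selection can be sketched as below. The struct and function are
    hypothetical; in the driver the equivalent checks would be done with
    helpers such as netif_running() and netif_carrier_ok().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified device state. */
struct mock_dev {
	const char *name;
	bool running;     /* interface is up */
	bool carrier_ok;  /* link has carrier */
};

/*
 * Use the VF only when it exists, is up and has carrier; otherwise fall
 * back to the synthetic (VMBus) device.
 */
static const struct mock_dev *pick_tx_dev(const struct mock_dev *synthetic,
					  const struct mock_dev *vf)
{
	if (vf && vf->running && vf->carrier_ok)
		return vf;
	return synthetic;
}
```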

    Reported-by: "Shah, Ashish N"
    Fixes: 0c195567a8f6 ("netvsc: transparent VF management")
    Signed-off-by: Stephen Hemminger
    Reviewed-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Stephen Hemminger
     

26 Jul, 2020

1 commit

  • Now that BPF program/link management is centralized in generic net_device
    code, kernel code never queries program id from drivers, so
    XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary.

    This patch removes all the implementations of those commands in the kernel,
    along with xdp_attachment_query().

    This patch was compile-tested on allyesconfig.

    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com

    Andrii Nakryiko
     

25 Jul, 2020

1 commit

  • An imbalanced TX indirection table causes netvsc to have low
    performance. This table is created and managed at runtime. To help
    diagnose performance issues caused by imbalanced tables, the TX
    indirection table needs to be made visible.

    Because the TX indirection table is driver-specific information,
    display it via the ethtool register dump.
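    A sketch of what "imbalanced" means once the table is dumped, assuming a
    16-entry send table (the driver's VRSS_SEND_TAB_SIZE). The dump and
    histogram helpers are hypothetical; on a guest, the driver's register dump
    is printed with `ethtool -d <iface>`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SEND_TAB_SIZE 16   /* assumed table size */
#define MAX_CHANNELS  64

/* get_regs-style copy of the table into a dump buffer. */
static void dump_tx_table(const uint32_t *tx_table, uint32_t *regs)
{
	memcpy(regs, tx_table, SEND_TAB_SIZE * sizeof(*regs));
}

/*
 * Count how many table slots point at each channel and return the largest
 * count; a skewed histogram is what an imbalanced table looks like.
 */
static unsigned int max_slots_per_channel(const uint32_t *regs, unsigned int num_chn)
{
	unsigned int hist[MAX_CHANNELS] = {0};
	unsigned int max = 0;

	assert(num_chn > 0 && num_chn <= MAX_CHANNELS);
	for (unsigned int i = 0; i < SEND_TAB_SIZE; i++) {
		assert(regs[i] < num_chn);
		if (++hist[regs[i]] > max)
			max = hist[regs[i]];
	}
	return max;
}
```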

    Signed-off-by: Chi Song
    Reviewed-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Chi Song
     

23 Jul, 2020

1 commit

  • VLAN-tagged packets are dropped when used with DPDK, which uses
    the AF_PACKET interface, on a Hyper-V guest.

    The packet layer uses the tpacket interface to communicate VLAN
    information to the upper layers. On the Rx path, these drivers can read the
    VLAN info from the tpacket header, but on the Tx path, this information
    is still within the packet frame and requires the paravirtual drivers to
    push it into the NDIS header, which is then used by the host OS to
    form the packet.

    This transition from the packet frame to the NDIS header is currently
    missing, which causes the host OS to drop all VLAN-tagged packets sent by
    drivers that use AF_PACKET (ETH_P_ALL), such as DPDK.

    Here is an overview of the changes in the vlan header in the packet path:

    The RX path (userspace handles everything):
    1. The RX VLAN tag is stripped by the host OS and placed in the NDIS header.
    2. The guest kernel receives hv_netvsc packets and moves the VLAN info
    from the NDIS header into the kernel SKB.
    3. The kernel shares packets with the user space application via
    PACKET_MMAP. The SKB VLAN info is copied to the tpacket layer and
    TP_STATUS_VLAN_VALID is set.
    4. The user space application re-inserts the VLAN info into the frame.

    The TX path:
    1. The user space application has the VLAN info in the frame.
    2. The guest kernel gets packets from the application via PACKET_MMAP.
    3. The kernel later sends the frame to the hv_netvsc driver. The only way
    to send VLANs is when the SKB is set up and the VLAN is stripped from the
    frame.
    4. The TX VLAN is re-inserted by the host OS based on the NDIS header. If
    it sees a VLAN in the frame, the packet is dropped.
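    TX step 3 amounts to moving an in-frame 802.1Q tag into out-of-band
    metadata. The raw-buffer sketch below shows the byte layout involved; the
    function is hypothetical, and an in-kernel driver would typically use the
    sk_buff VLAN helpers rather than memmove on a flat buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN    6
#define VLAN_HLEN   4
#define ETH_P_8021Q 0x8100

/*
 * Remove an 802.1Q tag from a frame and return its TCI, as metadata the
 * host could consume. Returns 0 if a tag was removed, -1 otherwise.
 */
static int vlan_pop(uint8_t *frame, size_t *len, uint16_t *tci)
{
	size_t tpid_off = 2 * ETH_ALEN;   /* TPID follows dst + src MACs */

	if (*len < tpid_off + VLAN_HLEN + 2)
		return -1;
	if (((frame[tpid_off] << 8) | frame[tpid_off + 1]) != ETH_P_8021Q)
		return -1;

	*tci = (uint16_t)((frame[tpid_off + 2] << 8) | frame[tpid_off + 3]);
	/* Slide the inner EtherType and payload over the 4-byte tag. */
	memmove(frame + tpid_off, frame + tpid_off + VLAN_HLEN,
		*len - tpid_off - VLAN_HLEN);
	*len -= VLAN_HLEN;
	return 0;
}
```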

    Cc: xe-linux-external@cisco.com
    Cc: Sriram Krishnan
    Signed-off-by: Sriram Krishnan
    Reviewed-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Sriram Krishnan
     

04 Jun, 2020

2 commits

  • Pull networking updates from David Miller:

    1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
    Augusto von Dentz.

    2) Add GSO partial support to igc, from Sasha Neftin.

    3) Several cleanups and improvements to r8169 from Heiner Kallweit.

    4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
    device self-test. From Andrew Lunn.

    5) Start moving away from custom driver versions, use the globally
    defined kernel version instead, from Leon Romanovsky.

    6) Support GRO via gro_cells in DSA layer, from Alexander Lobakin.

    7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.

    8) Add sriov and vf support to hinic, from Luo bin.

    9) Support Media Redundancy Protocol (MRP) in the bridging code, from
    Horatiu Vultur.

    10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.

    11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
    Dubroca. Also add ipv6 support for espintcp.

    12) Lots of ReST conversions of the networking documentation, from Mauro
    Carvalho Chehab.

    13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
    from Doug Berger.

    14) Allow dumping the cgroup id and filtering by it in inet_diag code, from
    Dmitry Yakunin.

    15) Add infrastructure to export netlink attribute policies to
    userspace, from Johannes Berg.

    16) Several optimizations to sch_fq scheduler, from Eric Dumazet.

    17) Fallback to the default qdisc if qdisc init fails because otherwise
    a packet scheduler init failure will make a device inoperative. From
    Jesper Dangaard Brouer.

    18) Several RISCV bpf jit optimizations, from Luke Nelson.

    19) Correct the return type of the ->ndo_start_xmit() method in several
    drivers, it's netdev_tx_t but many drivers were using
    'int'. From Yunjian Wang.

    20) Add an ethtool interface for PHY master/slave config, from Oleksij
    Rempel.

    21) Add BPF iterators, from Yonghong Song.

    22) Add cable test infrastructure, including ethtool interfaces, from
    Andrew Lunn. Marvell PHY driver is the first to support this
    facility.

    23) Remove zero-length arrays all over, from Gustavo A. R. Silva.

    24) Calculate and maintain an explicit frame size in XDP, from Jesper
    Dangaard Brouer.

    25) Add CAP_BPF, from Alexei Starovoitov.

    26) Support terse dumps in the packet scheduler, from Vlad Buslov.

    27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.

    28) Add devm_register_netdev(), from Bartosz Golaszewski.

    29) Minimize qdisc resets, from Cong Wang.

    30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
    eliminate set_fs/get_fs calls. From Christoph Hellwig.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
    selftests: net: ip_defrag: ignore EPERM
    net_failover: fixed rollback in net_failover_open()
    Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
    Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
    vmxnet3: allow rx flow hash ops only when rss is enabled
    hinic: add set_channels ethtool_ops support
    selftests/bpf: Add a default $(CXX) value
    tools/bpf: Don't use $(COMPILE.c)
    bpf, selftests: Use bpf_probe_read_kernel
    s390/bpf: Use bcr 0,%0 as tail call nop filler
    s390/bpf: Maintain 8-byte stack alignment
    selftests/bpf: Fix verifier test
    selftests/bpf: Fix sample_cnt shared between two threads
    bpf, selftests: Adapt cls_redirect to call csum_level helper
    bpf: Add csum_level helper for fixing up csum levels
    bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
    sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
    crypto/chtls: IPv6 support for inline TLS
    Crypto/chcr: Fixes a coccinile check error
    Crypto/chcr: Fixes compilations warnings
    ...

    Linus Torvalds
     
  • Pull hyper-v updates from Wei Liu:

    - a series from Andrea to support channel reassignment

    - a series from Vitaly to clean up Vmbus message handling

    - a series from Michael to clean up and augment hyperv-tlfs.h

    - patches from Andy to clean up GUID usage in Hyper-V code

    - a few other misc patches

    * tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (29 commits)
    Drivers: hv: vmbus: Resolve more races involving init_vp_index()
    Drivers: hv: vmbus: Resolve race between init_vp_index() and CPU hotplug
    vmbus: Replace zero-length array with flexible-array
    Driver: hv: vmbus: drop a no long applicable comment
    hyper-v: Switch to use UUID types directly
    hyper-v: Replace open-coded variant of %*phN specifier
    hyper-v: Supply GUID pointer to printf() like functions
    hyper-v: Use UUID API for exporting the GUID (part 2)
    asm-generic/hyperv: Add definitions for Get/SetVpRegister hypercalls
    x86/hyperv: Split hyperv-tlfs.h into arch dependent and independent files
    x86/hyperv: Remove HV_PROCESSOR_POWER_STATE #defines
    KVM: x86: hyperv: Remove duplicate definitions of Reference TSC Page
    drivers: hv: remove redundant assignment to pointer primary_channel
    scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned
    Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type
    Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug
    Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity logic
    PCI: hv: Prepare hv_compose_msi_msg() for the VMBus-channel-interrupt-to-vCPU reassignment functionality
    Drivers: hv: vmbus: Use a spin lock for synchronizing channel scheduling vs. channel removal
    hv_utils: Always execute the fcopy and vss callbacks in a tasklet
    ...

    Linus Torvalds
     

22 May, 2020

1 commit

  • There are no users of MEM_TYPE_ZERO_COPY. Remove all corresponding
    code, including the "handle" member of struct xdp_buff.

    rfc->v1: Fixed spelling in commit message. (Björn)

    Signed-off-by: Björn Töpel
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/20200520192103.355233-13-bjorn.topel@gmail.com

    Björn Töpel
     

15 May, 2020

1 commit

  • The hyperv NIC driver does memory allocation and copy even without XDP.
    In XDP mode it will allocate a new page for each packet and copy over
    the payload, before invoking the XDP BPF-prog.

    The positive thing is that it's easy to determine the xdp.frame_sz.

    The XDP implementation for hv_netvsc transparently passes xdp_prog
    to the associated VF NIC. Many of the Azure VMs are using SRIOV, so
    the majority of the data is actually processed directly on the VF driver's
    XDP path. So the overhead of the synthetic data path (hv_netvsc) is minimal.

    When XDP is enabled on this driver, XDP_PASS and XDP_TX will create the
    SKB via build_skb (based on the newly allocated page). Using XDP
    frame_sz, this will provide more skb_tailroom, which the netstack can use
    for SKB coalescing (e.g. tcp_try_coalesce -> skb_try_coalesce).
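    The tailroom gain can be worked out with simple arithmetic. build_skb()
    reserves SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) at the end of the
    buffer; the 320-byte value below is an assumed stand-in (it varies by
    config), and the helper is hypothetical.

```c
#include <assert.h>

#define PAGE_SZ      4096u
#define XDP_HEADROOM 256u   /* XDP_PACKET_HEADROOM in the kernel */
#define SHINFO_SIZE  320u   /* assumed aligned skb_shared_info size */

/*
 * Tailroom left after the packet when an skb is built over a buffer of
 * frame_sz bytes: the shared info sits at the end, and whatever remains
 * past the headroom and packet data is tailroom.
 */
static unsigned int skb_tailroom_for(unsigned int frame_sz, unsigned int headroom,
				     unsigned int pkt_len)
{
	assert(frame_sz >= SHINFO_SIZE + headroom + pkt_len);
	return frame_sz - SHINFO_SIZE - headroom - pkt_len;
}
```

    With a full 4096-byte page and a 1500-byte packet, 2020 bytes of tailroom
    remain; a frame_sz sized tightly to the packet would leave none.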

    V3: Adjust patch desc to be more positive.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: Alexei Starovoitov
    Cc: Wei Liu
    Cc: "K. Y. Srinivasan"
    Cc: Haiyang Zhang
    Cc: Stephen Hemminger
    Link: https://lore.kernel.org/bpf/158945339857.97035.10212138582505736163.stgit@firesoul

    Jesper Dangaard Brouer
     

07 May, 2020

1 commit


05 May, 2020

1 commit

  • This patch reverts the following commits:

    commit 064ff66e2bef84f1153087612032b5b9eab005bd
    "bonding: add missing netdev_update_lockdep_key()"

    commit 53d374979ef147ab51f5d632dfe20b14aebeccd0
    "net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()"

    commit 1f26c0d3d24125992ab0026b0dab16c08df947c7
    "net: fix kernel-doc warning in "

    commit ab92d68fc22f9afab480153bd82a20f6e2533769
    "net: core: add generic lockdep keys"

    but keeps the addr_list_lock_key because we still lock
    addr_list_lock nestedly on stacked devices; unlike xmit_lock,
    this is safe because we don't take addr_list_lock on any fast
    path.

    Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com
    Cc: Dmitry Vyukov
    Cc: Taehee Yoo
    Signed-off-by: Cong Wang
    Acked-by: Taehee Yoo
    Signed-off-by: David S. Miller

    Cong Wang
     

02 May, 2020

1 commit

  • netvsc_start_xmit is used as a callback function for the ndo_start_xmit
    function pointer. ndo_start_xmit's return type is netdev_tx_t but
    netvsc_start_xmit's return type is int.

    This causes a failure with Control Flow Integrity (CFI), which requires
    function pointer prototypes and callback function definitions to match
    exactly. When CFI is in enforcing mode, the kernel panics. When booting a
    CFI kernel with WSL 2, the VM is immediately terminated because of this.

    The splat when CONFIG_CFI_PERMISSIVE is used:

    [ 5.916765] CFI failure (target: netvsc_start_xmit+0x0/0x10):
    [ 5.916771] WARNING: CPU: 8 PID: 0 at kernel/cfi.c:29 __cfi_check_fail+0x2e/0x40
    [ 5.916772] Modules linked in:
    [ 5.916774] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 5.7.0-rc3-next-20200424-microsoft-cbl-00001-ged4eb37d2c69-dirty #1
    [ 5.916776] RIP: 0010:__cfi_check_fail+0x2e/0x40
    [ 5.916777] Code: 48 c7 c7 70 98 63 a9 48 c7 c6 11 db 47 a9 e8 69 55 59 00 85 c0 75 02 5b c3 48 c7 c7 73 c6 43 a9 48 89 de 31 c0 e8 12 2d f0 ff 0b 5b c3 00 00 cc cc 00 00 cc cc 00 00 cc cc 00 00 85 f6 74 25
    [ 5.916778] RSP: 0018:ffffa803c0260b78 EFLAGS: 00010246
    [ 5.916779] RAX: 712a1af25779e900 RBX: ffffffffa8cf7950 RCX: ffffffffa962cf08
    [ 5.916779] RDX: ffffffffa9c36b60 RSI: 0000000000000082 RDI: ffffffffa9c36b5c
    [ 5.916780] RBP: ffff8ffc4779c2c0 R08: 0000000000000001 R09: ffffffffa9c3c300
    [ 5.916781] R10: 0000000000000151 R11: ffffffffa9c36b60 R12: ffff8ffe39084000
    [ 5.916782] R13: ffffffffa8cf7950 R14: ffffffffa8d12cb0 R15: ffff8ffe39320140
    [ 5.916784] FS: 0000000000000000(0000) GS:ffff8ffe3bc00000(0000) knlGS:0000000000000000
    [ 5.916785] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 5.916786] CR2: 00007ffef5749408 CR3: 00000002f4f5e000 CR4: 0000000000340ea0
    [ 5.916787] Call Trace:
    [ 5.916788]
    [ 5.916790] __cfi_check+0x3ab58/0x450e0
    [ 5.916793] ? dev_hard_start_xmit+0x11f/0x160
    [ 5.916795] ? sch_direct_xmit+0xf2/0x230
    [ 5.916796] ? __dev_queue_xmit.llvm.11471227737707190958+0x69d/0x8e0
    [ 5.916797] ? neigh_resolve_output+0xdf/0x220
    [ 5.916799] ? neigh_connected_output.cfi_jt+0x8/0x8
    [ 5.916801] ? ip6_finish_output2+0x398/0x4c0
    [ 5.916803] ? nf_nat_ipv6_out+0x10/0xa0
    [ 5.916804] ? nf_hook_slow+0x84/0x100
    [ 5.916807] ? ip6_input_finish+0x8/0x8
    [ 5.916807] ? ip6_output+0x6f/0x110
    [ 5.916808] ? __ip6_local_out.cfi_jt+0x8/0x8
    [ 5.916810] ? mld_sendpack+0x28e/0x330
    [ 5.916811] ? ip_rt_bug+0x8/0x8
    [ 5.916813] ? mld_ifc_timer_expire+0x2db/0x400
    [ 5.916814] ? neigh_proxy_process+0x8/0x8
    [ 5.916816] ? call_timer_fn+0x3d/0xd0
    [ 5.916817] ? __run_timers+0x2a9/0x300
    [ 5.916819] ? rcu_core_si+0x8/0x8
    [ 5.916820] ? run_timer_softirq+0x14/0x30
    [ 5.916821] ? __do_softirq+0x154/0x262
    [ 5.916822] ? native_x2apic_icr_write+0x8/0x8
    [ 5.916824] ? irq_exit+0xba/0xc0
    [ 5.916825] ? hv_stimer0_vector_handler+0x99/0xe0
    [ 5.916826] ? hv_stimer0_callback_vector+0xf/0x20
    [ 5.916826]
    [ 5.916828] ? hv_stimer_global_cleanup.cfi_jt+0x8/0x8
    [ 5.916829] ? raw_setsockopt+0x8/0x8
    [ 5.916830] ? default_idle+0xe/0x10
    [ 5.916832] ? do_idle.llvm.10446269078108580492+0xb7/0x130
    [ 5.916833] ? raw_setsockopt+0x8/0x8
    [ 5.916833] ? cpu_startup_entry+0x15/0x20
    [ 5.916835] ? cpu_hotplug_enable.cfi_jt+0x8/0x8
    [ 5.916836] ? start_secondary+0x188/0x190
    [ 5.916837] ? secondary_startup_64+0xa5/0xb0
    [ 5.916838] ---[ end trace f2683fa869597ba5 ]---

    Avoid this by using the right return type for netvsc_start_xmit.
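    A userspace sketch of the matching-signature rule. The enum mirrors
    netdev_tx_t from netdevice.h; the ops struct and xmit function are
    hypothetical stand-ins for net_device_ops and netvsc_start_xmit.

```c
#include <assert.h>
#include <limits.h>

/* Userspace mirror of the kernel's netdev_tx_t. */
enum netdev_tx {
	__NETDEV_TX_MIN = INT_MIN,  /* forces a signed underlying type */
	NETDEV_TX_OK    = 0x00,
	NETDEV_TX_BUSY  = 0x10,
};
typedef enum netdev_tx netdev_tx_t;

struct mock_skb { int len; };

/* Simplified stand-in for struct net_device_ops. */
struct mock_netdev_ops {
	netdev_tx_t (*ndo_start_xmit)(struct mock_skb *skb);
};

/*
 * The definition's return type matches the function-pointer type exactly,
 * which is what Clang CFI checks at each indirect call.
 */
static netdev_tx_t mock_start_xmit(struct mock_skb *skb)
{
	return skb->len > 0 ? NETDEV_TX_OK : NETDEV_TX_BUSY;
}

static const struct mock_netdev_ops mock_ops = {
	.ndo_start_xmit = mock_start_xmit,
};
```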

    Fixes: fceaf24a943d8 ("Staging: hv: add the Hyper-V virtual network driver")
    Link: https://github.com/ClangBuiltLinux/linux/issues/1009
    Signed-off-by: Nathan Chancellor
    Reviewed-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Nathan Chancellor
     

23 Apr, 2020

1 commit

  • vmbus_chan_sched() might call the netvsc driver callback function that
    ends up scheduling NAPI work. This "work" can access the channel ring
    buffer, so we must ensure that any such work is completed and that the
    ring buffer is no longer being accessed before freeing the ring buffer
    data structure in the channel closure path. To this end, disable NAPI
    before calling vmbus_close() in netvsc_device_remove().
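    The required ordering can be modeled single-threaded, as below. All names
    are hypothetical; in the kernel, napi_disable() additionally waits for any
    in-flight poll to finish, which this model reduces to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct mock_chan {
	bool napi_enabled;   /* NAPI may still poll the ring */
	int *ring;           /* stand-in for the channel ring buffer */
};

static void mock_napi_disable(struct mock_chan *c)
{
	c->napi_enabled = false;
}

/* Frees the ring; refuses while NAPI could still touch it. */
static int mock_vmbus_close(struct mock_chan *c)
{
	if (c->napi_enabled)
		return -1;   /* would open a use-after-free window */
	free(c->ring);
	c->ring = NULL;
	return 0;
}

static int mock_device_remove(struct mock_chan *c)
{
	mock_napi_disable(c);        /* 1: stop and drain NAPI */
	return mock_vmbus_close(c);  /* 2: only then free the ring */
}
```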

    Suggested-by: Michael Kelley
    Signed-off-by: Andrea Parri (Microsoft)
    Acked-by: Stephen Hemminger
    Cc: "David S. Miller"
    Cc:
    Link: https://lore.kernel.org/r/20200406001514.19876-5-parri.andrea@gmail.com
    Reviewed-by: Michael Kelley
    Signed-off-by: Wei Liu

    Andrea Parri (Microsoft)
     

31 Mar, 2020

1 commit

  • vzalloc_node() already rounds the total size to whole pages, and
    sizeof(u64) is smaller than sizeof(struct recv_comp_data). So the
    round_up of recv_completion_cnt is not necessary and may cause extra
    memory allocation.

    To save memory, remove this unnecessary round_up for recv_completion_cnt.
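    A worked example of the saving, assuming 4K pages, a 16-byte struct
    recv_comp_data, and a hypothetical entry count of 1025. The old sizing
    rounds the count to a multiple of PAGE_SIZE / sizeof(u64) = 512 before
    multiplying; the allocator's own page rounding already suffices.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096u
/* Round x up to a multiple of a (a > 0). */
#define ROUND_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

#define U64_SZ            8u
#define RECV_COMP_DATA_SZ 16u   /* assumed sizeof(struct recv_comp_data) */

/* Bytes actually reserved once the allocator rounds to whole pages. */
static size_t pages_bytes(size_t bytes)
{
	return ROUND_UP(bytes, PAGE_SZ);
}

/* Old sizing: round the entry count to PAGE_SIZE / sizeof(u64) first. */
static size_t old_alloc(size_t cnt)
{
	return pages_bytes(ROUND_UP(cnt, PAGE_SZ / U64_SZ) * RECV_COMP_DATA_SZ);
}

/* New sizing: rely on the allocator's page rounding alone. */
static size_t new_alloc(size_t cnt)
{
	return pages_bytes(cnt * RECV_COMP_DATA_SZ);
}
```

    For 1025 entries the old sizing reserves 6 pages (24576 bytes) and the
    new one 5 pages (20480 bytes), still enough for 1025 * 16 = 16400 bytes.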

    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
     

01 Mar, 2020

1 commit

  • With the ethtool_virtdev_set_link_ksettings function in core/ethtool.c,
    ibmveth, netvsc, and virtio now use the core's helper function.

    Functionality changes that pertain to the ibmveth driver include:

    1. Changed the initial hardcoded link speed to 1 Gb/s.

    2. Added support for allowing a user to change the reported link
    speed via ethtool.

    Functionality changes to the netvsc driver include:

    1. When netvsc_get_link_ksettings is called, it will defer to the VF
    device if it exists to pull accelerated networking values, otherwise
    pull default or user-defined values.

    2. Similarly, if netvsc_set_link_ksettings is called and a VF device
    exists, the real values of speed and duplex are changed.
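    The netvsc behavior described above can be sketched as follows. The
    structs and functions are hypothetical; speed is in Mb/s as in ethtool.

```c
#include <assert.h>
#include <stddef.h>

struct mock_ksettings {
	unsigned int speed;
	int duplex;   /* 0 = half, 1 = full */
};

struct mock_ndev_ctx {
	struct mock_ksettings stored;   /* default or user-defined values */
	struct mock_ksettings *vf;      /* VF's real settings, or NULL */
};

/* get: defer to the VF when present, else report stored values. */
static struct mock_ksettings mock_get_link_ksettings(const struct mock_ndev_ctx *ctx)
{
	return ctx->vf ? *ctx->vf : ctx->stored;
}

/* set: change the VF's real values when present, else remember user values. */
static void mock_set_link_ksettings(struct mock_ndev_ctx *ctx, struct mock_ksettings req)
{
	if (ctx->vf)
		*ctx->vf = req;
	else
		ctx->stored = req;
}
```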

    Signed-off-by: Cris Forno
    Signed-off-by: David S. Miller

    Cris Forno
     

28 Feb, 2020

1 commit


24 Feb, 2020

1 commit

  • When netvsc_attach() is called by operations like changing the MTU, an
    extra wakeup may happen: netvsc_attach() calls rndis_filter_device_add(),
    which sends RNDIS messages while the queue is still stopped by
    netvsc_detach(), and the completion message wakes up queue 0.

    We can reproduce the issue by changing the MTU, etc.; the wake_queue
    counter from "ethtool -S" then increases beyond the stop_queue counter:
    stop_queue: 0
    wake_queue: 1
    The issue only causes a queue wakeup and a counter increment, with no
    other ill effects in the current code, so we haven't seen any network
    problems so far.

    To fix this, initialize tx_disable to true, and set it to false when
    the NIC is ready to be attached or registered.
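    A model of the fix, assuming the completion path only wakes a stopped
    queue when tx_disable is false. The struct and functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct mock_txq {
	bool stopped;
	bool tx_disable;          /* true until attached or registered */
	unsigned long stop_queue; /* counter */
	unsigned long wake_queue; /* counter */
};

static void mock_init(struct mock_txq *q)
{
	q->stopped = true;
	q->tx_disable = true;   /* the fix: start disabled */
	q->stop_queue = 0;
	q->wake_queue = 0;
}

/* Completion path: wake only if the queue is stopped and TX is enabled. */
static void mock_on_send_completion(struct mock_txq *q)
{
	if (q->stopped && !q->tx_disable) {
		q->stopped = false;
		q->wake_queue++;
	}
}

/* Called once the NIC is ready to be attached or registered. */
static void mock_enable_tx(struct mock_txq *q)
{
	q->tx_disable = false;
}
```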

    Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
     

20 Feb, 2020

1 commit


07 Feb, 2020

1 commit

  • The caller of XDP_SETUP_PROG has already incremented refcnt in
    __bpf_prog_get(), so drivers should only increment refcnt by
    num_queues - 1.

    To fix the issue, update netvsc_xdp_set() to add the correct number
    to refcnt.

    Hold a refcnt in netvsc_xdp_set()’s other caller, netvsc_attach().

    And do the same in netvsc_vf_setxdp(). Otherwise, every time the VF is
    removed and added from the host side, the refcnt will be decreased by one,
    which may cause a page fault when unloading the XDP program.
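    The reference arithmetic can be checked with a small model. The helpers
    only mimic the effect of bpf_prog_add() and bpf_prog_put() on a
    hypothetical counter.

```c
#include <assert.h>

struct mock_prog { int refcnt; };

/* Like bpf_prog_add(): take n extra references. */
static void mock_prog_add(struct mock_prog *p, int n) { p->refcnt += n; }
/* Like bpf_prog_put(): drop one reference. */
static void mock_prog_put(struct mock_prog *p) { p->refcnt--; }

/*
 * The caller already holds one reference, so only num_queues - 1 more are
 * needed for each queue to own exactly one.
 */
static void mock_xdp_set(struct mock_prog *p, int num_queues)
{
	assert(num_queues > 0);
	mock_prog_add(p, num_queues - 1);
}

/* Each queue drops its own reference on teardown. */
static void mock_xdp_clear(struct mock_prog *p, int num_queues)
{
	for (int i = 0; i < num_queues; i++)
		mock_prog_put(p);
}
```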

    Fixes: 351e1581395f ("hv_netvsc: Add XDP support")
    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
     

25 Jan, 2020

1 commit

  • This patch adds support of XDP in native mode for hv_netvsc driver, and
    transparently sets the XDP program on the associated VF NIC as well.

    Setting / unsetting an XDP program on the synthetic NIC (netvsc) propagates
    to the VF NIC automatically. Setting / unsetting an XDP program on the VF
    NIC directly is not recommended; it is not propagated to the synthetic NIC
    and may be overwritten by a setting on the synthetic NIC.

    The Azure/Hyper-V synthetic NIC receive buffer doesn't provide headroom
    for XDP. We thought about re-using the RNDIS header space, but it's too
    small. So we decided to copy the packets to a page buffer for XDP. Also,
    most of our VMs on Azure have Accelerated Networking (SRIOV) enabled, so
    most of the packets run on the VF NIC. The synthetic NIC is considered a
    fallback data path, so the data copy on netvsc won't impact performance
    significantly.

    An XDP program cannot run with LRO (RSC) enabled, so you need to disable
    LRO before running XDP:
    ethtool -K eth0 lro off

    XDP actions not yet supported:
    XDP_REDIRECT
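    The copy into a page buffer with XDP headroom can be sketched as below,
    assuming 4K pages and XDP_PACKET_HEADROOM of 256. The function is
    hypothetical and, for brevity, ignores any tailroom a real driver reserves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ      4096u
#define XDP_HEADROOM 256u

/*
 * Copy the packet into a fresh page, leaving XDP_HEADROOM bytes in front for
 * the XDP program. Returns the data offset, or -1 if the packet does not fit.
 */
static int copy_for_xdp(uint8_t *page, const uint8_t *pkt, unsigned int len)
{
	if (len > PAGE_SZ - XDP_HEADROOM)
		return -1;
	memcpy(page + XDP_HEADROOM, pkt, len);
	return (int)XDP_HEADROOM;
}
```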

    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
     

16 Jan, 2020

1 commit

  • kmemleak detects the following memory leak when hot removing
    a network device:

    unreferenced object 0xffff888083f63600 (size 256):
    comm "kworker/0:1", pid 12, jiffies 4294831717 (age 1113.676s)
    hex dump (first 32 bytes):
    00 40 c7 33 80 88 ff ff 00 00 00 00 10 00 00 00 .@.3............
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N..........
    backtrace:
    [] rndis_filter_device_add+0x117/0x11c0 [hv_netvsc]
    [] netvsc_probe+0x5e7/0xbf0 [hv_netvsc]
    [] vmbus_probe+0x74/0x170 [hv_vmbus]
    [] really_probe+0x22f/0xb50
    [] driver_probe_device+0x25e/0x370
    [] bus_for_each_drv+0x11f/0x1b0
    [] __device_attach+0x1c6/0x2f0
    [] bus_probe_device+0x1a6/0x260
    [] device_add+0x10a3/0x18e0
    [] vmbus_device_register+0xe7/0x1e0 [hv_vmbus]
    [] vmbus_add_channel_work+0x8ab/0x1770 [hv_vmbus]
    [] process_one_work+0x919/0x17d0
    [] worker_thread+0x87/0xb40
    [] kthread+0x333/0x3f0
    [] ret_from_fork+0x3a/0x50

    rndis_filter_device_add() allocates an instance of struct rndis_device
    which never gets deallocated as rndis_filter_device_remove() sets
    net_device->extension which points to the rndis_device struct to NULL,
    leaving the rndis_device dangling.

    Since net_device->extension is eventually freed in free_netvsc_device(),
    we refrain from setting it to NULL inside rndis_filter_device_remove().
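    The leak pattern and the fix can be reproduced in miniature. Every name
    here is hypothetical; the buggy variant is included on purpose to show why
    clearing the only pointer before the final free leaks the object.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs;   /* outstanding allocations */

struct mock_rndis_device { int state; };
struct mock_netvsc_device { struct mock_rndis_device *extension; };

static void mock_device_add(struct mock_netvsc_device *nd)
{
	nd->extension = malloc(sizeof(*nd->extension));
	assert(nd->extension);
	live_allocs++;
}

/* Buggy: drops the only pointer, so the final free cannot find it. */
static void mock_device_remove_buggy(struct mock_netvsc_device *nd)
{
	nd->extension = NULL;
}

/* Fixed: leave the pointer for the final free. */
static void mock_device_remove_fixed(struct mock_netvsc_device *nd)
{
	(void)nd;
}

/* Like free_netvsc_device(): releases whatever extension is still set. */
static void mock_free_device(struct mock_netvsc_device *nd)
{
	if (nd->extension) {
		free(nd->extension);
		live_allocs--;
	}
}
```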

    Signed-off-by: Mohammed Gamal
    Reviewed-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Mohammed Gamal
     

23 Dec, 2019

1 commit

  • Pull networking fixes from David Miller:

    1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso,
    including adding a missing ipv6 match description.

    2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi
    Bhat.

    3) Fix uninit value in bond_neigh_init(), from Eric Dumazet.

    4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold.

    5) Fix use after free in tipc_disc_rcv(), from Tuong Lien.

    6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul
    Chaignon.

    7) Multicast MAC limit test is off by one in qede, from Manish Chopra.

    8) Fix established socket lookup race when socket goes from
    TCP_ESTABLISHED to TCP_LISTEN, because there is no intervening
    RCU grace period. From Eric Dumazet.

    9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet.

    10) Fix active backup transition after link failure in bonding, from
    Mahesh Bandewar.

    11) Avoid zero sized hash table in gtp driver, from Taehee Yoo.

    12) Fix wrong interface passed to ->mac_link_up(), from Russell King.

    13) Fix DSA egress flooding settings in b53, from Florian Fainelli.

    14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost.

    15) Fix double free in dpaa2-ptp code, from Ioana Ciornei.

    16) Reject invalid MTU values in stmmac, from Jose Abreu.

    17) Fix refcount leak in error path of u32 classifier, from Davide
    Caratti.

    18) Fix regression causing iwlwifi firmware crashes on boot, from Anders
    Kaseorg.

    19) Fix inverted return value logic in llc2 code, from Chan Shu Tak.

    20) Disable hardware GRO when XDP is attached to qede, from Manish
    Chopra.

    21) Since we encode state in the low pointer bits, dst metrics must be
    at least 4 byte aligned, which is not necessarily true on m68k. Add
    annotations to fix this, from Geert Uytterhoeven.
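    The alignment requirement in item 21 follows from pointer tagging: flag
    bits live in the low bits of the metrics pointer, so those bits must be
    zero in every real address. A sketch with flag values modeled on
    include/net/dst.h; the struct and helpers are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define METRICS_READ_ONLY  0x1UL
#define METRICS_REFCOUNTED 0x2UL
#define METRICS_FLAGS      0x3UL

/* Forcing 4-byte alignment guarantees the low two address bits are zero. */
struct mock_metrics {
	alignas(4) uint32_t metrics[4];
};

static uintptr_t tag_ptr(const struct mock_metrics *m, uintptr_t flags)
{
	uintptr_t p = (uintptr_t)m;

	assert((p & METRICS_FLAGS) == 0);                 /* needs alignment */
	assert((flags & ~(uintptr_t)METRICS_FLAGS) == 0);
	return p | flags;
}

static struct mock_metrics *untag_ptr(uintptr_t v)
{
	return (struct mock_metrics *)(v & ~(uintptr_t)METRICS_FLAGS);
}

static uintptr_t tag_flags(uintptr_t v)
{
	return v & METRICS_FLAGS;
}
```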

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits)
    sfc: Include XDP packet headroom in buffer step size.
    sfc: fix channel allocation with brute force
    net: dst: Force 4-byte alignment of dst_metrics
    selftests: pmtu: fix init mtu value in description
    hv_netvsc: Fix unwanted rx_table reset
    net: phy: ensure that phy IDs are correctly typed
    mod_devicetable: fix PHY module format
    qede: Disable hardware gro when xdp prog is installed
    net: ena: fix issues in setting interrupt moderation params in ethtool
    net: ena: fix default tx interrupt moderation interval
    net/smc: unregister ib devices in reboot_event
    net: stmmac: platform: Fix MDIO init for platforms without PHY
    llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
    net: hisilicon: Fix a BUG trigered by wrong bytes_compl
    net: dsa: ksz: use common define for tag len
    s390/qeth: don't return -ENOTSUPP to userspace
    s390/qeth: fix promiscuous mode after reset
    s390/qeth: handle error due to unsupported transport mode
    cxgb4: fix refcount init for TC-MQPRIO offload
    tc-testing: initial tdc selftests for cls_u32
    ...

    Linus Torvalds
     

21 Dec, 2019

1 commit

  • In existing code, the receive indirection table, rx_table, is in
    struct rndis_device, which will be reset when changing MTU, ringparam,
    etc. User configured receive indirection table values will be lost.

    To fix this, move rx_table to struct net_device_context, and check
    netif_is_rxfh_configured(), so rx_table will be set to default only
    if there is no user-configured value.
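    A sketch of the conditional reset, assuming a 128-entry table. The default
    formula matches the kernel's ethtool_rxfh_indir_default() (index modulo
    ring count); the user_configured flag stands in for
    netif_is_rxfh_configured(), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_TABLE_SIZE 128   /* assumed indirection table size */

static uint32_t rxfh_indir_default(uint32_t index, uint32_t n_rx_rings)
{
	return index % n_rx_rings;
}

/* Reset to defaults only if the user never configured the table. */
static void init_rx_table(uint32_t *rx_table, bool user_configured, uint32_t num_chn)
{
	if (user_configured)
		return;
	for (uint32_t i = 0; i < RX_TABLE_SIZE; i++)
		rx_table[i] = rxfh_indir_default(i, num_chn);
}
```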

    Fixes: ff4a44199012 ("netvsc: allow get/set of RSS indirection table")
    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
     

15 Dec, 2019

1 commit

  • The host can send indirection table messages any time after RSS is
    enabled by calling rndis_filter_set_rss_param(). So the host-provided
    table values may be overwritten by the initialization in
    rndis_set_subchannel().

    To prevent this problem, move the tx_table initialization before calling
    rndis_filter_set_rss_param().
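    The ordering issue can be shown with a model that applies a host update
    before or after the default initialization. The table size of 16 and all
    names are assumptions; the equal-weight default spreads slots round-robin
    over the channels.

```c
#include <assert.h>
#include <stdint.h>

#define SEND_TAB_SIZE 16   /* assumed table size */

static uint32_t tx_table[SEND_TAB_SIZE];

/* Equal-weight default: spread the slots round-robin over channels. */
static void init_tx_table(uint32_t num_chn)
{
	for (uint32_t i = 0; i < SEND_TAB_SIZE; i++)
		tx_table[i] = i % num_chn;
}

/* Hypothetical handler for a host send indirection table message. */
static void on_host_send_table(const uint32_t *host_table)
{
	for (uint32_t i = 0; i < SEND_TAB_SIZE; i++)
		tx_table[i] = host_table[i];
}
```

    Initializing first and then letting the host update keeps the host's
    values; initializing after the host update silently discards them.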

    Fixes: a6fb6aa3cfa9 ("hv_netvsc: Set tx_table to equal weight after subchannels open")
    Signed-off-by: Haiyang Zhang
    Signed-off-by: Jakub Kicinski

    Haiyang Zhang
     

10 Dec, 2019

1 commit

  • Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
    at places where these are defined. Later patches will remove the unused
    definition of FIELD_SIZEOF().

    This patch is generated using the following script:

    EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

    git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
    do
    	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
    		continue
    	fi
    	sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
    done

    Signed-off-by: Pankaj Bharadiya
    Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
    Co-developed-by: Kees Cook
    Signed-off-by: Kees Cook
    Acked-by: David Miller # for net

    Pankaj Bharadiya
     

01 Dec, 2019

1 commit

  • Pull Hyper-V updates from Sasha Levin:

    - support for new VMBus protocols (Andrea Parri)

    - hibernation support (Dexuan Cui)

    - latency testing framework (Branden Bonaby)

    - decoupling Hyper-V page size from guest page size (Himadri Pandya)

    * tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (22 commits)
    Drivers: hv: vmbus: Fix crash handler reset of Hyper-V synic
    drivers/hv: Replace binary semaphore with mutex
    drivers: iommu: hyperv: Make HYPERV_IOMMU only available on x86
    HID: hyperv: Add the support of hibernation
    hv_balloon: Add the support of hibernation
    x86/hyperv: Implement hv_is_hibernation_supported()
    Drivers: hv: balloon: Remove dependencies on guest page size
    Drivers: hv: vmbus: Remove dependencies on guest page size
    x86: hv: Add function to allocate zeroed page for Hyper-V
    Drivers: hv: util: Specify ring buffer size using Hyper-V page size
    Drivers: hv: Specify receive buffer size using Hyper-V page size
    tools: hv: add vmbus testing tool
    drivers: hv: vmbus: Introduce latency testing
    video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
    video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host
    hv_netvsc: Add the support of hibernation
    hv_sock: Add the support of hibernation
    video: hyperv_fb: Add the support of hibernation
    scsi: storvsc: Add the support of hibernation
    Drivers: hv: vmbus: Add module parameter to cap the VMBus version
    ...

    Linus Torvalds
     

24 Nov, 2019

1 commit


23 Nov, 2019

1 commit


22 Nov, 2019

3 commits

  • If negotiated NVSP version
    Signed-off-by: David S. Miller

    Haiyang Zhang
     
  • To reach the data region, the existing code adds offset in struct
    nvsp_5_send_indirect_table on the beginning of this struct. But the
    offset should be based on the beginning of its container,
    struct nvsp_message. This bug causes the first table entry to be missed,
    and adds an extra zero from the zero pad after the data region.
    This can put extra burden on channel 0.

    So, correct the offset usage. Also add a boundary check to ensure we do
    not read beyond the data region.
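    A sketch of reading the table with the offset taken from the start of the
    containing message, plus the boundary check. The wire layout here is a
    hypothetical simplification of the driver's structs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct mock_hdr { uint32_t msg_type; };
struct mock_send_table { uint32_t count; uint32_t offset; };
struct mock_msg {
	struct mock_hdr hdr;          /* container header */
	struct mock_send_table tab;   /* offset counts from &msg, not &tab */
};

/* Copy the table out of a message, bounds-checked against msglen. */
static int read_send_table(const uint8_t *buf, size_t msglen, uint32_t *out, uint32_t max)
{
	struct mock_msg m;
	size_t bytes;

	if (msglen < sizeof(m))
		return -1;
	memcpy(&m, buf, sizeof(m));
	if (m.tab.count > max)
		return -1;
	bytes = (size_t)m.tab.count * sizeof(uint32_t);
	if (m.tab.offset > msglen || bytes > msglen - m.tab.offset)
		return -1;
	memcpy(out, buf + m.tab.offset, bytes);
	return (int)m.tab.count;
}
```

    In this layout tab sits 4 bytes into the message, so adding the offset to
    &msg->tab instead would start one entry late: the first entry is skipped
    and one padding word past the end is read, matching the symptom above.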

    Fixes: 5b54dac856cb ("hyperv: Add support for virtual Receive Side Scaling (vRSS)")
    Signed-off-by: Haiyang Zhang
    Signed-off-by: David S. Miller

    Haiyang Zhang
     
  • The existing netvsc_detach() and netvsc_attach() APIs make it easy to
    implement the suspend/resume callbacks.

    Signed-off-by: Dexuan Cui
    Reviewed-by: Haiyang Zhang
    Signed-off-by: Sasha Levin

    Dexuan Cui
     

06 Nov, 2019

2 commits