22 Jun, 2017

1 commit

  • Add a flag to indicate if a queue is rate-limited. Test the flag in
    NAPI poll handler and avoid rescheduling the queue if true, otherwise
    we risk locking up the host. The rescheduling will be done in the
    timer callback function.

    Reported-by: Jean-Louis Dupond
    Signed-off-by: Wei Liu
    Tested-by: Jean-Louis Dupond
    Reviewed-by: Paul Durrant
    Signed-off-by: David S. Miller

    Wei Liu
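
The rate-limit check described in this commit can be pictured with a small user-space sketch. Everything here is hypothetical (`struct demo_queue`, `demo_poll_should_reschedule`, `demo_credit_timer_fired`); it only models the decision the message describes, not the actual netback or NAPI code.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a netback queue. */
struct demo_queue {
    bool rate_limited;   /* set when the queue has run out of credit */
    int  pending_work;   /* packets still waiting to be processed */
};

/*
 * Decide whether the poll handler should reschedule the queue.
 * A rate-limited queue is left alone; rescheduling it here would
 * spin without making progress. The timer callback handles it later.
 */
bool demo_poll_should_reschedule(const struct demo_queue *q)
{
    if (q->rate_limited)
        return false;
    return q->pending_work > 0;
}

/* Timer callback side: clear the flag so polling may resume. */
void demo_credit_timer_fired(struct demo_queue *q)
{
    q->rate_limited = false;
}
```

In this model the poll path never reschedules a rate-limited queue; only the timer path makes it eligible again.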
     

13 Mar, 2017

1 commit


30 Jan, 2017

1 commit

  • The default for the maximum number of tx/rx queues of one interface is
    the number of cpus of the system today. As each queue pair reserves 512
    grant pages this default consumes a ridiculous number of grants for
    large guests.

    Limit the queue number to 8 as default. This value can be modified
    via a module parameter if required.

    Signed-off-by: Juergen Gross
    Reviewed-by: Boris Ostrovsky
    Signed-off-by: Boris Ostrovsky

    Juergen Gross
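
To make the grant arithmetic concrete, here is a minimal sketch. The figures 512 and 8 come from the commit message; the helper names are hypothetical, and capping at the CPU count for small guests is an assumption rather than something the message states.

```c
#define DEMO_GRANTS_PER_QUEUE_PAIR 512u  /* from the commit message */
#define DEMO_MAX_QUEUES_DEFAULT    8u    /* from the commit message */

/* Assumed default: never more than 8 queues, never more than the CPU count. */
unsigned int demo_default_queues(unsigned int ncpus)
{
    return ncpus < DEMO_MAX_QUEUES_DEFAULT ? ncpus : DEMO_MAX_QUEUES_DEFAULT;
}

/* Grant pages reserved by one interface with the given number of queue pairs. */
unsigned int demo_grants_reserved(unsigned int queues)
{
    return queues * DEMO_GRANTS_PER_QUEUE_PAIR;
}
```

Under these assumptions a 64-CPU guest would reserve 64 * 512 = 32768 grant pages per interface with the old default, and 8 * 512 = 4096 with the new one.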
     

07 Oct, 2016

1 commit

  • The netback source module has become very large and somewhat confusing.
    This patch simply moves all code related to the backend to frontend (i.e.
    guest side rx) data-path into a separate rx source module.

    This patch contains no functional change, it is code movement and
    minimal changes to avoid patch style-check issues.

    Signed-off-by: Paul Durrant
    Signed-off-by: David S. Miller

    Paul Durrant
     

22 Sep, 2016

1 commit


17 May, 2016

4 commits

  • My recent patch to include/xen/interface/io/netif.h defines a new extra
    info type that can be used to pass hash values between backend and guest
    frontend.

    This patch adds code to xen-netback to use the value in a hash extra
    info fragment passed from the guest frontend in a transmit-side
    (i.e. netback receive side) packet to set the skb hash accordingly.

    Signed-off-by: Paul Durrant
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Paul Durrant
     
  • My recent patch to include/xen/interface/io/netif.h defines a new extra
    info type that can be used to pass hash values between backend and guest
    frontend.

    This patch adds code to xen-netback to pass hash values calculated for
    guest receive-side packets (i.e. netback transmit side) to the frontend.

    Signed-off-by: Paul Durrant
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Paul Durrant
     
  • My recent patch to include/xen/interface/io/netif.h defines a new shared
    ring (in addition to the rx and tx rings) for passing control messages
    from a VM frontend driver to a backend driver.

    A previous patch added the necessary boilerplate for mapping the control
    ring from the frontend, should it be created. This patch adds
    implementations for each of the defined protocol messages.

    Signed-off-by: Paul Durrant
    Cc: Wei Liu
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Paul Durrant
     
  • My recent patch to include/xen/interface/io/netif.h defines a new shared
    ring (in addition to the rx and tx rings) for passing control messages
    from a VM frontend driver to a backend driver.

    This patch adds the necessary code to xen-netback to map this new shared
    ring, should it be created by a frontend, but does not add implementations
    for any of the defined protocol messages. These are added in a subsequent
    patch for clarity.

    Signed-off-by: Paul Durrant
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Paul Durrant
     

13 May, 2016

1 commit

  • Patch 562abd39 "xen-netback: support multiple extra info fragments
    passed from frontend" contained a mistake which can result in an
    incorrect number of responses being generated when handling errors
    encountered when processing packets containing extra info fragments.
    This patch fixes the problem.

    Signed-off-by: Paul Durrant
    Reported-by: Jan Beulich
    Cc: Wei Liu
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Paul Durrant
     

14 Mar, 2016

1 commit

  • The code does not currently support a frontend passing multiple extra info
    fragments to the backend in a tx request. The xenvif_get_extras() function
    handles multiple extra_info fragments but make_tx_response() assumes there
    is only ever a single extra info fragment.

    This patch modifies xenvif_get_extras() to pass back a count of extra
    info fragments, which is then passed to make_tx_response() (after
    possibly being stashed in pending_tx_info for deferred responses).

    Signed-off-by: Paul Durrant
    Cc: Wei Liu
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Paul Durrant
     

16 Jan, 2016

1 commit

  • Using the MTU or GSO size to determine the number of required guest Rx
    requests for an skb was subtly broken since these values may change at
    runtime.

    After 1650d5455bd2dc6b5ee134bd6fc1a3236c266b5b (xen-netback: always
    fully coalesce guest Rx packets) we always fully pack a packet into
    its guest Rx slots. Calculating the number of required slots from the
    packet length is then easy.

    Signed-off-by: David Vrabel
    Signed-off-by: David S. Miller

    David Vrabel
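
Once packets are always fully packed into their slots, the slot count is a ceiling division of the packet length by the slot size. The sketch below assumes 4096-byte slots and ignores any extra slots a real implementation might need for metadata; `demo_rx_slots_needed` is a hypothetical name.

```c
#define DEMO_SLOT_SIZE 4096u  /* assumed bytes per guest Rx slot */

/* Number of fully packed slots needed to hold len bytes (ceiling division). */
unsigned int demo_rx_slots_needed(unsigned int len)
{
    return (len + DEMO_SLOT_SIZE - 1) / DEMO_SLOT_SIZE;
}
```

Because the result depends only on the packet length, it cannot be invalidated by an MTU or GSO size change at runtime.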
     

18 Dec, 2015

2 commits

  • Instead of open-coding memcpy()s and directly accessing Tx and Rx
    requests, use the new RING_COPY_REQUEST() that ensures the local copy
    is correct.

    This is more than is strictly necessary for guest Rx requests since
    only the id and gref fields are used and it is harmless if the
    frontend modifies these.

    This is part of XSA155.

    CC: stable@vger.kernel.org
    Reviewed-by: Wei Liu
    Signed-off-by: David Vrabel
    Signed-off-by: Konrad Rzeszutek Wilk

    David Vrabel
     
  • The last request transmitted by the guest gives no indication of the
    minimum amount of credit that the guest might need to send a packet,
    since the last packet might have been a small one.

    Instead allow for the worst case 128 KiB packet.

    This is part of XSA155.

    CC: stable@vger.kernel.org
    Reviewed-by: Wei Liu
    Signed-off-by: David Vrabel
    Signed-off-by: Konrad Rzeszutek Wilk

    David Vrabel
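
The idea behind the RING_COPY_REQUEST() change in the first of the two 18 Dec, 2015 commits is to snapshot a request from shared memory once and then validate and use only the private copy, so the frontend cannot change a field between the check and the use. The sketch below illustrates that pattern in user space; `struct demo_req` and both helpers are hypothetical and are not the kernel macro's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u

/* Hypothetical request layout; the real ring structures differ. */
struct demo_req {
    uint16_t offset;
    uint16_t size;
};

/* Read each shared field exactly once into a private copy. */
void demo_copy_request(struct demo_req *dst, const volatile struct demo_req *src)
{
    dst->offset = src->offset;
    dst->size = src->size;
}

/* Validate the private copy only; the shared original may change at any time. */
bool demo_request_valid(const struct demo_req *local)
{
    return (uint32_t)local->offset + local->size <= DEMO_PAGE_SIZE;
}
```

Any later modification of the shared request by the other side has no effect on the values that were validated.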
     

23 Oct, 2015

2 commits


11 Sep, 2015

2 commits

  • Pull xen terminology fixes from David Vrabel:
    "Use the correct GFN/BFN terms more consistently"

    * tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    xen/xenbus: Rename the variable xen_store_mfn to xen_store_gfn
    xen/privcmd: Further s/MFN/GFN/ clean-up
    hvc/xen: Further s/MFN/GFN clean-up
    video/xen-fbfront: Further s/MFN/GFN clean-up
    xen/tmem: Use xen_page_to_gfn rather than pfn_to_gfn
    xen: Use correctly the Xen memory terminologies
    arm/xen: implement correctly pfn_to_mfn
    xen: Make clear that swiotlb and biomerge are dealing with DMA address

    Linus Torvalds
     
  • Originally the max_queues module parameter was always reset to
    num_online_cpus during module initialisation, which rendered it useless.

    The fix is to only set max_queues to num_online_cpus when user has not
    provided a value.

    Reported-by: Johnny Strom
    Signed-off-by: Wei Liu
    Reviewed-by: David Vrabel
    Acked-by: Ian Campbell
    Signed-off-by: David S. Miller

    Wei Liu
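
A minimal sketch of the fixed behaviour follows, assuming that a value of 0 means "not provided by the user". That sentinel and the function name are assumptions for illustration; the commit message only says the default is applied when no value was given.

```c
/*
 * Resolve the effective queue limit.
 * user_value == 0 is assumed to mean the parameter was not set.
 */
unsigned int demo_resolve_max_queues(unsigned int user_value,
                                     unsigned int online_cpus)
{
    if (user_value == 0)
        return online_cpus;   /* fall back to the default */
    return user_value;        /* respect the user's choice */
}
```

The bug described above corresponds to returning `online_cpus` unconditionally, which silently discards whatever the user passed.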
     

10 Sep, 2015

1 commit

  • Commit f48da8b14d04ca87ffcffe68829afd45f926ec6a (xen-netback: fix
    unlimited guest Rx internal queue and carrier flapping) introduced a
    regression.

    The PV frontend in IPXE only places 4 requests on the guest Rx ring.
    Since netback required at least (MAX_SKB_FRAGS + 1) slots, IPXE could
    not receive any packets.

    a) If GSO is not enabled on the VIF, fewer guest Rx slots are required
    for the largest possible packet. Calculate the required slots
    based on the maximum GSO size or the MTU.

    This calculation of the number of required slots relies on
    1650d5455bd2 (xen-netback: always fully coalesce guest Rx packets)
    which is present in 4.0-rc1 and later.

    b) Reduce the Rx stall detection to checking for at least one
    available Rx request. This is fine since we're predominately
    concerned with detecting interfaces which are down and thus have
    zero available Rx requests.

    Signed-off-by: David Vrabel
    Reviewed-by: Wei Liu
    Signed-off-by: David S. Miller

    David Vrabel
     

09 Sep, 2015

1 commit

  • Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN
    is meant. I suspect this is because the first support for Xen was for
    PV. This resulted in some misimplementation of helpers on ARM and
    confused developers about the expected behavior.

    For instance, with pfn_to_mfn, we expect to get an MFN based on the name.
    However, if we look at the implementation on x86, it returns a GFN.

    For clarity and to avoid new confusion, replace any reference to mfn with
    gfn in any helpers used by PV drivers. The x86 code will still keep some
    references to pfn_to_mfn, which may be used by all kinds of guests.
    No changes have been made to the hypercall fields, even
    though they may be invalid, in order to keep them the same as the
    definitions in the Xen repo.

    Note that page_to_mfn has been renamed to xen_page_to_gfn to avoid a
    name too close to the KVM function gfn_to_page.

    Also take the opportunity to simplify simple constructions such
    as pfn_to_mfn(page_to_pfn(page)) into xen_page_to_gfn. More complex
    clean-ups will come in follow-up patches.

    [1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cb

    Signed-off-by: Julien Grall
    Reviewed-by: Stefano Stabellini
    Acked-by: Dmitry Torokhov
    Acked-by: Wei Liu
    Signed-off-by: David Vrabel

    Julien Grall
     

03 Sep, 2015

1 commit

  • Xen's PV network protocol includes messages to add/remove ethernet
    multicast addresses to/from a filter list in the backend. This allows
    the frontend to request the backend only forward multicast packets
    which are of interest thus preventing unnecessary noise on the shared
    ring.

    The canonical netif header in git://xenbits.xen.org/xen.git specifies
    the message format (two more XEN_NETIF_EXTRA_TYPEs) so the minimal
    necessary changes have been pulled into include/xen/interface/io/netif.h.

    To prevent the frontend from extending the multicast filter list
    arbitrarily a limit (XEN_NETBK_MCAST_MAX) has been set to 64 entries.
    This limit is not specified by the protocol and so may change in future.
    If the limit is reached then the next XEN_NETIF_EXTRA_TYPE_MCAST_ADD
    sent by the frontend will be failed with NETIF_RSP_ERROR.

    Signed-off-by: Paul Durrant
    Cc: Ian Campbell
    Cc: Wei Liu
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Paul Durrant
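
The bounded filter list can be sketched as a fixed-size table. The 64-entry limit comes from the commit message; the structure, the function names, the treatment of duplicate adds as success, and the swap-with-last removal are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MCAST_MAX 64   /* limit stated in the commit message */
#define DEMO_ETH_ALEN  6    /* length of an Ethernet address */

struct demo_mcast_filter {
    uint8_t addrs[DEMO_MCAST_MAX][DEMO_ETH_ALEN];
    unsigned int count;
};

/* Return the index of addr in the filter, or -1 if it is absent. */
static int demo_mcast_find(const struct demo_mcast_filter *f, const uint8_t *addr)
{
    for (unsigned int i = 0; i < f->count; i++)
        if (memcmp(f->addrs[i], addr, DEMO_ETH_ALEN) == 0)
            return (int)i;
    return -1;
}

/* Add an address; returns false when the list is full (an error response). */
bool demo_mcast_add(struct demo_mcast_filter *f, const uint8_t *addr)
{
    if (demo_mcast_find(f, addr) >= 0)
        return true;              /* assumed: duplicates are not an error */
    if (f->count >= DEMO_MCAST_MAX)
        return false;
    memcpy(f->addrs[f->count++], addr, DEMO_ETH_ALEN);
    return true;
}

/* Remove an address by moving the last entry into its place. */
void demo_mcast_del(struct demo_mcast_filter *f, const uint8_t *addr)
{
    int i = demo_mcast_find(f, addr);

    if (i < 0)
        return;
    f->count--;
    if ((unsigned int)i != f->count)
        memcpy(f->addrs[i], f->addrs[f->count], DEMO_ETH_ALEN);
}

/* Forward a multicast packet only if its destination is in the list. */
bool demo_mcast_match(const struct demo_mcast_filter *f, const uint8_t *addr)
{
    return demo_mcast_find(f, addr) >= 0;
}
```

The cap keeps a frontend from growing the backend's state without bound, at the cost of rejecting the 65th distinct address.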
     

07 Aug, 2015

1 commit

  • Waking the dealloc thread before decrementing inflight_packets is racy
    because it means the thread may go to sleep before inflight_packets is
    decremented. If kthread_stop() has already been called, the dealloc
    thread may wait forever with nothing to wake it. Instead, wake the
    thread only after decrementing inflight_packets.

    Signed-off-by: Ross Lagerwall
    Signed-off-by: David S. Miller

    Ross Lagerwall
     

04 Aug, 2015

1 commit

  • Determine if a fraglist is needed in the tx path, and allocate it if
    necessary before setting up the copy and map operations.
    Otherwise, undoing the copy and map operations is tricky.

    This fixes a use-after-free: if allocating the fraglist failed, the copy
    and map operations that had been set up were still executed, writing
    over the data area of a freed skb.

    Signed-off-by: Ross Lagerwall
    Signed-off-by: David S. Miller

    Ross Lagerwall
     

15 Jul, 2015

1 commit


02 Jul, 2015

1 commit

  • Pull xen updates from David Vrabel:
    "Xen features and cleanups for 4.2-rc0:

    - add "make xenconfig" to assist in generating configs for Xen guests

    - preparatory cleanups necessary for supporting 64 KiB pages in ARM
    guests

    - automatically use hvc0 as the default console in ARM guests"

    * tag 'for-linus-4.2-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    block/xen-blkback: s/nr_pages/nr_segs/
    block/xen-blkfront: Remove invalid comment
    block/xen-blkfront: Remove unused macro MAXIMUM_OUTSTANDING_BLOCK_REQS
    arm/xen: Drop duplicate define mfn_to_virt
    xen/grant-table: Remove unused macro SPP
    xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring
    xen: Include xen/page.h rather than asm/xen/page.h
    kconfig: add xenconfig defconfig helper
    kconfig: clarify kvmconfig is for kvm
    xen/pcifront: Remove usage of struct timeval
    xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON()
    hvc_xen: avoid uninitialized variable warning
    xenbus: avoid uninitialized variable warning
    xen/arm: allow console=hvc0 to be omitted for guests
    arm,arm64/xen: move Xen initialization earlier
    arm/xen: Correctly check if the event channel interrupt is present

    Linus Torvalds
     

22 Jun, 2015

2 commits


17 Jun, 2015

1 commit

  • Using xen/page.h will be necessary later for using common xen page
    helpers.

    As xen/page.h already includes asm/xen/page.h, always use the former.

    Signed-off-by: Julien Grall
    Reviewed-by: David Vrabel
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Wei Liu
    Cc: Konrad Rzeszutek Wilk
    Cc: Boris Ostrovsky
    Cc: netdev@vger.kernel.org
    Signed-off-by: David Vrabel

    Julien Grall
     

02 Jun, 2015

2 commits

  • Conflicts:
    drivers/net/phy/amd-xgbe-phy.c
    drivers/net/wireless/iwlwifi/Kconfig
    include/net/mac80211.h

    iwlwifi/Kconfig and mac80211.h were both trivial overlapping
    changes.

    The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and
    the bug fix that happened on the 'net' side is already integrated
    into the rest of the amd-xgbe driver.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’:
    drivers/net/xen-netback/netback.c:1253:8: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘int’ [-Wformat=]
    (txreq.offset&~PAGE_MASK) + txreq.size);
    ^

    PAGE_MASK's type can vary by arch, so a cast is needed.

    Signed-off-by: Ian Campbell
    ----
    v2: Cast to unsigned long, since PAGE_MASK can vary by arch.
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Ian Campbell
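
The warning and its fix can be reproduced outside the kernel. In the sketch below `DEMO_PAGE_MASK` is a local stand-in whose type happens to be `int`; the point is that an explicit cast to `unsigned long` makes the argument match `%lu` regardless of how a given architecture defines the mask. The function name and parameters are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

#define DEMO_PAGE_SIZE 4096
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))  /* int here; may differ per arch */

/*
 * Format "offset within page + size" with %lu.
 * Without the cast the argument type follows the mask's type,
 * which is what triggered the -Wformat warning.
 */
int demo_format_span(char *buf, size_t len, unsigned int offset, unsigned int size)
{
    return snprintf(buf, len, "%lu",
                    (unsigned long)(offset & ~DEMO_PAGE_MASK) + size);
}
```

Casting at the call site keeps the format string unchanged and avoids depending on any one architecture's definition of the mask.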
     

26 May, 2015

1 commit


17 Apr, 2015

1 commit

  • Pull xen features and fixes from David Vrabel:

    - use a single source list of hypercalls, generating other tables etc.
    at build time.

    - add a "Xen PV" APIC driver to support >255 VCPUs in PV guests.

    - significant performance improvements to guest save/restore/migration.

    - scsiback/front save/restore support.

    - infrastructure for multi-page xenbus rings.

    - misc fixes.

    * tag 'stable/for-linus-4.1-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    xen/pci: Try harder to get PXM information for Xen
    xenbus_client: Extend interface to support multi-page ring
    xen-pciback: also support disabling of bus-mastering and memory-write-invalidate
    xen: support suspend/resume in pvscsi frontend
    xen: scsiback: add LUN of restored domain
    xen-scsiback: define a pr_fmt macro with xen-pvscsi
    xen/mce: fix up xen_late_init_mcelog() error handling
    xen/privcmd: improve performance of MMAPBATCH_V2
    xen: unify foreign GFN map/unmap for auto-xlated physmap guests
    x86/xen/apic: WARN with details.
    x86/xen: Provide a "Xen PV" APIC driver to support >255 VCPUs
    xen/pciback: Don't print scary messages when unsupported by hypervisor.
    xen: use generated hypercall symbols in arch/x86/xen/xen-head.S
    xen: use generated hypervisor symbols in arch/x86/xen/trace.c
    xen: synchronize include/xen/interface/xen.h with xen
    xen: build infrastructure for generating hypercall depending symbols
    xen: balloon: Use static attribute groups for sysfs entries
    xen: pcpu: Use static attribute groups for sysfs entry

    Linus Torvalds
     

15 Apr, 2015

1 commit

  • Originally Xen PV drivers only used a single-page ring to pass along
    information. This might limit the throughput between frontend and
    backend.

    The patch extends the Xenbus driver to support multi-page rings, which
    in general should improve throughput if the ring is the bottleneck.
    Changes to various frontends and backends to adapt to the new interface
    are also included.

    Affected Xen drivers:
    * blkfront/back
    * netfront/back
    * pcifront/back
    * scsifront/back
    * vtpmfront

    The interface is documented, as before, in xenbus_client.c.

    Signed-off-by: Wei Liu
    Signed-off-by: Paul Durrant
    Signed-off-by: Bob Liu
    Cc: Konrad Wilk
    Cc: Boris Ostrovsky
    Signed-off-by: David Vrabel

    Wei Liu
     

21 Mar, 2015

2 commits

  • Conflicts:
    drivers/net/ethernet/emulex/benet/be_main.c
    net/core/sysctl_net_core.c
    net/ipv4/inet_diag.c

    The be_main.c conflict resolution was really tricky. The conflict
    hunks generated by GIT were very unhelpful, to say the least. It
    split functions in half and moved them around, when the real actual
    conflict only existed solely inside of one function, that being
    be_map_pci_bars().

    So instead, to resolve this, I checked out be_main.c from the top
    of net-next, then I applied the be_main.c changes from 'net' since
    the last time I merged. And this worked beautifully.

    The inet_diag.c and sysctl_net_core.c conflicts were simple
    overlapping changes, and were easy to resolve.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • With the current netback, the bandwidth limiter's parameters are only
    settable during vif setup time. This patch registers a watch on them,
    and thus makes them changeable at runtime.

    When the watch fires, the timer is reset. The timer's mutex is used for
    fencing the change.

    Cc: Anthony Liguori
    Signed-off-by: Imre Palik
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Palik, Imre
     

12 Mar, 2015

1 commit

  • This fixes a performance regression introduced by
    7fbb9d8415d4a51cf542e87cf3a717a9f7e6aedc (xen-netback: release pending
    index before pushing Tx responses)

    Moving the notify outside of the spin locks means it can be delayed a
    long time (if the dealloc thread is descheduled or there is an
    interrupt or softirq).

    Signed-off-by: David Vrabel
    Reviewed-by: Zoltan Kiss
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    David Vrabel
     

06 Mar, 2015

2 commits

  • When handling a from-guest frag list, xenvif_handle_frag_list()
    replaces the frags before calling the destructor to clean up the
    original (foreign) frags. Whilst this is safe (the destructor doesn't
    actually use the frags), it looks odd.

    Reorder the function to be less confusing.

    Signed-off-by: David Vrabel
    Signed-off-by: David S. Miller

    David Vrabel
     
  • Every time a VIF is destroyed up to 256 pages may be leaked if packets
    with more than MAX_SKB_FRAGS frags were transmitted from the guest.
    Even worse, if another user of ballooned pages allocated one of these
    ballooned pages it would not handle the unexpectedly >1 page count
    (e.g., gntdev would deadlock when unmapping a grant because the page
    count would never reach 1).

    When handling a from-guest skb with a frag list, unref the frags
    before releasing them so they are freed correctly when the VIF is
    destroyed.

    Signed-off-by: David Vrabel
    Signed-off-by: David S. Miller

    David Vrabel
     

25 Feb, 2015

1 commit

  • If the pending indexes are released /after/ pushing the Tx response
    then a stale pending index may be used if a new Tx request is
    immediately pushed by the frontend. This may cause various WARNINGs or
    BUGs if the stale pending index is actually still in use.

    Fix this by releasing the pending index before pushing the Tx
    response.

    The full barrier for the pending ring update is not required since
    the Tx response push already has a suitable write barrier.

    Signed-off-by: David Vrabel
    Reviewed-by: Wei Liu
    Signed-off-by: David S. Miller

    David Vrabel
     

11 Feb, 2015

1 commit

  • Pull networking updates from David Miller:

    1) More iov_iter conversion work from Al Viro.

    [ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
    wrong, and this pull actually adds an extra commit on top of the
    branch I'm pulling to fix that up, so that the pre-merge state is
    ok. - Linus ]

    2) Various optimizations to the ipv4 forwarding information base trie
    lookup implementation. From Alexander Duyck.

    3) Remove sock_iocb altogether, from Christoph Hellwig.

    4) Allow congestion control algorithm selection via routing metrics.
    From Daniel Borkmann.

    5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.

    6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.

    7) Add xmit_more support to r8169, e1000, and e1000e drivers. From
    Florian Westphal.

    8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.

    9) Add BPF packet actions to packet scheduler, from Jiri Pirko.

    10) Add support for unique flow IDs to openvswitch, from Joe Stringer.

    11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
    Kwok.

    12) More sanely handle out-of-window dupacks, which can result in
    serious ACK storms. From Neal Cardwell.

    13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
    Patrick McHardy, and Thomas Graf.

    14) Support xmit_more in be2net, from Sathya Perla.

    15) Group Policy extensions for vxlan, from Thomas Graf.

    16) Remove Checksum Offload support for vxlan, from Tom Herbert.

    17) Like ipv4, support lockless transmit over ipv6 UDP sockets. From
    Vlad Yasevich.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
    crypto: fix af_alg_make_sg() conversion to iov_iter
    ipv4: Namespecify TCP PMTU mechanism
    i40e: Fix for stats init function call in Rx setup
    tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
    openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
    ipv6: Make __ipv6_select_ident static
    ipv6: Fix fragment id assignment on LE arches.
    bridge: Fix inability to add non-vlan fdb entry
    net: Mellanox: Delete unnecessary checks before the function call "vunmap"
    cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
    ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
    net: dsa: Remove redundant phy_attach()
    IB/mlx4: Reset flow support for IB kernel ULPs
    IB/mlx4: Always use the correct port for mirrored multicast attachments
    net/bonding: Fix potential bad memory access during bonding events
    tipc: remove tipc_snprintf
    tipc: nl compat add noop and remove legacy nl framework
    tipc: convert legacy nl stats show to nl compat
    tipc: convert legacy nl net id get to nl compat
    tipc: convert legacy nl net id set to nl compat
    ...

    Linus Torvalds