17 Apr, 2015

1 commit

  • Pull xen features and fixes from David Vrabel:

    - use a single source list of hypercalls, generating other tables etc.
    at build time.

    - add a "Xen PV" APIC driver to support >255 VCPUs in PV guests.

    - significant performance improvement to guest save/restore/migration.

    - scsiback/front save/restore support.

    - infrastructure for multi-page xenbus rings.

    - misc fixes.

    * tag 'stable/for-linus-4.1-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    xen/pci: Try harder to get PXM information for Xen
    xenbus_client: Extend interface to support multi-page ring
    xen-pciback: also support disabling of bus-mastering and memory-write-invalidate
    xen: support suspend/resume in pvscsi frontend
    xen: scsiback: add LUN of restored domain
    xen-scsiback: define a pr_fmt macro with xen-pvscsi
    xen/mce: fix up xen_late_init_mcelog() error handling
    xen/privcmd: improve performance of MMAPBATCH_V2
    xen: unify foreign GFN map/unmap for auto-xlated physmap guests
    x86/xen/apic: WARN with details.
    x86/xen: Provide a "Xen PV" APIC driver to support >255 VCPUs
    xen/pciback: Don't print scary messages when unsupported by hypervisor.
    xen: use generated hypercall symbols in arch/x86/xen/xen-head.S
    xen: use generated hypervisor symbols in arch/x86/xen/trace.c
    xen: synchronize include/xen/interface/xen.h with xen
    xen: build infrastructure for generating hypercall depending symbols
    xen: balloon: Use static attribute groups for sysfs entries
    xen: pcpu: Use static attribute groups for sysfs entry

    Linus Torvalds
     

15 Apr, 2015

1 commit

  • Originally, Xen PV drivers used only a single-page ring to pass
    information along. This could limit the throughput between frontend
    and backend.

    The patch extends the Xenbus driver to support multi-page rings, which
    in general should improve throughput if the ring is the bottleneck.
    Changes to the various frontends/backends to adapt to the new interface
    are also included.

    Affected Xen drivers:
    * blkfront/back
    * netfront/back
    * pcifront/back
    * scsifront/back
    * vtpmfront

    The interface is documented, as before, in xenbus_client.c.

    Signed-off-by: Wei Liu
    Signed-off-by: Paul Durrant
    Signed-off-by: Bob Liu
    Cc: Konrad Wilk
    Cc: Boris Ostrovsky
    Signed-off-by: David Vrabel

    Wei Liu
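To see why a multi-page ring helps, it is useful to count how many slots fit in a shared ring. The following is a minimal sketch, not the xenbus_client.c interface: it assumes 4 KiB pages, a 64-byte shared-ring header (as laid out by Xen's io/ring.h macros) and a slot count rounded down to a power of two. ring_slots() and its helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE   4096u /* assumption: 4 KiB pages */
#define SKETCH_RING_HEADER 64u   /* assumption: shared-ring header size */

/* Round n (n > 0) down to the nearest power of two. */
static unsigned int round_down_pow2(unsigned int n)
{
    unsigned int p = 1;

    while (p <= n / 2)
        p *= 2;
    return p;
}

/*
 * Hypothetical helper: number of slots in a ring built from
 * (1 << order) pages, where each slot holds one request/response
 * entry of entry_size bytes.
 */
static unsigned int ring_slots(unsigned int order, size_t entry_size)
{
    size_t bytes = ((size_t)SKETCH_PAGE_SIZE << order) - SKETCH_RING_HEADER;

    return round_down_pow2((unsigned int)(bytes / entry_size));
}
```

With a 12-byte entry (the size of a netif Tx request), order 0 gives 256 slots and order 1 gives 512, so under these assumptions each extra ring order doubles the number of requests that can be in flight.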
     

21 Mar, 2015

2 commits

  • Conflicts:
    drivers/net/ethernet/emulex/benet/be_main.c
    net/core/sysctl_net_core.c
    net/ipv4/inet_diag.c

    The be_main.c conflict resolution was really tricky. The conflict
    hunks generated by GIT were very unhelpful, to say the least. It
    split functions in half and moved them around, when the actual
    conflict existed solely inside one function, be_map_pci_bars().

    So instead, to resolve this, I checked out be_main.c from the top
    of net-next, then I applied the be_main.c changes from 'net' since
    the last time I merged. And this worked beautifully.

    The inet_diag.c and sysctl_net_core.c conflicts were simple
    overlapping changes, and were easy to resolve.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • With the current netback, the bandwidth limiter's parameters are only
    settable during vif setup time. This patch registers a watch on them,
    making them changeable at runtime.

    When the watch fires, the timer is reset. The timer's mutex is used for
    fencing the change.

    Cc: Anthony Liguori
    Signed-off-by: Imre Palik
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Palik, Imre
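The idea of fencing a runtime parameter change with the limiter's lock can be sketched as a credit-based limiter: a vif may send up to credit_bytes per credit period, and a change (what the watch would deliver) resets the credit under the same lock used for refills. This is a userspace model under stated assumptions: the struct, field and function names are hypothetical, the refill is done lazily on each send rather than by a kernel timer, and a pthread mutex stands in for the timer's mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical credit-based limiter state. */
struct credit_limiter {
    pthread_mutex_t lock;     /* stands in for the timer's mutex */
    uint64_t credit_bytes;    /* bytes allowed per period */
    uint64_t credit_usec;     /* period length in microseconds */
    uint64_t remaining;       /* bytes left in the current period */
    uint64_t period_start_us; /* when the current period began */
};

static void limiter_init(struct credit_limiter *l, uint64_t bytes,
                         uint64_t usec, uint64_t now_us)
{
    pthread_mutex_init(&l->lock, NULL);
    l->credit_bytes = bytes;
    l->credit_usec = usec;
    l->remaining = bytes;
    l->period_start_us = now_us;
}

/* Called when the watch fires: apply new parameters, restart the period. */
static void limiter_update(struct credit_limiter *l, uint64_t bytes,
                           uint64_t usec, uint64_t now_us)
{
    pthread_mutex_lock(&l->lock);
    l->credit_bytes = bytes;
    l->credit_usec = usec;
    l->remaining = bytes;         /* reset, as if the timer had just fired */
    l->period_start_us = now_us;
    pthread_mutex_unlock(&l->lock);
}

/* Try to send len bytes at now_us; returns true if the credit allows it. */
static bool limiter_consume(struct credit_limiter *l, uint64_t len,
                            uint64_t now_us)
{
    bool ok = false;

    pthread_mutex_lock(&l->lock);
    if (now_us - l->period_start_us >= l->credit_usec) {
        /* The period has elapsed: refill the credit. */
        l->remaining = l->credit_bytes;
        l->period_start_us = now_us;
    }
    if (len <= l->remaining) {
        l->remaining -= len;
        ok = true;
    }
    pthread_mutex_unlock(&l->lock);
    return ok;
}
```

Because limiter_update() and the refill path take the same lock, a parameter change can never interleave with a half-finished refill.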
     

12 Mar, 2015

1 commit

  • This fixes a performance regression introduced by
    7fbb9d8415d4a51cf542e87cf3a717a9f7e6aedc (xen-netback: release pending
    index before pushing Tx responses)

    Moving the notify outside of the spin locks means it can be delayed a
    long time (if the dealloc thread is descheduled or there is an
    interrupt or softirq).

    Signed-off-by: David Vrabel
    Reviewed-by: Zoltan Kiss
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    David Vrabel
     

10 Mar, 2015

1 commit


06 Mar, 2015

3 commits

  • When handling a from-guest frag list, xenvif_handle_frag_list()
    replaces the frags before calling the destructor to clean up the
    original (foreign) frags. Whilst this is safe (the destructor doesn't
    actually use the frags), it looks odd.

    Reorder the function to be less confusing.

    Signed-off-by: David Vrabel
    Signed-off-by: David S. Miller

    David Vrabel
     
  • Every time a VIF is destroyed, up to 256 pages may be leaked if packets
    with more than MAX_SKB_FRAGS frags were transmitted from the guest.
    Even worse, if another user of ballooned pages allocated one of these
    ballooned pages it would not handle the unexpectedly >1 page count
    (e.g., gntdev would deadlock when unmapping a grant because the page
    count would never reach 1).

    When handling a from-guest skb with a frag list, unref the frags
    before releasing them so they are freed correctly when the VIF is
    destroyed.

    Signed-off-by: David Vrabel
    Signed-off-by: David S. Miller

    David Vrabel
     
  • Use correct pointer arithmetic to get the pointer to each stat.

    Signed-off-by: David Vrabel
    Signed-off-by: David S. Miller

    David Vrabel
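The pointer-arithmetic fix is easiest to see with a small example. A byte offset from offsetof() must be added to a byte-sized pointer; adding it to a typed struct pointer scales it by the size of the struct. The struct and name/offset table below are hypothetical stand-ins for a per-queue stats structure and an ethtool-style stats table, not the netback code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-queue statistics. */
struct queue_stats {
    unsigned long rx_packets;
    unsigned long tx_packets;
    unsigned long tx_dropped;
};

/* Hypothetical name/offset table in the style of an ethtool stats table. */
static const struct {
    const char *name;
    size_t offset;
} stats_table[] = {
    { "rx_packets", offsetof(struct queue_stats, rx_packets) },
    { "tx_packets", offsetof(struct queue_stats, tx_packets) },
    { "tx_dropped", offsetof(struct queue_stats, tx_dropped) },
};

/*
 * Correct: the offset is in bytes, so do the arithmetic on a char pointer.
 * The buggy form, (unsigned long *)(s + offset), would advance by
 * offset * sizeof(struct queue_stats) bytes and point outside the struct.
 */
static unsigned long *stat_ptr(struct queue_stats *s, size_t i)
{
    return (unsigned long *)((char *)s + stats_table[i].offset);
}

/* Sum stat i across nqueues queues. */
static unsigned long sum_stat(struct queue_stats *queues, size_t nqueues,
                              size_t i)
{
    unsigned long total = 0;

    for (size_t q = 0; q < nqueues; q++)
        total += *stat_ptr(&queues[q], i);
    return total;
}
```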
     

04 Mar, 2015

2 commits


25 Feb, 2015

1 commit

  • If the pending indexes are released /after/ pushing the Tx response
    then a stale pending index may be used if a new Tx request is
    immediately pushed by the frontend. This may cause various WARNINGs or
    BUGs if the stale pending index is actually still in use.

    Fix this by releasing the pending index before pushing the Tx
    response.

    The full barrier for the pending ring update is not required since the
    Tx response push already has a suitable write barrier.

    Signed-off-by: David Vrabel
    Reviewed-by: Wei Liu
    Signed-off-by: David S. Miller

    David Vrabel
     

11 Feb, 2015

2 commits

  • Pull networking updates from David Miller:

    1) More iov_iter conversion work from Al Viro.

    [ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
    wrong, and this pull actually adds an extra commit on top of the
    branch I'm pulling to fix that up, so that the pre-merge state is
    ok. - Linus ]

    2) Various optimizations to the ipv4 forwarding information base trie
    lookup implementation. From Alexander Duyck.

    3) Remove sock_iocb altogether, from Christoph Hellwig.

    4) Allow congestion control algorithm selection via routing metrics.
    From Daniel Borkmann.

    5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.

    6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.

    7) Add xmit_more support to r8169, e1000, and e1000e drivers. From
    Florian Westphal.

    8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.

    9) Add BPF packet actions to packet scheduler, from Jiri Pirko.

    10) Add support for unique flow IDs to openvswitch, from Joe Stringer.

    11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
    Kwok.

    12) More sanely handle out-of-window dupacks, which can result in
    serious ACK storms. From Neal Cardwell.

    13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
    Patrick McHardy, and Thomas Graf.

    14) Support xmit_more in be2net, from Sathya Perla.

    15) Group Policy extensions for vxlan, from Thomas Graf.

    16) Remote Checksum Offload support for vxlan, from Tom Herbert.

    17) Like ipv4, support lockless transmit over ipv6 UDP sockets. From
    Vlad Yasevich.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
    crypto: fix af_alg_make_sg() conversion to iov_iter
    ipv4: Namespecify TCP PMTU mechanism
    i40e: Fix for stats init function call in Rx setup
    tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
    openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
    ipv6: Make __ipv6_select_ident static
    ipv6: Fix fragment id assignment on LE arches.
    bridge: Fix inability to add non-vlan fdb entry
    net: Mellanox: Delete unnecessary checks before the function call "vunmap"
    cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
    ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
    net: dsa: Remove redundant phy_attach()
    IB/mlx4: Reset flow support for IB kernel ULPs
    IB/mlx4: Always use the correct port for mirrored multicast attachments
    net/bonding: Fix potential bad memory access during bonding events
    tipc: remove tipc_snprintf
    tipc: nl compat add noop and remove legacy nl framework
    tipc: convert legacy nl stats show to nl compat
    tipc: convert legacy nl net id get to nl compat
    tipc: convert legacy nl net id set to nl compat
    ...

    Linus Torvalds
     
  • Pull xen features and fixes from David Vrabel:

    - Reworked handling for foreign (grant mapped) pages to simplify the
    code, enable a number of additional use cases and fix a number of
    long-standing bugs.

    - Prefer the TSC over the Xen PV clock when dom0 (and the TSC is
    stable).

    - Assorted other cleanup and minor bug fixes.

    * tag 'stable/for-linus-3.20-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (25 commits)
    xen/manage: Fix USB interaction issues when resuming
    xenbus: Add proper handling of XS_ERROR from Xenbus for transactions.
    xen/gntdev: provide find_special_page VMA operation
    xen/gntdev: mark userspace PTEs as special on x86 PV guests
    xen-blkback: safely unmap grants in case they are still in use
    xen/gntdev: safely unmap grants in case they are still in use
    xen/gntdev: convert priv->lock to a mutex
    xen/grant-table: add a mechanism to safely unmap pages that are in use
    xen-netback: use foreign page information from the pages themselves
    xen: mark grant mapped pages as foreign
    xen/grant-table: add helpers for allocating pages
    x86/xen: require ballooned pages for grant maps
    xen: remove scratch frames for ballooned pages and m2p override
    xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs()
    mm: add 'foreign' alias for the 'pinned' page flag
    mm: provide a find_special_page vma operation
    x86/xen: cleanup arch/x86/xen/mmu.c
    x86/xen: add some __init annotations in arch/x86/xen/mmu.c
    x86/xen: add some __init and static annotations in arch/x86/xen/setup.c
    x86/xen: use correct types for addresses in arch/x86/xen/setup.c
    ...

    Linus Torvalds
     

06 Feb, 2015

2 commits

  • This patch fixes the following sparse warning:

    interface.c:83:5: warning: symbol 'xenvif_poll' was not declared. Should it be static?

    Signed-off-by: Lad, Prabhakar
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Lad, Prabhakar
     
  • Conflicts:
    drivers/net/vxlan.c
    drivers/vhost/net.c
    include/linux/if_vlan.h
    net/core/dev.c

    The net/core/dev.c conflict was the overlap of one commit marking an
    existing function static whilst another was adding a new function.

    In the include/linux/if_vlan.h case, the type used for a local
    variable was changed in 'net', whereas the function got rewritten
    to fix a stacked vlan bug in 'net-next'.

    In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
    overlapped with an endianness fix for VHOST 1.0 in 'net'.

    In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
    in 'net-next' whereas in 'net' there was a bug fix to pass in the
    correct network namespace pointer in calls to this function.

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Feb, 2015

1 commit

  • After commit e9d8b2c2968499c1f96563e6522c56958d5a1d0d (xen-netback:
    disable rogue vif in kthread context), a fatal (protocol) error would
    leave the guest Rx thread spinning, wasting CPU time. Commit
    ecf08d2dbb96d5a4b4bcc53a39e8d29cc8fef02e (xen-netback: reintroduce
    guest Rx stall detection) made this even worse by removing a
    cond_resched() from this path.

    Since a fatal error is non-recoverable, just allow the guest Rx thread
    to exit. This requires taking additional refs to the task so the
    thread exiting early is handled safely.

    Signed-off-by: David Vrabel
    Reported-by: Julien Grall
    Tested-by: Julien Grall
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    David Vrabel
     

28 Jan, 2015

3 commits

  • Use the foreign page flag in netback to get the domid and grant ref
    needed for the grant copy. This significantly simplifies the netback
    code and makes netback work with foreign pages from other backends
    (e.g., blkback).

    This allows blkback to use iSCSI disks provided by domUs running on
    the same host.

    Signed-off-by: Jennifer Herbert
    Acked-by: Ian Campbell
    Acked-by: David S. Miller
    Signed-off-by: David Vrabel

    Jennifer Herbert
     
  • Add gnttab_alloc_pages() and gnttab_free_pages() to allocate/free pages
    suitable for grant maps.

    Signed-off-by: David Vrabel
    Reviewed-by: Stefano Stabellini

    David Vrabel
     
  • Ballooned pages are always used for grant maps which means the
    original frame does not need to be saved in page->index nor restored
    after the grant unmap.

    This allows the workaround in netback for the conflicting use of the
    (unionized) page->index and page->pfmemalloc to be removed.

    Signed-off-by: Jennifer Herbert
    Reviewed-by: Stefano Stabellini
    Signed-off-by: David Vrabel

    Jennifer Herbert
     

24 Jan, 2015

1 commit

  • Always fully coalesce guest Rx packets into the minimum number of ring
    slots. Reducing the number of slots per packet has significant
    performance benefits when receiving off-host traffic.

    Results from XenServer's performance benchmarks:

                               Baseline    Full coalesce
    Interhost VM receive       7.2 Gb/s    11 Gb/s
    Interhost aggregate        24 Gb/s     24 Gb/s
    Intrahost single stream    14 Gb/s     14 Gb/s
    Intrahost aggregate        34 Gb/s     34 Gb/s

    However, this can increase the number of grant ops per packet which
    decreases performance of backend (dom0) to VM traffic (by ~10%)
    /unless/ grant copy has been optimized for adjacent ops with the same
    source or destination (see "grant-table: defer releasing pages
    acquired in a grant copy"[1] expected in Xen 4.6).

    [1] http://lists.xen.org/archives/html/xen-devel/2015-01/msg01118.html

    Signed-off-by: David Vrabel
    Acked-by: Ian Campbell
    Signed-off-by: David S. Miller

    David Vrabel
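A rough way to see why full coalescing reduces slot usage: if each fragment starts in a fresh slot, a packet with many small fragments needs at least one slot per fragment, whereas copying the data back to back needs only DIV_ROUND_UP(total length, slot size) slots. This sketch assumes 4 KiB slots and ignores fragment page offsets and header handling; slots_per_frag() and slots_coalesced() are hypothetical helpers, not netback functions.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_SIZE 4096u /* assumption: one slot holds at most 4 KiB */

static size_t div_round_up(size_t n, size_t d)
{
    return (n + d - 1) / d;
}

/* Each fragment starts in a fresh slot. */
static size_t slots_per_frag(const size_t *frag_len, size_t nfrags)
{
    size_t slots = 0;

    for (size_t i = 0; i < nfrags; i++)
        slots += div_round_up(frag_len[i], SLOT_SIZE);
    return slots;
}

/* Fragments are copied back to back, filling each slot before the next. */
static size_t slots_coalesced(const size_t *frag_len, size_t nfrags)
{
    size_t total = 0;

    for (size_t i = 0; i < nfrags; i++)
        total += frag_len[i];
    return div_round_up(total, SLOT_SIZE);
}
```

For six 1448-byte fragments, the per-fragment scheme uses 6 slots while full coalescing uses 3. The trade-off mentioned above is that filling slots across fragment boundaries can require more grant copy operations per packet.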
     

07 Jan, 2015

1 commit


19 Dec, 2014

1 commit

  • Commit bc96f648df1bbc2729abbb84513cf4f64273a1f1 (xen-netback: make
    feature-rx-notify mandatory) incorrectly assumed that there were no
    frontends in use that did not support this feature. But the frontend
    driver in MiniOS does not and since this is used by (qemu) stubdoms,
    these stopped working.

    Netback mostly works as-is in this mode, except:

    - If there are no Rx requests and the internal Rx queue fills, only
    the drain timeout will wake the thread. The default drain timeout
    of 10 s would give unacceptable pauses.

    - If an Rx stall was detected and the internal Rx queue is drained,
    then the Rx thread would never wake.

    Handle these two cases (when feature-rx-notify is disabled) by:

    - Reducing the drain timeout to 30 ms.

    - Disabling Rx stall detection.

    Reported-by: John
    Tested-by: John
    Signed-off-by: David Vrabel
    Reviewed-by: Wei Liu
    Signed-off-by: David S. Miller

    David Vrabel
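Rx stall detection, as described in the 26 Oct 2014 entry further down, marks a queue stalled once it has lacked enough free slots for a maximum-sized packet for an extended period (default 60 s), and keeps the carrier off while any queue is stalled. A minimal model of that logic, including this commit's choice to skip detection when the frontend lacks feature-rx-notify, is sketched below. The types, names and millisecond clock are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STALL_TIMEOUT_MS 60000u /* default stall period: 60 s */

/* Hypothetical per-queue state. */
struct rxq_state {
    unsigned long last_ok_ms; /* last time enough slots were free */
    bool stalled;
};

/*
 * needed is the number of slots for a maximum-sized packet.
 * stall_detection is false for frontends without feature-rx-notify.
 */
static void update_stall(struct rxq_state *s, unsigned int free_slots,
                         unsigned int needed, unsigned long now_ms,
                         bool stall_detection)
{
    if (free_slots >= needed) {
        s->last_ok_ms = now_ms;
        s->stalled = false;   /* queue is ready */
    } else if (stall_detection &&
               now_ms - s->last_ok_ms >= STALL_TIMEOUT_MS) {
        s->stalled = true;
    }
}

/* The carrier is on only when every queue is ready. */
static bool carrier_on(const struct rxq_state *qs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (qs[i].stalled)
            return false;
    return true;
}
```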
     

11 Dec, 2014

1 commit


10 Dec, 2014

1 commit

  • When xenvif_alloc() fails, it returns a non-NULL error indicator. To
    avoid potential races, we shouldn't store that in struct backend_info,
    as its readers only check for NULL.

    Signed-off-by: Jan Beulich
    Acked-by: Ian Campbell
    Signed-off-by: David S. Miller

    Jan Beulich
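The non-NULL error indicator is the kernel's ERR_PTR() convention: a negative errno encoded as a pointer, which a reader testing only for NULL would mistake for a valid object. The userspace sketch below re-creates simplified ERR_PTR()/IS_ERR()/PTR_ERR() helpers (the real ones live in include/linux/err.h) and publishes the pointer only after checking it. struct backend_info's single field, struct vif and vif_alloc_sketch() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace copies of the kernel's err.h helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct vif { int id; };

/* Hypothetical: readers of this struct only check vif for NULL. */
struct backend_info { struct vif *vif; };

/* Hypothetical allocator that can be forced to fail with -ENOMEM. */
static struct vif *vif_alloc_sketch(bool fail)
{
    struct vif *v;

    if (fail)
        return ERR_PTR(-ENOMEM);
    v = malloc(sizeof(*v));
    if (!v)
        return ERR_PTR(-ENOMEM);
    v->id = 1;
    return v;
}

/* Only publish the pointer once it is known to be valid. */
static long backend_create_vif(struct backend_info *be, bool fail)
{
    struct vif *v = vif_alloc_sketch(fail);

    if (IS_ERR(v))
        return PTR_ERR(v);   /* be->vif stays NULL */
    be->vif = v;
    return 0;
}
```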
     

30 Nov, 2014

1 commit


25 Nov, 2014

1 commit


07 Nov, 2014

1 commit

  • Unconditionally pulling 128 bytes into the linear area is not required
    for:

    - security: Every protocol demux starts with pskb_may_pull() to pull
    frag data into the linear area, if necessary, before looking at
    headers.

    - performance: Netback has already grant copied up to 128 bytes from
    the first slot of a packet into the linear area. The first slot
    normally contains all the IPv4/IPv6 and TCP/UDP headers.

    The unconditional pull would often copy frag data unnecessarily. This
    is a performance problem when running on a version of Xen where grant
    unmap avoids TLB flushes for pages which are not accessed. TLB
    flushes can now be avoided for > 99% of unmaps (it was 0% before).

    Grant unmap TLB flush avoidance will be available in a future version
    of Xen (probably 4.6).

    Signed-off-by: Malcolm Crossley
    Signed-off-by: David Vrabel
    Acked-by: Ian Campbell
    Signed-off-by: David S. Miller

    Malcolm Crossley
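The security argument relies on the pull-on-demand pattern: a parser first asks for the number of header bytes it needs, and only the missing bytes are copied from the fragments into the linear area. A simplified userspace model follows; struct pkt and may_pull() are hypothetical and far simpler than sk_buff and pskb_may_pull(), which handle multiple fragments and reallocating the head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LINEAR_CAP 256 /* assumption: capacity of the linear area */

/* Hypothetical packet: a linear head plus a single fragment. */
struct pkt {
    unsigned char linear[LINEAR_CAP];
    size_t linear_len;
    const unsigned char *frag;
    size_t frag_len;
    size_t copied; /* bytes moved from frag into linear (for illustration) */
};

/*
 * Ensure at least len bytes are in the linear area, copying only what
 * is missing from the fragment. Returns false if that is impossible.
 */
static bool may_pull(struct pkt *p, size_t len)
{
    size_t need;

    if (len <= p->linear_len)
        return true;             /* already linear: nothing to copy */
    need = len - p->linear_len;
    if (len > LINEAR_CAP || need > p->frag_len)
        return false;
    memcpy(p->linear + p->linear_len, p->frag, need);
    p->linear_len += need;
    p->frag += need;
    p->frag_len -= need;
    p->copied += need;
    return true;
}
```

With 128 bytes already in the linear area, asking for 54 bytes of headers copies nothing, which is why an unconditional pull of frag data is unnecessary.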
     

02 Nov, 2014

1 commit


30 Oct, 2014

2 commits

  • This flag is unnecessary; it came from some old code.

    Suggested-by: Eric Dumazet
    Signed-off-by: Zoltan Kiss
    Signed-off-by: David Vrabel
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Zoltan Kiss
     
  • Otherwise, the interrupt handler still calls napi_complete. Although it
    won't schedule NAPI again, as either NAPI_STATE_DISABLE or
    NAPI_STATE_SCHED is set, the call is unnecessary, and it makes more
    sense to do it this way.

    Signed-off-by: Zoltan Kiss
    Signed-off-by: David Vrabel
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Zoltan Kiss
     

26 Oct, 2014

3 commits

  • If a frontend is not receiving packets, it is useful to detect this and
    turn off the carrier so packets are dropped early instead of being
    queued and drained when they expire.

    A to-guest queue is stalled if it doesn't have enough free slots for
    an extended period of time (default 60 s).

    If at least one queue is stalled, the carrier is turned off (in the
    expectation that the other queues will soon stall as well). The
    carrier is only turned on once all queues are ready.

    When the frontend connects, all the queues start in the stalled state
    and only become ready once the frontend queues enough Rx requests.

    Signed-off-by: David Vrabel
    Reviewed-by: Wei Liu
    Signed-off-by: David S. Miller

    David Vrabel
     
  • Netback needs to discard old to-guest skb's (guest Rx queue drain) and
    it needs to detect guest Rx stalls (to disable the carrier so packets are
    discarded earlier), but the current implementation is very broken.

    1. The check in hard_start_xmit of the slot availability did not
    consider the number of packets that were already in the guest Rx
    queue. This could allow the queue to grow without bound.

    For example, the guest stops consuming packets and the ring is allowed
    to fill, leaving S slots free. Netback queues a packet requiring more
    than S slots (ensuring that the ring stays with S slots free). Netback
    then queues, indefinitely, any further packets that require S or fewer
    slots.

    2. The Rx stall detection is not triggered in this case since the
    (host) Tx queue is not stopped.

    3. If the Tx queue is stopped and a guest Rx interrupt occurs, netback
    will consider this an Rx purge event which may result in it taking
    the carrier down unnecessarily. It also considers a queue with
    only 1 slot free as unstalled (even though the next packet might
    not fit in this).

    The internal guest Rx queue is limited by a byte length (to 512 KiB,
    enough for half the ring). The (host) Tx queue is stopped and started
    based on this limit. This sets an upper bound on the amount of memory
    used by packets on the internal queue.

    This allows the estimation of the number of slots for an skb to be
    removed (it wasn't a very good estimate anyway). Instead, the guest
    Rx thread just waits for enough free slots for a maximum sized packet.

    skbs queued on the internal queue have an 'expires' time (set to the
    current time plus the drain timeout). The guest Rx thread will detect
    when the skb at the head of the queue has expired and discard expired
    skbs. This sets a clear upper bound on the length of time an skb can
    be queued for. For a guest being destroyed the maximum time needed to
    wait for all the packets it sent to be dropped is still the drain
    timeout (10 s) since it will not be sending new packets.

    Rx stall detection is reintroduced in a later commit.

    Signed-off-by: David Vrabel
    Reviewed-by: Wei Liu
    Signed-off-by: David S. Miller

    David Vrabel
     
  • Frontends that do not provide feature-rx-notify may stall because
    netback depends on the notification from frontend to wake the guest Rx
    thread (even if can_queue is false).

    This could be fixed, but feature-rx-notify was introduced in 2006 and I
    am not aware of any frontends that do not implement it.

    Signed-off-by: David Vrabel
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    David Vrabel
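The internal queue policy described in the second entry above (a byte-length cap that stops the host Tx queue, plus a per-skb expiry checked at the head) can be sketched as a bounded FIFO. Only the 512 KiB limit and the 10 s drain timeout come from the commit message; the fixed array standing in for the skb list, the millisecond clock and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RXQ_MAX_BYTES    (512u * 1024u) /* 512 KiB cap */
#define DRAIN_TIMEOUT_MS 10000u         /* 10 s drain timeout */
#define RXQ_CAP          1024u          /* assumption: max queued packets */

struct rx_item {
    size_t len;
    unsigned long expires_ms;
};

/* Hypothetical internal guest Rx queue (circular buffer). */
struct rx_queue {
    struct rx_item items[RXQ_CAP];
    size_t head, count;
    size_t bytes;       /* total bytes queued */
    bool tx_stopped;    /* models the stopped host Tx queue */
};

static void rxq_update_stop(struct rx_queue *q)
{
    q->tx_stopped = q->bytes >= RXQ_MAX_BYTES;
}

/* Queue a packet from the host stack; refused while Tx is stopped. */
static bool rxq_enqueue(struct rx_queue *q, size_t len, unsigned long now_ms)
{
    if (q->tx_stopped || q->count == RXQ_CAP)
        return false;
    q->items[(q->head + q->count) % RXQ_CAP] =
        (struct rx_item){ .len = len, .expires_ms = now_ms + DRAIN_TIMEOUT_MS };
    q->count++;
    q->bytes += len;
    rxq_update_stop(q);
    return true;
}

/* Remove the head packet (delivered to the guest or dropped). */
static void rxq_pop(struct rx_queue *q)
{
    q->bytes -= q->items[q->head].len;
    q->head = (q->head + 1) % RXQ_CAP;
    q->count--;
    rxq_update_stop(q);
}

/* Drop expired packets from the head; returns how many were dropped. */
static size_t rxq_drop_expired(struct rx_queue *q, unsigned long now_ms)
{
    size_t dropped = 0;

    while (q->count > 0 && q->items[q->head].expires_ms <= now_ms) {
        rxq_pop(q);
        dropped++;
    }
    return dropped;
}
```

Because every packet gets the same timeout and the queue is FIFO, expiry times are non-decreasing from head to tail, so checking only the head is enough to find all expired packets.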
     

06 Oct, 2014

1 commit


26 Aug, 2014

1 commit

  • The interrupt is enabled when bind_interdomain_evtchn_to_irqhandler()
    returns. If an interrupt is pending, the interrupt handler is invoked.

    NAPI needs to be initialised before binding the interrupt; otherwise the
    interrupt handler will try to schedule a NAPI instance that is not yet
    initialised, resulting in a kernel OOPS.

    This fixes a regression introduced in ea2c5e13 ("xen-netback: move NAPI
    add/remove calls").

    Ideally, the calls that create kthreads should also be moved before
    binding, but I intend to fix this regression with minimal changes and
    refactor the code in another patch.

    Reported-by: Thomas Leonard
    Signed-off-by: Wei Liu
    Cc: Ian Campbell
    Signed-off-by: David S. Miller

    Wei Liu
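The ordering rule generalises: if binding an event channel can run the handler immediately for an already-pending event, everything the handler touches must be initialised before the bind call. In the userspace model below, bind_and_enable() stands in for bind_interdomain_evtchn_to_irqhandler() and a flag stands in for NAPI state; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical queue state. */
struct queue {
    bool napi_initialised;
    bool event_pending;
    int napi_scheduled;  /* times the handler scheduled NAPI */
    bool oops;           /* set if the handler ran before NAPI existed */
};

typedef void (*handler_fn)(struct queue *q);

/* The handler schedules NAPI, which must already be initialised. */
static void interrupt_handler(struct queue *q)
{
    if (!q->napi_initialised) {
        q->oops = true;  /* the kernel would OOPS here */
        return;
    }
    q->napi_scheduled++;
}

/* Binding enables the interrupt and runs the handler if one is pending. */
static void bind_and_enable(struct queue *q, handler_fn fn)
{
    if (q->event_pending) {
        q->event_pending = false;
        fn(q);
    }
}

static void connect_buggy(struct queue *q)
{
    bind_and_enable(q, interrupt_handler);
    q->napi_initialised = true;            /* too late */
}

static void connect_fixed(struct queue *q)
{
    q->napi_initialised = true;            /* NAPI first... */
    bind_and_enable(q, interrupt_handler); /* ...then the interrupt */
}
```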
     

14 Aug, 2014

4 commits

  • The original implementation relies on a loop to check whether all
    inflight packets are freed. Now that we have proper reference counting,
    there's no need to use a loop anymore.

    Signed-off-by: Wei Liu
    Cc: Ian Campbell
    Cc: Zoltan Kiss
    Signed-off-by: David S. Miller

    Wei Liu
     
  • Reference count the number of packets in the host stack, so that we
    don't stop the deallocation thread too early. Otherwise, we can end up
    with xenvif_free permanently waiting for the deallocation thread to
    unmap grefs.

    Reported-by: Thomas Leonard
    Signed-off-by: Wei Liu
    Cc: Ian Campbell
    Cc: Zoltan Kiss
    Signed-off-by: David S. Miller

    Wei Liu
     
  • Originally netif_napi_add was in xenvif_init_queue and netif_napi_del
    was in xenvif_deinit_queue, while kthreads were handled in
    xenvif_connect and xenvif_disconnect. Move netif_napi_add and
    netif_napi_del to xenvif_connect and xenvif_disconnect so that they
    reside together with kthread operations.

    Signed-off-by: Wei Liu
    Cc: Ian Campbell
    Cc: Zoltan Kiss
    Signed-off-by: David S. Miller

    Wei Liu
     
  • The original code is bogus. The function gets called in a loop which
    leaks entries created in previous rounds.

    Signed-off-by: Wei Liu
    Cc: Zoltan Kiss
    Cc: Ian Campbell
    Signed-off-by: David S. Miller

    Wei Liu
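The reference-counting change described in the 14 Aug 2014 entries can be sketched with C11 atomics: each packet handed to the host stack takes a reference, each completion drops one, and the dealloc thread may stop only once the count reaches zero, so no polling loop is needed. Names are hypothetical, and the sketch returns a flag where the kernel would wake a waiter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-vif state. */
struct vif_sketch {
    atomic_int inflight_packets; /* packets currently in the host stack */
};

static void packet_to_stack(struct vif_sketch *v)
{
    atomic_fetch_add(&v->inflight_packets, 1);
}

/* Returns true when the last in-flight packet has been released. */
static bool packet_done(struct vif_sketch *v)
{
    return atomic_fetch_sub(&v->inflight_packets, 1) == 1;
}

/* The dealloc thread may only stop once nothing is in flight. */
static bool dealloc_may_stop(struct vif_sketch *v)
{
    return atomic_load(&v->inflight_packets) == 0;
}
```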