22 Jul, 2012

2 commits

  • The vhost work queue allows processing to be done in vhost worker thread
    context, which uses the owner process mm. Access to the vring and guest
    memory is typically only possible from vhost worker context so it is
    useful to allow work to be queued directly by users.

    Currently vhost_net only uses the poll wrappers which do not expose the
    work queue functions. However, for tcm_vhost (vhost_scsi) it will be
    necessary to queue custom work.

    Signed-off-by: Stefan Hajnoczi
    Cc: Zhi Yong Wu
    Cc: Michael S. Tsirkin
    Cc: Paolo Bonzini
    Signed-off-by: Nicholas Bellinger
    Signed-off-by: Michael S. Tsirkin

    Stefan Hajnoczi
     
  • In order for other vhost devices to use the VHOST_FEATURES bits the
    vhost-net specific bits need to be moved to their own VHOST_NET_FEATURES
    constant.

    (Asias: Update drivers/vhost/test.c to use VHOST_NET_FEATURES)

    Signed-off-by: Stefan Hajnoczi
    Cc: Zhi Yong Wu
    Cc: Michael S. Tsirkin
    Cc: Paolo Bonzini
    Cc: Asias He
    Signed-off-by: Nicholas A. Bellinger
    Signed-off-by: Michael S. Tsirkin

    Stefan Hajnoczi
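
A minimal sketch of the split described in this commit: a generic mask that any vhost device can offer, and a vhost-net mask built on top of it. The bit numbers are defined locally for illustration and should be treated as assumptions (the kernel takes them from the virtio and vhost headers); `vhost_features_ok()` is a hypothetical helper, not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers, defined locally for illustration (assumed values). */
#define VIRTIO_NET_F_MRG_RXBUF       15
#define VIRTIO_F_NOTIFY_ON_EMPTY     24
#define VHOST_F_LOG_ALL              26
#define VHOST_NET_F_VIRTIO_NET_HDR   27
#define VIRTIO_RING_F_INDIRECT_DESC  28
#define VIRTIO_RING_F_EVENT_IDX      29

/* Bits that are not specific to networking: usable by any vhost device. */
#define VHOST_FEATURES ((1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) | \
                        (1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
                        (1ULL << VIRTIO_RING_F_EVENT_IDX) | \
                        (1ULL << VHOST_F_LOG_ALL))

/* vhost-net adds its own bits on top of the generic set. */
#define VHOST_NET_FEATURES (VHOST_FEATURES | \
                            (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) | \
                            (1ULL << VIRTIO_NET_F_MRG_RXBUF))

/* Hypothetical helper: true if every requested bit is in the supported mask. */
static bool vhost_features_ok(uint64_t requested, uint64_t supported)
{
    return (requested & ~supported) == 0;
}
```

With this layout, a non-network device would validate requests against `VHOST_FEATURES` only, so a net-specific bit such as `VIRTIO_NET_F_MRG_RXBUF` is rejected there but accepted by vhost-net.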
     

27 Jun, 2012

1 commit

  • On some architectures the address spaces are set up so that explicitly
    selecting the user address space is not needed, but on others (like s390) it is.
    Make sure we operate on the user address space to allow copy_xxx_user()
    from the vhost_worker() thread by setting it explicitly before calling
    use_mm() and revert it after unuse_mm().

    Signed-off-by: Jens Freimann
    Signed-off-by: Christian Borntraeger
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Jens Freimann
     

17 May, 2012

1 commit


12 May, 2012

1 commit

  • Take vlan header length into account, when vlan id is stored as
    vlan_tci. Otherwise tagged packets coming from macvtap will be
    truncated.

    Signed-off-by: Basil Gor
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Basil Gor
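
A minimal sketch of the length calculation this fix implies, assuming a simplified packet descriptor. `struct rx_pkt` and `rx_pkt_wire_len()` are hypothetical names for illustration and do not appear in the patch; the only fixed fact used is that an 802.1Q tag is 4 bytes long.

```c
#include <stdbool.h>
#include <stddef.h>

#define VLAN_HLEN 4 /* length of an 802.1Q tag in bytes */

/* Hypothetical, simplified view of a queued packet. */
struct rx_pkt {
    size_t len;            /* payload length, without an out-of-band tag */
    bool vlan_tag_present; /* tag is kept out of band (as in vlan_tci) */
};

/* Length the receiver must reserve: the out-of-band tag gets re-inserted
 * into the frame, so it has to be counted or the tail is cut off. */
static size_t rx_pkt_wire_len(const struct rx_pkt *pkt)
{
    size_t len = pkt->len;

    if (pkt->vlan_tag_present)
        len += VLAN_HLEN;
    return len;
}
```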
     

02 May, 2012

4 commits

  • We add used buffers and signal the guest in the worker thread, but did not poll
    the virtqueue during the zero copy callback. As a result, adding used buffers and
    signalling may be missed during zerocopy. Solve this by polling the virtqueue and
    letting it wake up the worker during the callback.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • When a packet was fully copied in zerocopy, we don't wait for DMA completion to
    mark the done flag, so after the packet is passed to the lower device, we need to
    add used and signal the guest immediately.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • Currently, we restart tx polling unconditionally when sendmsg()
    fails. This causes unnecessary wakeups of vhost workers and wastes
    CPU when an evil userspace (guest driver) is able to hit EFAULT or
    EINVAL.

    Polling is only needed when the socket send buffer is exceeded or there is
    not enough memory. So fix this by restarting polling only when sendmsg() returns
    EAGAIN/ENOBUFS.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
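
The decision described in this commit can be sketched as a small predicate. `tx_should_repoll()` is a hypothetical name, and the sketch assumes the kernel convention that a failed send is reported as a negative errno value.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: should tx polling be restarted after sendmsg()
 * failed with `err` (a negative errno value)? Only transient resource
 * shortages qualify; errors caused by bad guest input such as -EFAULT
 * or -EINVAL do not, since retrying them would only burn CPU. */
static bool tx_should_repoll(int err)
{
    return err == -EAGAIN || err == -ENOBUFS;
}
```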
     
  • When we want to disable the vhost_net backend while there is tx work, a possible
    NULL pointer dereference may happen when we try to dereference vq->bufs after
    vhost_net_set_backend() assigns NULL to it.

    As suggested by Michael, fix this by checking the vq->bufs instead of
    vhost_sock_zcopy().

    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     

23 Apr, 2012

1 commit

  • Pull networking fixes from David Miller:

    1) Fix namespace init and cleanup in phonet to fix some oopses, from
    Eric W. Biederman.

    2) Missing kfree_skb() in AF_KEY, from Julia Lawall.

    3) Refcount leak and source address handling fix in l2tp from James
    Chapman.

    4) Memory leak fix in CAIF from Tomasz Gregorek.

    5) When routes are cloned from ipv6 addrconf routes, we don't process
    expirations properly. Fix from Gao Feng.

    6) Fix panic on DMA errors in atl1 driver, from Tony Zelenoff.

    7) Only enable interrupts in 8139cp driver after we've registered the
    IRQ handler. From Jason Wang.

    8) Fix too many reads of KS_CIDER register in ks8851 during probe,
    fixing crashes on spurious interrupts. From Matt Renzelmann.

    9) Missing include in ath5k driver and missing iounmap on probe
    failure, from Jonathan Bither.

    10) Fix RX packet handling in smsc911x driver, from Will Deacon.

    11) Fix ixgbe WoL on fiber by leaving the laser on during shutdown.

    12) ks8851 needs MAX_RECV_FRAMES increased otherwise the internal MAC
    buffers are easily overflown. Fix from Davide Cimingahi.

    13) Fix memory leaks in peak_usb CAN driver, from Jesper Juhl.

    14) gred packet scheduler can dump in WRED mode when doing a netlink
    dump. Fix from David Ward.

    15) Fix MTU in USB smsc75xx driver, from Stephane Fillod.

    16) Dummy device needs ->ndo_uninit handler to properly handle
    ->ndo_init failures. From Hiroaki SHIMODA.

    17) Fix TX fragmentation in ath9k driver, from Sujith Manoharan.

    18) Missing RTNL lock in ixgbe PM resume, from Benjamin Poirier.

    19) Missing iounmap in farsync WAN driver, from Julia Lawall.

    20) With LRO/GRO, tcp_grow_window() is easily tricked into not growing
    the receive window properly, and this hurts performance. Fix from
    Eric Dumazet.

    21) Network namespace init failure can leak net_generic data, fix from
    Julian Anastasov.

    22) Fix skb_over_panic due to mis-accounting in TCP for partially ACK'd
    SKBs. From Eric Dumazet.

    23) New IDs for qmi_wwan driver, from Bjørn Mork.

    24) Fix races in ax25_exit(), from Eric W. Biederman.

    25) IPV6 TCP doesn't handle TCP_MAXSEG socket option properly, copy over
    logic from the IPV4 side. From Neal Cardwell.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
    tcp: fix TCP_MAXSEG for established IPv6 passive sockets
    drivers/net: Do not free an IRQ if its request failed
    drop_monitor: allow more events per second
    ks8851: Fix request_irq/free_irq mismatch
    net/hyperv: Adding cancellation to ensure rndis filter is closed
    ks8851: Fix mutex deadlock in ks8851_net_stop()
    net ax25: Reorder ax25_exit to remove races.
    icplus: fix interrupt for IC+ 101A/G and 1001LF
    net: qmi_wwan: support Sierra Wireless MC77xx devices in QMI mode
    bnx2x: off by one in bnx2x_ets_e3b0_sp_pri_to_cos_set()
    ksz884x: don't copy too much in netdev_set_mac_address()
    tcp: fix retransmit of partially acked frames
    netns: do not leak net_generic data on failed init
    net/sock.h: fix sk_peek_off kernel-doc warning
    tcp: fix tcp_grow_window() for large incoming frames
    drivers/net/wan/farsync.c: add missing iounmap
    davinci_mdio: Fix MDIO timeout check
    ipv6: clean up rt6_clean_expires
    ipv6: fix rt6_update_expires
    arcnet: rimi: Fix device name in debug output
    ...

    Linus Torvalds
     

14 Apr, 2012

1 commit

  • The skb struct ubuf_info callback gets passed struct ubuf_info
    itself, not the arg value as the field name and the function signature
    seem to imply. Rename the arg field to ctx to match usage,
    add documentation and change the callback argument type
    to make usage clear and to have compiler check correctness.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     

12 Apr, 2012

1 commit


24 Mar, 2012

1 commit


20 Mar, 2012

1 commit


28 Feb, 2012

2 commits

  • We shouldn't hold any locks on release path. Pass a flag to
    vhost_dev_cleanup to use the lockdep info correctly.

    Signed-off-by: Michael S. Tsirkin
    Tested-by: Sasha Levin

    Michael S. Tsirkin
     
  • This is a tiny, but important, patch to vhost.

    Vhost's worker thread only called schedule() when it had no work to do, and
    it wanted to go to sleep. But if there's always work to do, e.g., the guest
    is running a network-intensive program like netperf with small message sizes,
    schedule() was *never* called. This had several negative implications (on
    non-preemptive kernels):

    1. Passing time was not properly accounted to the "vhost" process (ps and
    top would wrongly show it using zero CPU time).

    2. Sometimes error messages about RCU timeouts would be printed, if the
    core running the vhost thread didn't schedule() for a very long time.

    3. Worst of all, a vhost thread would "hog" the core. If several vhost
    threads need to share the same core, typically one would get most of the
    CPU time (and its associated guest most of the performance), while the
    others hardly get any work done.

    The trivial solution is to add

        if (need_resched())
                schedule();

    after doing every piece of work. This will not do the heavy schedule() all
    the time, just when the timer interrupt decided a reschedule is warranted
    (so need_resched() returns true).

    Thanks to Abel Gordon for this patch.

    Signed-off-by: Nadav Har'El
    Signed-off-by: Michael S. Tsirkin

    Nadav Har'El
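
A user-space analogue of the pattern in this commit, as a rough sketch only. The flag `resched_requested` stands in for need_resched() and `sched_yield()` stands in for schedule(); all names in the block are hypothetical, and `tick_work()` merely simulates a timer tick arriving while a work item runs.

```c
#include <sched.h>
#include <stddef.h>

typedef void (*work_fn)(void *arg);

struct work_item {
    work_fn fn;
    void *arg;
};

/* Stand-in for need_resched(): in the kernel the timer tick sets this. */
static volatile int resched_requested;

/* Example work: bump a counter. */
static void count_work(void *arg)
{
    ++*(int *)arg;
}

/* Example work that also simulates a timer tick arriving meanwhile. */
static void tick_work(void *arg)
{
    ++*(int *)arg;
    resched_requested = 1;
}

/* Run every item; after each one, yield the CPU only if a reschedule
 * was requested. Returns how many times the loop yielded. */
static size_t run_work_items(struct work_item *items, size_t n)
{
    size_t yields = 0;

    for (size_t i = 0; i < n; i++) {
        items[i].fn(items[i].arg);
        if (resched_requested) {
            resched_requested = 0;
            sched_yield();
            yields++;
        }
    }
    return yields;
}
```

The check is cheap when no reschedule is pending, which is why it can sit after every piece of work without slowing down the common case.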
     

14 Jan, 2012

1 commit

  • By adding some module aliases, programs (or users) won't have to explicitly
    call modprobe. Vhost-net will always be available if built into the kernel.
    It does require assigning a permanent minor number for depmod to work.

    Also:
    - use C99 style initialization.
    - add missing entry in documentation for loop-control

    Signed-off-by: Stephen Hemminger
    Acked-by: Kay Sievers
    Signed-off-by: David S. Miller

    stephen hemminger
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

21 Jul, 2011

2 commits


19 Jul, 2011

5 commits

  • As we now only update used ring after enabling
    the backend, we can write flags with __put_user:
    as that's done on data path, it matters.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     
  • Fix get/put refcount imbalance with zero copy,
    which caused qemu to hang forever on guest driver unload.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     
  • We need to log writes when updating used flags and avail event
    fields. Otherwise the guest may see a stale value after migration and
    miss notifying the host.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • Move the used ring initialization to after the backend is set. This
    makes it possible to disable the backend, tweak the used ring,
    and then restart. This will also make it possible to log the used ring
    write correctly.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • From: Shirley Ma

    This adds experimental zero copy support in vhost-net,
    disabled by default. To enable, set
    experimental_zcopytx module option to 1.

    This patch maintains the outstanding userspace buffers in the
    sequence in which they are delivered to vhost. The outstanding userspace
    buffers are marked as done once the lower device's DMA on them has finished.
    This is detected through a callback on the last kfree_skb reference. Two
    buffer indices are used for this purpose.

    The vhost-net device passes the userspace buffer info to the lower device
    skb through message control. DMA done status checks and guest
    notification are handled by handle_tx: in the worst case, all buffers
    in the vq are in pending/done status, so we need to notify the guest to
    release DMA done buffers first before we get any new buffers from the
    vq.

    One known problem is that if the guest stops submitting
    buffers, buffers might never get used until some
    further action, e.g. device reset. This does not
    seem to affect linux guests.

    Signed-off-by: Shirley
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     

30 May, 2011

1 commit


07 May, 2011

1 commit

  • - Documentation/kvm/ to Documentation/virtual/kvm
    - Documentation/uml/ to Documentation/virtual/uml
    - Documentation/lguest/ to Documentation/virtual/lguest
    throughout the kernel source tree.

    Signed-off-by: Rob Landley
    Signed-off-by: Randy Dunlap

    Rob Landley
     

14 Mar, 2011

2 commits


13 Mar, 2011

2 commits

  • Code duplication was found between the handling of mergeable and big
    buffers, so this patch tries to unify them. This can be done easily
    by adding a quota to get_rx_bufs() which limits the
    number of buffers it returns (for mergeable buffers, the quota is
    simply UIO_MAXIOV; for big buffers, the quota is just 1), and then the
    previous handle_rx_mergeable() can be reused for big buffers as well.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
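
A rough sketch of the quota idea, assuming a much simplified model in which the available buffers are just an array of sizes. `collect_rx_bufs()` and `SKETCH_MAXIOV` are hypothetical names; the real get_rx_bufs() works on virtqueue descriptors and has a different signature.

```c
#include <stddef.h>

#define SKETCH_MAXIOV 1024 /* stand-in for UIO_MAXIOV */

/* Take buffers from `avail` until `datalen` bytes are covered or `quota`
 * buffers have been taken. Returns the number of buffers taken, or 0 if
 * the available buffers ran out before either limit was reached. */
static int collect_rx_bufs(const size_t *avail, size_t navail,
                           size_t datalen, size_t quota)
{
    size_t used = 0;
    size_t covered = 0;

    while (covered < datalen && used < quota) {
        if (used == navail)
            return 0; /* ring ran dry */
        covered += avail[used++];
    }
    return (int)used;
}
```

Calling it with a quota of `SKETCH_MAXIOV` models the mergeable case, where a packet may span several buffers, while a quota of 1 models big buffers, where the packet has to fit in a single buffer. One code path then serves both modes.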
     
  • No need to check for mergeable buffer support inside the receive
    loop, as the whole of handle_rx_xx() is in the read critical region. So
    this patch moves the check ahead of the receive loop.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     

09 Mar, 2011

2 commits


01 Feb, 2011

1 commit

  • When built with rcu checks enabled, vhost triggers
    bogus warnings as vhost features are read without
    dev->mutex sometimes, and private pointer is read
    with our kind of rcu where work serves as a
    read side critical section.

    Fixing it properly is not trivial.
    Disable the warnings by stubbing out the checks for now.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     

10 Jan, 2011

1 commit


15 Dec, 2010

1 commit


09 Dec, 2010

4 commits