05 Aug, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1443 commits)
    phy/marvell: add 88ec048 support
    igb: Program MDICNFG register prior to PHY init
    e1000e: correct MAC-PHY interconnect register offset for 82579
    hso: Add new product ID
    can: Add driver for esd CAN-USB/2 device
    l2tp: fix export of header file for userspace
    can-raw: Fix skb_orphan_try handling
    Revert "net: remove zap_completion_queue"
    net: cleanup inclusion
    phy/marvell: add 88e1121 interface mode support
    u32: negative offset fix
    net: Fix a typo from "dev" to "ndev"
    igb: Use irq_synchronize per vector when using MSI-X
    ixgbevf: fix null pointer dereference due to filter being set for VLAN 0
    e1000e: Fix irq_synchronize in MSI-X case
    e1000e: register pm_qos request on hardware activation
    ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice
    net: Add getsockopt support for TCP thin-streams
    cxgb4: update driver version
    cxgb4: add new PCI IDs
    ...

    Manually fix up conflicts in:
    - drivers/net/e1000e/netdev.c: due to pm_qos registration
    infrastructure changes
    - drivers/net/phy/marvell.c: conflict between adding 88ec048 support
    and cleaning up the IDs
    - drivers/net/wireless/ipw2x00/ipw2100.c: trivial ipw2100_pm_qos_req
    conflict (registration change vs marking it static)

    Linus Torvalds
     

28 Jul, 2010

2 commits

  • This adds support for mergeable buffers in vhost-net: this is needed
    for older guests without indirect buffer support, as well
    as for zero copy with some devices.

    Includes changes by Michael S. Tsirkin to make the
    patch as low risk as possible (i.e., close to no changes
    when feature is disabled).

    Signed-off-by: David Stevens
    Signed-off-by: Michael S. Tsirkin

    David Stevens
     
  • Replace vhost_workqueue with per-vhost kthread. Other than callback
    argument change from struct work_struct * to struct vhost_work *,
    there's no visible change to vhost_poll_*() interface.

    This conversion is to make each vhost use a dedicated kthread so that
    resource control via cgroup can be applied.

    Partially based on Sridhar Samudrala's patch.

    * Updated to use sub structure vhost_work instead of directly using
    vhost_poll at Michael's suggestion.

    * Added flusher wake_up() optimization at Michael's suggestion.

    Changes by MST:
    * Converted atomics/barrier use to a spinlock.
    * Create thread on SET_OWNER
    * Fix flushing

    Signed-off-by: Tejun Heo
    Signed-off-by: Michael S. Tsirkin
    Cc: Sridhar Samudrala

    Tejun Heo
     

22 Jul, 2010

1 commit


21 Jul, 2010

2 commits

  • Conflicts:
    drivers/vhost/net.c
    net/bridge/br_device.c

    Fix merge conflict in drivers/vhost/net.c with guidance from
    Stephen Rothwell.

    Revert the effects of net-2.6 commit 573201f36fd9c7c6d5218cdcd9948cee700b277d
    since net-next-2.6 has fixes that make bridge netpoll work properly thus
    we don't need it disabled.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (24 commits)
    bridge: Partially disable netpoll support
    tcp: fix crash in tcp_xmit_retransmit_queue
    IPv6: fix CoA check in RH2 input handler (mip6_rthdr_input())
    ibmveth: lost IRQ while closing/opening device leads to service loss
    rt2x00: Fix lockdep warning in rt2x00lib_probe_dev()
    vhost: avoid pr_err on condition guest can trigger
    ipmr: Don't leak memory if fib lookup fails.
    vhost-net: avoid flush under lock
    net: fix problem in reading sock TX queue
    net/core: neighbour update Oops
    net: skb_tx_hash() fix relative to skb_orphan_try()
    rfs: call sock_rps_record_flow() in tcp_splice_read()
    xfrm: do not assume that template resolving always returns xfrms
    hostap_pci: set dev->base_addr during probe
    axnet_cs: use spin_lock_irqsave in ax_interrupt
    dsa: Fix Kconfig dependencies.
    act_nat: not all of the ICMP packets need an IP header payload
    r8169: incorrect identifier for a 8168dp
    Phonet: fix skb leak in pipe endpoint accept()
    Bluetooth: Update sec_level/auth_type for already existing connections
    ...

    Linus Torvalds
     

16 Jul, 2010

1 commit


15 Jul, 2010

1 commit

  • We flush under vq mutex when changing backends.
    This creates a deadlock as workqueue being flushed
    needs this lock as well.

    https://bugzilla.redhat.com/show_bug.cgi?id=612421

    Drop the vq mutex before flush: we have the device mutex
    which is sufficient to prevent another ioctl from touching
    the vq.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
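
The locking order described in this commit can be illustrated with a small single-threaded model. This is only a sketch: `vq_model`, `work_item_model`, `flush_model` and `set_backend_model` are hypothetical names, the mutex is modelled as a plain flag, and none of it is the driver's actual code. The point is that the backend swap happens under the vq mutex, while the flush runs only after the mutex has been dropped, so pending work that needs the mutex can finish.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded model of one virtqueue; not the kernel structs. */
struct vq_model {
    int mutex_held;   /* 1 while the modelled vq mutex is held */
    void *backend;    /* current backend, NULL if none */
    int work_done;    /* number of work items that ran to completion */
};

/* A queued work item needs the vq mutex, as the commit message describes. */
static void work_item_model(struct vq_model *vq)
{
    /* With real locks this would block forever if the flusher held the mutex. */
    assert(!vq->mutex_held);
    vq->mutex_held = 1;
    vq->work_done++;
    vq->mutex_held = 0;
}

/* Flushing means waiting for pending work; in this model it simply runs it. */
static void flush_model(struct vq_model *vq)
{
    work_item_model(vq);
}

/* Swap the backend under the mutex, release it, and only then flush. */
static void *set_backend_model(struct vq_model *vq, void *new_backend)
{
    void *old;

    vq->mutex_held = 1;
    old = vq->backend;
    vq->backend = new_backend;
    vq->mutex_held = 0;     /* drop the vq mutex before the flush */

    flush_model(vq);
    return old;
}
```

Moving the `flush_model()` call above the line that clears `mutex_held` reproduces the reported problem in miniature: the assertion in `work_item_model()` fires where the real code would deadlock.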
     

08 Jul, 2010

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (35 commits)
    NET: SB1250: Initialize .owner
    vxge: show startup message with KERN_INFO
    ll_temac: Fix missing iounmaps
    bridge: Clear IPCB before possible entry into IP stack
    bridge br_multicast: BUG: unable to handle kernel NULL pointer dereference
    net: Fix definition of netif_vdbg() when VERBOSE_DEBUG is defined
    net/ne: fix memory leak in ne_drv_probe()
    xfrm: fix xfrm by MARK logic
    virtio_net: fix oom handling on tx
    virtio_net: do not reschedule rx refill forever
    s2io: resolve statistics issues
    linux/net.h: fix kernel-doc warnings
    net: decreasing real_num_tx_queues needs to flush qdisc
    sched: qdisc_reset_all_tx is calling qdisc_reset without qdisc_lock
    qlge: fix a eeh handler to not add a pending timer
    qlge: Replacing add_timer() to mod_timer()
    usbnet: Set parent device early for netdev_printk()
    net: Revert "rndis_host: Poll status channel before control channel"
    netfilter: ip6t_REJECT: fix a dst leak in ipv6 REJECT
    drivers: bluetooth: bluecard_cs.c: Fixed include error, changed to linux/io.h
    ...

    Linus Torvalds
     
  • David S. Miller
     

02 Jul, 2010

1 commit


27 Jun, 2010

1 commit

  • When ring parsing fails, we currently handle this
    as ring empty condition. This means that we enable
    kicks and recheck ring empty: if it is not empty,
    we re-start polling which of course will fail again.

    Instead, let's return a negative error code and stop polling.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
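
A minimal model of the control flow this commit describes, assuming a descriptor lookup that returns a negative errno on a parse failure, the ring size when the ring is empty, and a head index otherwise. All names (`poll_model`, `next_head_model`, `handle_ring_model`, `RING_NUM_MODEL`) are hypothetical, and the lookups are scripted rather than read from a real ring.

```c
#include <errno.h>

enum { RING_NUM_MODEL = 8 };   /* hypothetical ring size; doubles as the "ring empty" marker */

struct poll_model {
    const int *script;    /* scripted results of successive descriptor lookups */
    int pos;              /* index of the next scripted result */
    int processed;        /* descriptors handled so far */
    int kicks_enabled;    /* 1 once guest notifications were re-enabled */
};

/* Stand-in for fetching the next available descriptor head. */
static int next_head_model(struct poll_model *m)
{
    return m->script[m->pos++];
}

/* Process descriptors until the ring is empty or parsing fails. */
static int handle_ring_model(struct poll_model *m)
{
    for (;;) {
        int head = next_head_model(m);

        if (head < 0)
            return head;             /* parse failure: stop, do not re-enable kicks */
        if (head == RING_NUM_MODEL) {
            m->kicks_enabled = 1;    /* ring empty: wait for the next kick */
            return 0;
        }
        m->processed++;
    }
}
```

The model omits the recheck after enabling kicks that the commit message mentions; it only shows that the error path is kept apart from the ring-empty path, so a failing ring is not polled again.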
     

09 Jun, 2010

1 commit

  • 10, 233 is allocated officially to /dev/kmview which is shipping in
    Ubuntu and Debian distributions. vhost_net seems to have borrowed it
    without making a proper request and this causes regressions in the other
    distributions.

    vhost_net can use a dynamic minor so use that instead. Also update the
    file with a comment to try and avoid future misunderstandings.

    cc: stable@kernel.org
    Signed-off-by: Alan Cox
    [ We should have caught this before 2.6.34 got released. - Linus ]
    Signed-off-by: Linus Torvalds

    Alan Cox
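
With a dynamic minor, userspace can no longer assume a fixed device number. One way to find the number actually assigned, sketched here, is to look the device up by name in `/proc/misc`, which lists one `<minor> <name>` pair per line. The helper name `find_misc_minor` is hypothetical, and the sketch assumes the device is listed as `vhost-net`.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the minor listed for `name` in a stream formatted like /proc/misc
 * ("<minor> <name>" per line), or -1 if the name is not listed.
 */
static int find_misc_minor(FILE *f, const char *name)
{
    char line[128];
    char entry[64];
    int minor;

    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "%d %63s", &minor, entry) == 2 &&
            strcmp(entry, name) == 0)
            return minor;
    }
    return -1;
}
```

A caller would pass a stream opened with `fopen("/proc/misc", "r")` and the name `"vhost-net"`. In practice the device node is normally created by udev, so most programs just open it by path.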
     

02 Jun, 2010

1 commit


27 May, 2010

3 commits


15 Apr, 2010

1 commit


14 Apr, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

17 Mar, 2010

1 commit


07 Mar, 2010

1 commit


01 Mar, 2010

1 commit

  • Guest to remote communication with vhost net sometimes stops until
    the guest driver is restarted. This happens when we get a guest kick precisely
    when the backend send queue is full: as a result, handle_tx() returns without
    polling the backend. This patch fixes this by restarting tx poll on this condition.

    Signed-off-by: Sridhar Samudrala
    Signed-off-by: Michael S. Tsirkin
    Tested-by: Tom Lendacky

    Sridhar Samudrala
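
The condition fixed here can be modelled in a few lines. This is a sketch under stated assumptions, not the driver code: `tx_model` and `handle_tx_model` are hypothetical names, every packet counts as one unit of send-buffer space, and "polling the backend" is reduced to a flag.

```c
/* Hypothetical model of the backend socket state seen by the tx handler. */
struct tx_model {
    int wmem_alloc;   /* send-buffer space currently in use */
    int sndbuf;       /* send-buffer limit */
    int tx_polling;   /* 1 if we asked to be woken when space frees up */
    int sent;         /* packets handed to the backend so far */
};

/* Handle a guest kick with `pending` packets queued; returns packets sent. */
static int handle_tx_model(struct tx_model *t, int pending)
{
    int before = t->sent;

    if (t->wmem_alloc >= t->sndbuf) {
        /* Queue full: start polling before returning, otherwise nothing
         * would ever wake the handler again. */
        t->tx_polling = 1;
        return 0;
    }

    while (pending > 0 && t->wmem_alloc < t->sndbuf) {
        t->wmem_alloc++;
        t->sent++;
        pending--;
    }
    return t->sent - before;
}
```

Without the `tx_polling = 1` line, the early return leaves the handler with no wake-up source, which matches the stall the commit message describes.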
     

19 Feb, 2010

1 commit

  • This adds support for passing a macvtap file descriptor into
    vhost-net, much like we already do for tun/tap.

    Most of the new code is taken from the respective patch
    in the tun driver and may get consolidated in the future.

    Signed-off-by: Arnd Bergmann
    Acked-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

15 Jan, 2010

1 commit

  • What it is: vhost net is a character device that can be used to reduce
    the number of system calls involved in virtio networking.
    Existing virtio net code is used in the guest without modification.

    There's similarity with vringfd, with some differences and reduced scope
    - uses eventfd for signalling
    - structures can be moved around in memory at any time (good for
    migration, bug work-arounds in userspace)
    - write logging is supported (good for migration)
    - support memory table and not just an offset (needed for kvm)

    Common virtio-related code has been put in a separate file vhost.c and
    can be made into a separate module if/when more backends appear. I used
    Rusty's lguest.c as the source for developing this part: this supplied
    me with witty comments I wouldn't be able to write myself.

    What it is not: vhost net is not a bus, and not a generic new system
    call. No assumptions are made on how guest performs hypercalls.
    Userspace hypervisors are supported as well as kvm.

    How it works: Basically, we connect virtio frontend (configured by
    userspace) to a backend. The backend could be a network device, or a tap
    device. Backend is also configured by userspace, including vlan/mac
    etc.

    Status: This works for me, and I haven't seen any crashes.
    Compared to userspace, people reported improved latency (as I save up to
    4 system calls per packet), as well as better bandwidth and CPU
    utilization.

    Features that I plan to look at in the future:
    - mergeable buffers
    - zero copy
    - scalability tuning: figure out the best threading model to use

    Note on RCU usage (this is also documented in vhost.h, near
    private_pointer which is the value protected by this variant of RCU):
    what is happening is that the rcu_dereference() is being used in a
    workqueue item. The role of rcu_read_lock() is taken on by the start of
    execution of the workqueue item, of rcu_read_unlock() by the end of
    execution of the workqueue item, and of synchronize_rcu() by
    flush_workqueue()/flush_work(). In the future we might need to apply
    some gcc attribute or sparse annotation to the function passed to
    INIT_WORK(). Paul's ack below is for this RCU usage.

    (Includes fixes by Alan Cox,
    David L Stevens,
    Chris Wright)

    Acked-by: Rusty Russell
    Acked-by: Arnd Bergmann
    Acked-by: "Paul E. McKenney"
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
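
Since vhost net is driven from userspace through a character device, the first steps of a client can be sketched as follows. This is an illustration only, assuming the uapi header `<linux/vhost.h>` is installed and that the device node path is supplied by the caller; the helper name `vhost_open_owned` is hypothetical. It opens the device and claims ownership with the `VHOST_SET_OWNER` ioctl, returning a negative errno on failure.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vhost.h>

/*
 * Open a vhost character device and bind it to the calling process.
 * Returns the file descriptor on success or a negative errno value.
 */
static int vhost_open_owned(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -errno;

    /* VHOST_SET_OWNER takes no argument; it must precede other setup. */
    if (ioctl(fd, VHOST_SET_OWNER) < 0) {
        int err = errno;

        close(fd);
        return -err;
    }
    return fd;
}
```

A real client would continue by describing guest memory (`VHOST_SET_MEM_TABLE`), configuring each ring, and attaching a tap or socket file descriptor with `VHOST_NET_SET_BACKEND`. Those steps are omitted because they depend on the hypervisor's memory layout.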