21 Jul, 2011

2 commits


19 Jul, 2011

2 commits

  • Move the used ring initialization after backend was set. This
    makes it possible to disable the backend and tweak the used ring,
    then restart. This will also make it possible to log the used ring
    write correctly.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • >From: Shirley Ma

    This adds experimental zero copy support in vhost-net,
    disabled by default. To enable, set
    experimental_zcopytx module option to 1.

    This patch maintains the outstanding userspace buffers in the
    sequence it is delivered to vhost. The outstanding userspace buffers
    will be marked as done once the lower device buffers DMA has finished.
    This is monitored through last reference of kfree_skb callback. Two
    buffer indices are used for this purpose.

    The vhost-net device passes the userspace buffers info to lower device
    skb through message control. DMA done status check and guest
    notification are handled by handle_tx: in the worst case is all buffers
    in the vq are in pending/done status, so we need to notify guest to
    release DMA done buffers first before we get any new buffers from the
    vq.

    One known problem is that if the guest stops submitting
    buffers, buffers might never get used until some
    further action, e.g. device reset. This does not
    seem to affect linux guests.

    Signed-off-by: Shirley
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     

30 May, 2011

1 commit


14 Mar, 2011

2 commits


13 Mar, 2011

2 commits

  • Codes duplication were found between the handling of mergeable and big
    buffers, so this patch tries to unify them. This could be easily done
    by adding a quota to the get_rx_bufs() which is used to limit the
    number of buffers it returns (for mergeable buffer, the quota is
    simply UIO_MAXIOV, for big buffers, the quota is just 1), and then the
    previous handle_rx_mergeable() could be resued also for big buffers.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     
  • No need to check the support of mergeable buffer inside the recevie
    loop as the whole handle_rx()_xx is in the read critical region. So
    this patch move it ahead of the receiving loop.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     

09 Mar, 2011

1 commit


01 Feb, 2011

1 commit

  • When built with rcu checks enabled, vhost triggers
    bogus warnings as vhost features are read without
    dev->mutex sometimes, and private pointer is read
    with our kind of rcu where work serves as a
    read side critical section.

    Fixing it properly is not trivial.
    Disable the warnings by stubbing out the checks for now.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     

15 Dec, 2010

1 commit


09 Dec, 2010

1 commit


25 Nov, 2010

1 commit


04 Nov, 2010

1 commit


24 Oct, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
    bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
    vlan: Calling vlan_hwaccel_do_receive() is always valid.
    tproxy: use the interface primary IP address as a default value for --on-ip
    tproxy: added IPv6 support to the socket match
    cxgb3: function namespace cleanup
    tproxy: added IPv6 support to the TPROXY target
    tproxy: added IPv6 socket lookup function to nf_tproxy_core
    be2net: Changes to use only priority codes allowed by f/w
    tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
    tproxy: added tproxy sockopt interface in the IPV6 layer
    tproxy: added udp6_lib_lookup function
    tproxy: added const specifiers to udp lookup functions
    tproxy: split off ipv6 defragmentation to a separate module
    l2tp: small cleanup
    nf_nat: restrict ICMP translation for embedded header
    can: mcp251x: fix generation of error frames
    can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
    can-raw: add msg_flags to distinguish local traffic
    9p: client code cleanup
    rds: make local functions/variables static
    ...

    Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
    drivers/net/wireless/ath/ath9k/debug.c as per David

    Linus Torvalds
     

23 Oct, 2010

1 commit

  • * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
    vfs: make no_llseek the default
    vfs: don't use BKL in default_llseek
    llseek: automatically add .llseek fop
    libfs: use generic_file_llseek for simple_attr
    mac80211: disallow seeks in minstrel debug code
    lirc: make chardev nonseekable
    viotape: use noop_llseek
    raw: use explicit llseek file operations
    ibmasmfs: use generic_file_llseek
    spufs: use llseek in all file operations
    arm/omap: use generic_file_llseek in iommu_debug
    lkdtm: use generic_file_llseek in debugfs
    net/wireless: use generic_file_llseek in debugfs
    drm: use noop_llseek

    Linus Torvalds
     

15 Oct, 2010

1 commit

  • All file_operations should get a .llseek operation so we can make
    nonseekable_open the default for future file operations without a
    .llseek pointer.

    The three cases that we can automatically detect are no_llseek, seq_lseek
    and default_llseek. For cases where we can we can automatically prove that
    the file offset is always ignored, we use noop_llseek, which maintains
    the current behavior of not returning an error from a seek.

    New drivers should normally not use noop_llseek but instead use no_llseek
    and call nonseekable_open at open time. Existing drivers can be converted
    to do the same when the maintainer knows for certain that no user code
    relies on calling seek on the device file.

    The generated code is often incorrectly indented and right now contains
    comments that clarify for each added line why a specific variant was
    chosen. In the version that gets submitted upstream, the comments will
    be gone and I will manually fix the indentation, because there does not
    seem to be a way to do that using coccinelle.

    Some amount of new code is currently sitting in linux-next that should get
    the same modifications, which I will do at the end of the merge window.

    Many thanks to Julia Lawall for helping me learn to write a semantic
    patch that does all this.

    ===== begin semantic patch =====
    // This adds an llseek= method to all file operations,
    // as a preparation for making no_llseek the default.
    //
    // The rules are
    // - use no_llseek explicitly if we do nonseekable_open
    // - use seq_lseek for sequential files
    // - use default_llseek if we know we access f_pos
    // - use noop_llseek if we know we don't access f_pos,
    // but we still want to allow users to call lseek
    //
    @ open1 exists @
    identifier nested_open;
    @@
    nested_open(...)
    {

    }

    @ open exists@
    identifier open_f;
    identifier i, f;
    identifier open1.nested_open;
    @@
    int open_f(struct inode *i, struct file *f)
    {

    }

    @ read disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {

    }

    @ read_no_fpos disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ write @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {

    }

    @ write_no_fpos @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ fops0 @
    identifier fops;
    @@
    struct file_operations fops = {
    ...
    };

    @ has_llseek depends on fops0 @
    identifier fops0.fops;
    identifier llseek_f;
    @@
    struct file_operations fops = {
    ...
    .llseek = llseek_f,
    ...
    };

    @ has_read depends on fops0 @
    identifier fops0.fops;
    identifier read_f;
    @@
    struct file_operations fops = {
    ...
    .read = read_f,
    ...
    };

    @ has_write depends on fops0 @
    identifier fops0.fops;
    identifier write_f;
    @@
    struct file_operations fops = {
    ...
    .write = write_f,
    ...
    };

    @ has_open depends on fops0 @
    identifier fops0.fops;
    identifier open_f;
    @@
    struct file_operations fops = {
    ...
    .open = open_f,
    ...
    };

    // use no_llseek if we call nonseekable_open
    ////////////////////////////////////////////
    @ nonseekable1 depends on !has_llseek && has_open @
    identifier fops0.fops;
    identifier nso ~= "nonseekable_open";
    @@
    struct file_operations fops = {
    ... .open = nso, ...
    +.llseek = no_llseek, /* nonseekable */
    };

    @ nonseekable2 depends on !has_llseek @
    identifier fops0.fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ... .open = open_f, ...
    +.llseek = no_llseek, /* open uses nonseekable */
    };

    // use seq_lseek for sequential files
    /////////////////////////////////////
    @ seq depends on !has_llseek @
    identifier fops0.fops;
    identifier sr ~= "seq_read";
    @@
    struct file_operations fops = {
    ... .read = sr, ...
    +.llseek = seq_lseek, /* we have seq_read */
    };

    // use default_llseek if there is a readdir
    ///////////////////////////////////////////
    @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier readdir_e;
    @@
    // any other fop is used that changes pos
    struct file_operations fops = {
    ... .readdir = readdir_e, ...
    +.llseek = default_llseek, /* readdir is present */
    };

    // use default_llseek if at least one of read/write touches f_pos
    /////////////////////////////////////////////////////////////////
    @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read.read_f;
    @@
    // read fops use offset
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = default_llseek, /* read accesses f_pos */
    };

    @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ... .write = write_f, ...
    + .llseek = default_llseek, /* write accesses f_pos */
    };

    // Use noop_llseek if neither read nor write accesses f_pos
    ///////////////////////////////////////////////////////////

    @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    identifier write_no_fpos.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ...
    .write = write_f,
    .read = read_f,
    ...
    +.llseek = noop_llseek, /* read and write both use no f_pos */
    };

    @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write_no_fpos.write_f;
    @@
    struct file_operations fops = {
    ... .write = write_f, ...
    +.llseek = noop_llseek, /* write uses no f_pos */
    };

    @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    @@
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = noop_llseek, /* read uses no f_pos */
    };

    @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    @@
    struct file_operations fops = {
    ...
    +.llseek = noop_llseek, /* no read or write fn */
    };
    ===== End semantic patch =====

    Signed-off-by: Arnd Bergmann
    Cc: Julia Lawall
    Cc: Christoph Hellwig

    Arnd Bergmann
     

07 Oct, 2010

1 commit


05 Oct, 2010

1 commit

  • Qemu supports up to UIO_MAXIOV s/g so we have to match that because guest
    drivers may rely on this.

    Allocate indirect and log arrays dynamically to avoid using too much contigious
    memory and make the length of hdr array to match the header length since each
    iovec entry has a least one byte.

    Test with copying large files w/ and w/o migration in both linux and windows
    guests.

    Signed-off-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin

    Jason Wang
     

14 Sep, 2010

1 commit


22 Aug, 2010

1 commit


05 Aug, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1443 commits)
    phy/marvell: add 88ec048 support
    igb: Program MDICNFG register prior to PHY init
    e1000e: correct MAC-PHY interconnect register offset for 82579
    hso: Add new product ID
    can: Add driver for esd CAN-USB/2 device
    l2tp: fix export of header file for userspace
    can-raw: Fix skb_orphan_try handling
    Revert "net: remove zap_completion_queue"
    net: cleanup inclusion
    phy/marvell: add 88e1121 interface mode support
    u32: negative offset fix
    net: Fix a typo from "dev" to "ndev"
    igb: Use irq_synchronize per vector when using MSI-X
    ixgbevf: fix null pointer dereference due to filter being set for VLAN 0
    e1000e: Fix irq_synchronize in MSI-X case
    e1000e: register pm_qos request on hardware activation
    ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice
    net: Add getsockopt support for TCP thin-streams
    cxgb4: update driver version
    cxgb4: add new PCI IDs
    ...

    Manually fix up conflicts in:
    - drivers/net/e1000e/netdev.c: due to pm_qos registration
    infrastructure changes
    - drivers/net/phy/marvell.c: conflict between adding 88ec048 support
    and cleaning up the IDs
    - drivers/net/wireless/ipw2x00/ipw2100.c: trivial ipw2100_pm_qos_req
    conflict (registration change vs marking it static)

    Linus Torvalds
     

28 Jul, 2010

2 commits

  • This adds support for mergeable buffers in vhost-net: this is needed
    for older guests without indirect buffer support, as well
    as for zero copy with some devices.

    Includes changes by Michael S. Tsirkin to make the
    patch as low risk as possible (i.e., close to no changes
    when feature is disabled).

    Signed-off-by: David Stevens
    Signed-off-by: Michael S. Tsirkin

    David Stevens
     
  • Replace vhost_workqueue with per-vhost kthread. Other than callback
    argument change from struct work_struct * to struct vhost_work *,
    there's no visible change to vhost_poll_*() interface.

    This conversion is to make each vhost use a dedicated kthread so that
    resource control via cgroup can be applied.

    Partially based on Sridhar Samudrala's patch.

    * Updated to use sub structure vhost_work instead of directly using
    vhost_poll at Michael's suggestion.

    * Added flusher wake_up() optimization at Michael's suggestion.

    Changes by MST:
    * Converted atomics/barrier use to a spinlock.
    * Create thread on SET_OWNER
    * Fix flushing

    Signed-off-by: Tejun Heo
    Signed-off-by: Michael S. Tsirkin
    Cc: Sridhar Samudrala

    Tejun Heo
     

22 Jul, 2010

1 commit


21 Jul, 2010

2 commits

  • Conflicts:
    drivers/vhost/net.c
    net/bridge/br_device.c

    Fix merge conflict in drivers/vhost/net.c with guidance from
    Stephen Rothwell.

    Revert the effects of net-2.6 commit 573201f36fd9c7c6d5218cdcd9948cee700b277d
    since net-next-2.6 has fixes that make bridge netpoll work properly thus
    we don't need it disabled.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (24 commits)
    bridge: Partially disable netpoll support
    tcp: fix crash in tcp_xmit_retransmit_queue
    IPv6: fix CoA check in RH2 input handler (mip6_rthdr_input())
    ibmveth: lost IRQ while closing/opening device leads to service loss
    rt2x00: Fix lockdep warning in rt2x00lib_probe_dev()
    vhost: avoid pr_err on condition guest can trigger
    ipmr: Don't leak memory if fib lookup fails.
    vhost-net: avoid flush under lock
    net: fix problem in reading sock TX queue
    net/core: neighbour update Oops
    net: skb_tx_hash() fix relative to skb_orphan_try()
    rfs: call sock_rps_record_flow() in tcp_splice_read()
    xfrm: do not assume that template resolving always returns xfrms
    hostap_pci: set dev->base_addr during probe
    axnet_cs: use spin_lock_irqsave in ax_interrupt
    dsa: Fix Kconfig dependencies.
    act_nat: not all of the ICMP packets need an IP header payload
    r8169: incorrect identifier for a 8168dp
    Phonet: fix skb leak in pipe endpoint accept()
    Bluetooth: Update sec_level/auth_type for already existing connections
    ...

    Linus Torvalds
     

16 Jul, 2010

1 commit


15 Jul, 2010

1 commit

  • We flush under vq mutex when changing backends.
    This creates a deadlock as workqueue being flushed
    needs this lock as well.

    https://bugzilla.redhat.com/show_bug.cgi?id=612421

    Drop the vq mutex before flush: we have the device mutex
    which is sufficient to prevent another ioctl from touching
    the vq.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     

08 Jul, 2010

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (35 commits)
    NET: SB1250: Initialize .owner
    vxge: show startup message with KERN_INFO
    ll_temac: Fix missing iounmaps
    bridge: Clear IPCB before possible entry into IP stack
    bridge br_multicast: BUG: unable to handle kernel NULL pointer dereference
    net: Fix definition of netif_vdbg() when VERBOSE_DEBUG is defined
    net/ne: fix memory leak in ne_drv_probe()
    xfrm: fix xfrm by MARK logic
    virtio_net: fix oom handling on tx
    virtio_net: do not reschedule rx refill forever
    s2io: resolve statistics issues
    linux/net.h: fix kernel-doc warnings
    net: decreasing real_num_tx_queues needs to flush qdisc
    sched: qdisc_reset_all_tx is calling qdisc_reset without qdisc_lock
    qlge: fix a eeh handler to not add a pending timer
    qlge: Replacing add_timer() to mod_timer()
    usbnet: Set parent device early for netdev_printk()
    net: Revert "rndis_host: Poll status channel before control channel"
    netfilter: ip6t_REJECT: fix a dst leak in ipv6 REJECT
    drivers: bluetooth: bluecard_cs.c: Fixed include error, changed to linux/io.h
    ...

    Linus Torvalds
     
  • David S. Miller
     

02 Jul, 2010

1 commit


27 Jun, 2010

1 commit

  • When ring parsing fails, we currently handle this
    as ring empty condition. This means that we enable
    kicks and recheck ring empty: if this not empty,
    we re-start polling which of course will fail again.

    Instead, let's return a negative error code and stop polling.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     

09 Jun, 2010

1 commit

  • 10, 233 is allocated officially to /dev/kmview which is shipping in
    Ubuntu and Debian distributions. vhost_net seem to have borrowed it
    without making a proper request and this causes regressions in the other
    distributions.

    vhost_net can use a dynamic minor so use that instead. Also update the
    file with a comment to try and avoid future misunderstandings.

    cc: stable@kernel.org
    Signed-off-by: Alan Cox
    [ We should have caught this before 2.6.34 got released. - Linus ]
    Signed-off-by: Linus Torvalds

    Alan Cox
     

02 Jun, 2010

1 commit


27 May, 2010

3 commits


15 Apr, 2010

1 commit