29 Oct, 2013

2 commits


10 Jul, 2013

2 commits

  • Pull networking updates from David Miller:
    "This is a re-do of the net-next pull request for the current merge
    window. The only difference from the one I made the other day is that
    this has Eliezer's interface renames and the timeout handling changes
    made based upon your feedback, as well as a few bug fixes that have
    trickled in.

    Highlights:

    1) Low latency device polling, eliminating the cost of interrupt
    handling and context switches. Allows direct polling of a network
    device from socket operations, such as recvmsg() and poll().

    Currently ixgbe, mlx4, and bnx2x support this feature.

    Full high level description, performance numbers, and design in
    commit 0a4db187a999 ("Merge branch 'll_poll'")

    From Eliezer Tamir.

    2) With the routing cache removed, ip_check_mc_rcu() gets exercised
    more than ever before in the case where we have lots of multicast
    addresses. Use a hash table instead of a simple linked list, from
    Eric Dumazet.

    3) Add driver for Atheros QCA98xx 802.11ac wireless devices, from
    Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
    Marek Puzyniak, Michal Kazior, and Sujith Manoharan.

    4) Support reporting the TUN device persist flag to userspace, from
    Pavel Emelyanov.

    5) Allow controlling network device VF link state using netlink, from
    Rony Efraim.

    6) Support GRE tunneling in openvswitch, from Pravin B Shelar.

    7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
    Daniel Borkmann and Eric Dumazet.

    8) Allow controlling of TCP quickack behavior on a per-route basis,
    from Cong Wang.

    9) Several bug fixes and improvements to vxlan from Stephen
    Hemminger, Pravin B Shelar, and Mike Rapoport. In particular,
    support receiving on multiple UDP ports.

    10) Major cleanups, particularly in the area of debugging and cookie
    lifetime handling, to the SCTP protocol code. From Daniel
    Borkmann.

    11) Allow packets to cross network namespaces when traversing tunnel
    devices. From Nicolas Dichtel.

    12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
    manner akin to how we monitor real network traffic via ptype_all.
    From Daniel Borkmann.

    13) Several bug fixes and improvements for the new alx device driver,
    from Johannes Berg.

    14) Fix scalability issues in the netem packet scheduler's time queue,
    by using an rbtree. From Eric Dumazet.

    15) Several bug fixes in TCP loss recovery handling, from Yuchung
    Cheng.

    16) Add support for GSO segmentation of MPLS packets, from Simon
    Horman.

    17) Make network notifiers have a real data type for the opaque
    pointer that's passed into them. Use this to properly handle
    network device flag changes in arp_netdev_event(). From Jiri
    Pirko and Timo Teräs.

    18) Convert several drivers over to module_pci_driver(), from Peter
    Huewe.

    19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a
    O(1) calculation instead. From Eric Dumazet.

    20) Support setting of explicit tunnel peer addresses in ipv6, just
    like ipv4. From Nicolas Dichtel.

    21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.

    22) Prevent a single high rate flow from overrunning an individual cpu
    during RX packet processing via selective flow shedding. From
    Willem de Bruijn.

    23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
    Dumazet.

    24) Don't just drop GSO packets which are above the TBF scheduler's
    burst limit, chop them up so they are in-bounds instead. Also
    from Eric Dumazet.

    25) VLAN offloads are missed when configured on top of a bridge, fix
    from Vlad Yasevich.

    26) Support IPV6 in ping sockets. From Lorenzo Colitti.

    27) Receive flow steering targets should be updated at poll() time
    too, from David Majnemer.

    28) Fix several corner case regressions in PMTU/redirect handling due
    to the routing cache removal, from Timo Teräs.

    29) We have to be mindful of ipv4 mapped ipv6 sockets in
    udp_v6_push_pending_frames(). From Hannes Frederic Sowa.

    30) Fix L2TP sequence number handling bugs, from James Chapman."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
    drivers/net: caif: fix wrong rtnl_is_locked() usage
    drivers/net: enic: release rtnl_lock on error-path
    vhost-net: fix use-after-free in vhost_net_flush
    net: mv643xx_eth: do not use port number as platform device id
    net: sctp: confirm route during forward progress
    virtio_net: fix race in RX VQ processing
    virtio: support unlocked queue poll
    net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
    Documentation: Fix references to defunct linux-net@vger.kernel.org
    net/fs: change busy poll time accounting
    net: rename low latency sockets functions to busy poll
    bridge: fix some kernel warning in multicast timer
    sfc: Fix memory leak when discarding scattered packets
    sit: fix tunnel update via netlink
    dt:net:stmmac: Add dt specific phy reset callback support.
    dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
    dt:net:stmmac: Allocate platform data only if its NULL.
    net:stmmac: fix memleak in the open method
    ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
    net: ipv6: fix wrong ping_v6_sendmsg return value
    ...

    Linus Torvalds
     
  • This adds a way to check ring empty state after enable_cb outside any
    locks. Will be used by virtio_net.

    Note: there's room for more optimization: caller is likely to have a
    memory barrier already, which means we might be able to get rid of a
    barrier here. Deferring this optimization until we do some
    benchmarking.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
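
    A minimal user-space sketch of the idea, not the kernel code: enabling
    callbacks hands back a snapshot of the consumer position, and a later
    check, made without any lock, compares that snapshot with the index the
    other side publishes. The names toy_vq, toy_enable_cb_prepare() and
    toy_poll() are hypothetical, and the memory barriers a real ring needs
    are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy ring: only the two indices that matter here. */
struct toy_vq {
    uint16_t used_idx;      /* advanced by the device side */
    uint16_t last_used_idx; /* how far the driver has consumed */
    bool cb_enabled;
};

/* Re-enable callbacks and return an opaque snapshot of our position. */
static unsigned toy_enable_cb_prepare(struct toy_vq *vq)
{
    assert(vq != NULL);
    vq->cb_enabled = true;
    return vq->last_used_idx;
}

/* True if buffers were used since the snapshot was taken. */
static bool toy_poll(const struct toy_vq *vq, unsigned snapshot)
{
    assert(vq != NULL);
    return (uint16_t)snapshot != vq->used_idx;
}
```

    A caller would take the snapshot while holding its lock, drop the lock,
    and only then call toy_poll() to decide whether to reschedule itself.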
     

20 May, 2013

1 commit


03 May, 2013

1 commit

  • Pull virtio & lguest updates from Rusty Russell:
    "Lots of virtio work which wasn't quite ready for last merge window.

    Plus I dived into lguest again, reworking the pagetable code so we can
    move the switcher page: our fixmaps sometimes take more than 2MB now..."

    Ugh. Annoying conflicts with the tcm_vhost -> vhost_scsi rename.
    Hopefully correctly resolved.

    * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (57 commits)
    caif_virtio: Remove bouncing email addresses
    lguest: improve code readability in lg_cpu_start.
    virtio-net: fill only rx queues which are being used
    lguest: map Switcher below fixmap.
    lguest: cache last cpu we ran on.
    lguest: map Switcher text whenever we allocate a new pagetable.
    lguest: don't share Switcher PTE pages between guests.
    lguest: expost switcher_pages array (as lg_switcher_pages).
    lguest: extract shadow PTE walking / allocating.
    lguest: make check_gpte et. al return bool.
    lguest: assume Switcher text is a single page.
    lguest: rename switcher_page to switcher_pages.
    lguest: remove RESERVE_MEM constant.
    lguest: check vaddr not pgd for Switcher protection.
    lguest: prepare to make SWITCHER_ADDR a variable.
    virtio: console: replace EMFILE with EBUSY for already-open port
    virtio-scsi: reset virtqueue affinity when doing cpu hotplug
    virtio-scsi: introduce multiqueue support
    virtio-scsi: push vq lock/unlock into virtscsi_vq_done
    virtio-scsi: pass struct virtio_scsi to virtqueue completion function
    ...

    Linus Torvalds
     

22 Mar, 2013

1 commit


20 Mar, 2013

3 commits

  • These are specialized versions of virtqueue_add_buf(), which cover
    over 80% of cases and are far clearer.

    In particular, the scatterlists passed to these functions don't have
    to be clean (ie. we ignore end markers).

    Signed-off-by: Rusty Russell

    Rusty Russell
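
    As a rough user-space illustration (the types and names below are
    hypothetical, not the kernel API), the specialized helpers can be
    pictured as thin wrappers that fix the direction of every entry and
    take an explicit count, so no end marker in the list is consulted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a scatterlist entry. */
struct toy_sg {
    void *addr;
    size_t len;
};

struct toy_queue {
    unsigned out_descs; /* device-readable descriptors queued */
    unsigned in_descs;  /* device-writable descriptors queued */
};

/* General form: 'out' readable entries followed by 'in' writable ones. */
static int toy_add(struct toy_queue *q, const struct toy_sg *sg,
                   unsigned out, unsigned in)
{
    assert(q != NULL);
    if (sg == NULL || out + in == 0)
        return -1;
    q->out_descs += out;
    q->in_descs += in;
    return 0;
}

/* Specialized: all 'num' entries are read by the device. */
static int toy_add_outbuf(struct toy_queue *q, const struct toy_sg *sg,
                          unsigned num)
{
    return toy_add(q, sg, num, 0);
}

/* Specialized: all 'num' entries are written by the device. */
static int toy_add_inbuf(struct toy_queue *q, const struct toy_sg *sg,
                         unsigned num)
{
    return toy_add(q, sg, 0, num);
}
```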
     
  • virtio_scsi can really use this, to avoid the current hack of copying
    the whole sg array. Some other things get slightly neater, too.

    This causes a slowdown in virtqueue_add_buf(), which is implemented as
    a wrapper. This is addressed in the next patches.

    for i in `seq 50`; do /usr/bin/time -f 'Wall time:%e' ./vringh_test --indirect --eventidx --parallel --fast-vringh; done 2>&1 | stats --trim-outliers:

    Before:
    Using CPUS 0 and 3
    Guest: notified 0, pinged 39009-39063(39062)
    Host: notified 39009-39063(39062), pinged 0
    Wall time:1.700000-1.950000(1.723542)

    After:
    Using CPUS 0 and 3
    Guest: notified 0, pinged 39062-39063(39063)
    Host: notified 39062-39063(39063), pinged 0
    Wall time:1.760000-2.220000(1.789167)

    Signed-off-by: Rusty Russell
    Reviewed-by: Wanlong Gao
    Reviewed-by: Asias He

    Rusty Russell
     
  • Add wrappers for the host vrings to support loose
    coupling between the virtio device and driver.

    A new struct vringh_config_ops with the functions
    find_vrhs() and del_vrhs() is added to the virtio_device
    struct. This enables virtio drivers to manage virtio
    host rings without detailed knowledge of how the
    vrings are created and deleted.

    The function vringh_notify() is added so vringh clients
    can notify the other side that buffers are added to the
    used-ring.

    Cc: Ohad Ben-Cohen
    Signed-off-by: Sjur Brændeland
    Signed-off-by: Rusty Russell (constified vringh_config)

    Sjur Brændeland
     

13 Feb, 2013

1 commit


11 Feb, 2013

1 commit


18 Dec, 2012

3 commits


28 Sep, 2012

1 commit

  • Instead of storing the queue index in transport-specific virtio structs,
    this patch moves it to vring_virtqueue and introduces a helper to get
    the value. This lets drivers simplify their management and tracing of
    virtqueues.

    Signed-off-by: Jason Wang
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Rusty Russell

    Jason Wang
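
    The shape of the change can be sketched in user space as follows. The
    struct and function names are made up for this example; the point is
    only that the index lives in the common queue object and is read
    through one accessor rather than from per-transport structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical common queue object holding its own index. */
struct toy_virtqueue {
    unsigned queue_index;
    const char *name;
};

/* The single helper drivers use to learn which queue they were handed. */
static unsigned toy_get_queue_index(const struct toy_virtqueue *vq)
{
    assert(vq != NULL);
    return vq->queue_index;
}

/* A transport fills in the index once, when it creates the queues. */
static void toy_setup_queues(struct toy_virtqueue *vqs, unsigned n)
{
    assert(vqs != NULL);
    for (unsigned i = 0; i < n; i++)
        vqs[i].queue_index = i;
}
```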
     

20 Jul, 2012

1 commit

  • This patch changes virtio-scsi to use a new virtio_driver->scan() callback
    so that scsi_scan_host() can be properly invoked once virtio_dev_probe() has
    set add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK) to signal active virtio-ring
    operation, instead of from within virtscsi_probe().

    This fixes a bug where SCSI LUN scanning for both virtio-scsi-raw and
    virtio-scsi/tcm_vhost setups was happening before VIRTIO_CONFIG_S_DRIVER_OK
    had been set, causing VIRTIO_SCSI_S_BAD_TARGET to occur. This fixes a bug
    with virtio-scsi/tcm_vhost where LUN scan was not detecting LUNs.

    Tested with virtio-scsi-raw + virtio-scsi/tcm_vhost w/ IBLOCK on 3.5-rc2 code.

    Signed-off-by: Nicholas Bellinger
    Acked-by: Paolo Bonzini
    Signed-off-by: James Bottomley

    Nicholas Bellinger
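
    The ordering described above can be modelled in a few lines of
    user-space C. This is a sketch under assumptions: the toy_* names are
    invented, and TOY_S_DRIVER_OK merely mirrors the value of
    VIRTIO_CONFIG_S_DRIVER_OK. It shows the core calling an optional scan
    hook only after the status bit is set.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_S_DRIVER_OK 4u /* mirrors VIRTIO_CONFIG_S_DRIVER_OK */

struct toy_dev {
    unsigned status;
    int luns_found;
};

struct toy_driver {
    int (*probe)(struct toy_dev *dev);
    void (*scan)(struct toy_dev *dev); /* optional, may be NULL */
};

/* Core-side probe: scan runs only after DRIVER_OK has been set. */
static int toy_dev_probe(struct toy_dev *dev, const struct toy_driver *drv)
{
    int err;

    assert(dev != NULL && drv != NULL && drv->probe != NULL);
    err = drv->probe(dev);
    if (err)
        return err;
    dev->status |= TOY_S_DRIVER_OK;
    if (drv->scan)
        drv->scan(dev);
    return 0;
}

/* Driver probe no longer scans; it only sets the device up. */
static int toy_scsi_probe(struct toy_dev *dev)
{
    (void)dev;
    return 0;
}

/* In this toy model a scan issued before DRIVER_OK finds nothing. */
static void toy_scsi_scan(struct toy_dev *dev)
{
    dev->luns_found = (dev->status & TOY_S_DRIVER_OK) ? 1 : 0;
}
```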
     

31 Mar, 2012

1 commit


12 Jan, 2012

4 commits

  • Handle thaw, restore and freeze notifications from the PM core. Expose
    these to individual virtio drivers that can quiesce and resume vq
    operations. For drivers not implementing the thaw() method, use the
    restore method instead.

    These functions also save device-specific data so that the device can be
    put in pre-suspend state after resume, and disable and enable the PCI
    device in the freeze and resume functions, respectively.

    Signed-off-by: Amit Shah
    Signed-off-by: Rusty Russell

    Amit Shah
     
  • Based on patch by Christoph for virtio_blk speedup:

    Split virtqueue_kick to be able to do the actual notification
    outside the lock protecting the virtqueue. This patch was
    originally done by Stefan Hajnoczi, but I can't find the
    original one anymore and had to recreate it from memory.
    Pointers to the original or corrections for the commit message
    are welcome.

    Stefan's patch was here:

    https://github.com/stefanha/linux/commit/a6d06644e3a58e57a774e77d7dc34c4a5a2e7496
    http://www.spinics.net/lists/linux-virtualization/msg14616.html

    Third time's the charm!

    Signed-off-by: Rusty Russell

    Rusty Russell
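
    A simplified user-space model of the split, with hypothetical names
    throughout: the decision whether a notification is needed is made
    under the lock, while the notification itself (the expensive part)
    happens after the lock is dropped. A boolean stands in for the
    driver's spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vq {
    bool locked;              /* stands in for the driver's lock */
    unsigned pending;         /* buffers added since the last kick */
    unsigned notifications;   /* how often the other side was notified */
    bool notified_under_lock; /* set if notify ever ran with the lock held */
};

/* Called with the lock held: decide whether a kick is needed. */
static bool toy_kick_prepare(struct toy_vq *vq)
{
    assert(vq != NULL && vq->locked);
    bool needed = vq->pending > 0;
    vq->pending = 0;
    return needed;
}

/* The costly part, for example a trap to the host; needs no lock. */
static void toy_notify(struct toy_vq *vq)
{
    assert(vq != NULL);
    if (vq->locked)
        vq->notified_under_lock = true;
    vq->notifications++;
}

/* Submission path: queue under the lock, notify outside it. */
static void toy_submit(struct toy_vq *vq)
{
    bool kick;

    vq->locked = true;
    vq->pending++;
    kick = toy_kick_prepare(vq);
    vq->locked = false;

    if (kick)
        toy_notify(vq);
}
```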
     
  • Remove wrapper functions. This makes the allocation type explicit in
    all callers; I used GFP_KERNEL where it seemed obvious, left it at
    GFP_ATOMIC otherwise.

    Signed-off-by: Rusty Russell
    Reviewed-by: Christoph Hellwig

    Rusty Russell
     
  • The old documentation is left over from when we used a structure with
    strategy pointers.

    And move the documentation to the C file as per kernel practice.
    Though I disagree...

    Signed-off-by: Rusty Russell
    Reviewed-by: Christoph Hellwig

    Rusty Russell
     

02 Nov, 2011

1 commit


24 Oct, 2011

1 commit


30 May, 2011

1 commit

  • Add an API that tells the other side that callbacks
    should be delayed until a lot of work has been done.
    Implement using the new event_idx feature.

    Note: it might seem advantageous to let the drivers
    ask for a callback after a specific capacity has
    been reached. However, as a single head can
    free many entries in the descriptor table,
    we don't really have a clue about capacity
    until get_buf is called. The API is the simplest
    to implement at the moment, we'll see what kind of
    hints drivers can pass when there's more than one
    user of the feature.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
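
    For illustration, the index comparison that event_idx relies on can be
    written as below. toy_need_event() follows the vring_need_event()
    formula from the virtio ring header. toy_delayed_event() and its
    three-quarters threshold are assumptions made for this sketch and are
    not taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Should the other side send an event after moving its index from
 * 'old_idx' to 'new_idx', given that an event was requested at
 * 'event_idx'? Arithmetic is modulo 2^16, so wraparound is handled.
 */
static bool toy_need_event(uint16_t event_idx, uint16_t new_idx,
                           uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/*
 * Assumed heuristic: request the event only once about 3/4 of the
 * outstanding buffers have been used.
 */
static uint16_t toy_delayed_event(uint16_t last_used, uint16_t avail_idx)
{
    uint16_t outstanding = (uint16_t)(avail_idx - last_used);
    return (uint16_t)(last_used + outstanding * 3 / 4);
}
```

    With eight buffers outstanding and nothing consumed yet, the sketch
    requests the event at index 6, so the callback is deferred until the
    used index moves past that point.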
     

19 May, 2010

3 commits


13 Mar, 2010

1 commit

  • This adds a new file for the virtio 9P device. The file
    contains details of the mount device name that should
    be used to mount the 9P file system.

    Ex: the /sys/devices/virtio-pci/virtio1/mount_tag file now
    contains the tag name to be used to mount the 9P file system.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
     

24 Feb, 2010

1 commit

  • There's currently no way for a virtio driver to ask for unused
    buffers, so it has to keep a list itself to reclaim them at shutdown.
    This is redundant, since virtio_ring stores that information. So
    add a new hook to do this.

    Signed-off-by: Shirley Ma
    Signed-off-by: Amit Shah
    Signed-off-by: Rusty Russell

    Shirley Ma
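
    A toy version of such a hook, assuming a ring that keeps one driver
    token per descriptor slot (all names here are hypothetical): the
    driver calls the hook repeatedly at shutdown until it returns NULL,
    instead of keeping its own list of outstanding buffers.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RING_SIZE 4

/* Hypothetical ring: one driver token per slot, NULL when the slot is free. */
struct toy_ring {
    void *token[TOY_RING_SIZE];
};

/* Queue a buffer; returns -1 when the ring is full. */
static int toy_add_buf(struct toy_ring *r, void *token)
{
    assert(r != NULL && token != NULL);
    for (size_t i = 0; i < TOY_RING_SIZE; i++) {
        if (r->token[i] == NULL) {
            r->token[i] = token;
            return 0;
        }
    }
    return -1;
}

/* Hand back one buffer the device never used, or NULL when none remain. */
static void *toy_detach_unused_buf(struct toy_ring *r)
{
    assert(r != NULL);
    for (size_t i = 0; i < TOY_RING_SIZE; i++) {
        if (r->token[i] != NULL) {
            void *token = r->token[i];
            r->token[i] = NULL;
            return token;
        }
    }
    return NULL;
}
```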
     

23 Sep, 2009

1 commit


12 Jun, 2009

2 commits


02 May, 2008

1 commit

  • A recent proposed feature addition to the virtio block driver revealed
    some flaws in the API: in particular, we assume that feature
    negotiation is complete once a driver's probe function returns.

    There is nothing in the API to require this, however, and even I
    didn't notice when it was violated.

    So instead, we require the driver to specify what features it supports
    in a table; we can then move the feature negotiation into the virtio
    core. The intersection of device and driver features is presented in
    a new 'features' bitmap in the struct virtio_device.

    Note that this highlights the difference between Linux unsigned-long
    bitmaps where each unsigned long is in native endian, and a
    straight-forward little-endian array of bytes.

    Drivers can still remove feature bits in their probe routine if they
    really have to.

    API changes:
    - dev->config->feature() no longer gets and acks a feature.
    - drivers should advertise their features in the 'feature_table' field
    - use virtio_has_feature() for extra sanity when checking feature bits

    Signed-off-by: Rusty Russell

    Rusty Russell
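
    The negotiation step can be sketched in user space like this. The
    function names and the 32-bit feature word are assumptions for the
    example; it only demonstrates computing the intersection of what the
    device offers and what the driver lists in its table, then testing
    individual bits of the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Keep only the bits the driver lists AND the device offers. */
static uint32_t toy_negotiate(uint32_t device_features,
                              const unsigned *feature_table, size_t n)
{
    uint32_t negotiated = 0;

    for (size_t i = 0; i < n; i++) {
        assert(feature_table[i] < 32);
        uint32_t bit = UINT32_C(1) << feature_table[i];
        if (device_features & bit)
            negotiated |= bit;
    }
    return negotiated;
}

/* Check one negotiated feature bit, with a sanity check on its number. */
static bool toy_has_feature(uint32_t negotiated, unsigned fbit)
{
    assert(fbit < 32);
    return (negotiated >> fbit) & 1u;
}
```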
     

08 Apr, 2008

1 commit

  • The 'disable_cb' callback is designed as an optimization to tell the host
    we don't need callbacks now. As it is not reliable, the debug check is
    overzealous: it can happen on two CPUs at the same time. Document this.

    Even if it were reliable, the virtio_net driver doesn't disable
    callbacks on transmit so the START_USE/END_USE debugging reentrance
    protection can be easily tripped even on UP.

    Thanks to Balaji Rao for the bug report and testing.

    Signed-off-by: Rusty Russell
    CC: Balaji Rao
    Signed-off-by: Linus Torvalds

    Rusty Russell
     

17 Mar, 2008

1 commit

  • There is a race in virtio_net, dealing with disabling/enabling the callback.
    I saw the following oops:

    kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:218!
    illegal operation: 0001 [#1] SMP
    Modules linked in: sunrpc dm_mod
    CPU: 2 Not tainted 2.6.25-rc1zlive-host-10623-gd358142-dirty #99
    Process swapper (pid: 0, task: 000000000f85a610, ksp: 000000000f873c60)
    Krnl PSW : 0404300180000000 00000000002b81a6 (vring_disable_cb+0x16/0x20)
    R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3
    Krnl GPRS: 0000000000000001 0000000000000001 0000000010005800 0000000000000001
    000000000f3a0900 000000000f85a610 0000000000000000 0000000000000000
    0000000000000000 000000000f870000 0000000000000000 0000000000001237
    000000000f3a0920 000000000010ff74 00000000002846f6 000000000fa0bcd8
    Krnl Code: 00000000002b819a: a7110001 tmll %r1,1
    00000000002b819e: a7840004 brc 8,2b81a6
    00000000002b81a2: a7f40001 brc 15,2b81a4
    >00000000002b81a6: a51b0001 oill %r1,1
    00000000002b81aa: 40102000 sth %r1,0(%r2)
    00000000002b81ae: 07fe bcr 15,%r14
    00000000002b81b0: eb7ff0380024 stmg %r7,%r15,56(%r15)
    00000000002b81b6: a7f13e00 tmll %r15,15872
    Call Trace:
    ([] 0xfa0bcd0)
    [] vring_interrupt+0x5c/0x6c
    [] do_extint+0xb8/0xf0
    [] ext_no_vtime+0x16/0x1a
    [] cpu_idle+0x1c2/0x1e0

    The problem can be triggered with a high amount of host->guest traffic.
    I think it's the following race:

    poll says netif_rx_complete
    poll calls enable_cb
    enable_cb opens the interrupt mask
    a new packet comes, an interrupt is triggered----\
    enable_cb sees that there is more work           |
    enable_cb disables the interrupt                 |
         .                                           V
         .                              interrupt is delivered
         .                              skb_recv_done does atomic napi test, ok
      some waiting                      disable_cb is called->check fails->bang!
         .
    poll would do napi check
    poll would do disable_cb

    The fix is to let enable_cb not disable the interrupt again, but expect the
    caller to do the cleanup if it returns false. In that case, the interrupt is
    disabled only if the napi test_set_bit was successful.

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Rusty Russell (cleaned up doco)

    Christian Borntraeger
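
    The caller-side pattern of the fix can be modelled single-threaded in
    user space. Everything below is a hypothetical sketch: the racing
    packet is simulated by the late_arrivals argument, and a plain flag
    replaces the atomic napi test-and-set of a real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rx {
    unsigned pending;    /* packets the host has queued */
    bool cb_enabled;     /* interrupt mask open? */
    bool napi_scheduled; /* is a poll currently claimed? */
};

/* Enable callbacks; report false if work is pending. Never disables again. */
static bool toy_enable_cb(struct toy_rx *rx)
{
    rx->cb_enabled = true;
    return rx->pending == 0;
}

static void toy_disable_cb(struct toy_rx *rx)
{
    rx->cb_enabled = false;
}

/* Atomic test-and-set in a real driver; true if we claimed the poll. */
static bool toy_napi_schedule_prep(struct toy_rx *rx)
{
    if (rx->napi_scheduled)
        return false;
    rx->napi_scheduled = true;
    return true;
}

/* Drain, re-enable, and clean up ourselves if more work raced in. */
static unsigned toy_poll(struct toy_rx *rx, unsigned late_arrivals)
{
    unsigned received = 0;

    assert(rx != NULL);
again:
    received += rx->pending;
    rx->pending = 0;

    rx->napi_scheduled = false;   /* the "netif_rx_complete" step */
    rx->pending += late_arrivals; /* simulate a packet racing in */
    late_arrivals = 0;

    if (!toy_enable_cb(rx) && toy_napi_schedule_prep(rx)) {
        toy_disable_cb(rx);       /* only the winner of the test disables */
        goto again;
    }
    return received;
}
```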
     

04 Feb, 2008

3 commits

  • A reset function solves three problems:

    1) It allows us to renegotiate features, eg. if we want to upgrade a
    guest driver without rebooting the guest.

    2) It gives us a clean way of shutting down virtqueues: after a reset,
    we know that the buffers won't be used by the host, and

    3) It helps the guest recover from messed-up drivers.

    So we remove the ->shutdown hook, and the only way we now remove
    feature bits is via reset.

    We leave it to the driver to do the reset before it deletes queues:
    the balloon driver, for example, needs to chat to the host in its
    remove function.

    Signed-off-by: Rusty Russell

    Rusty Russell
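
    A small user-space model of the ordering this implies, using invented
    names: the driver's remove path decides when to reset, and deleting
    queues succeeds only once the reset has taken every buffer back from
    the host and cleared the acknowledged features.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_dev {
    uint8_t status;
    uint32_t features;
    unsigned host_owned_bufs; /* buffers the host may still touch */
    bool queues_alive;
};

/* After reset the host owns no buffers and no feature is acked. */
static void toy_reset(struct toy_dev *dev)
{
    assert(dev != NULL);
    dev->status = 0;
    dev->features = 0;
    dev->host_owned_bufs = 0;
}

/* Deleting queues is only safe once the host cannot use their buffers. */
static int toy_del_vqs(struct toy_dev *dev)
{
    assert(dev != NULL);
    if (dev->host_owned_bufs != 0)
        return -1;
    dev->queues_alive = false;
    return 0;
}

/* Driver remove path: reset at a time of the driver's choosing, then delete. */
static int toy_remove(struct toy_dev *dev)
{
    /* A driver could still exchange final messages with the host here. */
    toy_reset(dev);
    return toy_del_vqs(dev);
}
```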
     
  • Various drivers want to know when their configuration information
    changes: the balloon driver is the immediate user, but the network
    driver may one day have a "carrier" status as well.

    This introduces that callback (lguest doesn't use it yet).

    Signed-off-by: Rusty Russell

    Rusty Russell
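
    One way to picture such a callback, as a user-space sketch with
    hypothetical names: the driver structure carries an optional function
    pointer, and the core invokes it when the device signals that its
    configuration changed. Drivers that leave it NULL are simply skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dev {
    unsigned target_pages; /* value in the device's config space */
    unsigned seen_pages;   /* what the driver last acted upon */
};

struct toy_driver {
    void (*config_changed)(struct toy_dev *dev); /* optional, may be NULL */
};

/* Core side: forward a config-change event to the driver, if it cares. */
static bool toy_config_event(struct toy_dev *dev, const struct toy_driver *drv)
{
    assert(dev != NULL && drv != NULL);
    if (drv->config_changed == NULL)
        return false;
    drv->config_changed(dev);
    return true;
}

/* Balloon-like driver: re-read the target when told it changed. */
static void toy_balloon_changed(struct toy_dev *dev)
{
    dev->seen_pages = dev->target_pages;
}
```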
     
  • It seems that virtio_net wants to disable callbacks (interrupts) before
    calling netif_rx_schedule(), so we can't use the return value to do so.

    Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback
    now returns void, rather than a boolean.

    Signed-off-by: Rusty Russell

    Rusty Russell
     

23 Oct, 2007

1 commit

  • This attempts to implement a "virtual I/O" layer which should allow
    common drivers to be efficiently used across most virtual I/O
    mechanisms. It will no doubt need further enhancement.

    The virtio drivers add buffers to virtio queues; as the buffers are consumed
    the driver "interrupt" callbacks are invoked.

    There is also a generic implementation of config space which drivers can query
    to get setup information from the host.

    Signed-off-by: Rusty Russell
    Cc: Dor Laor
    Cc: Arnd Bergmann

    Rusty Russell