11 Jul, 2013

1 commit

  • Pull virtio updates from Rusty Russell:
    "No real surprises"

    * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    MAINTAINERS: add tools/virtio/ under virtio
    tools/virtio: move module license stub to module.h
    virtio: include asm/barrier explicitly
    virtio: VIRTIO_F_ANY_LAYOUT feature
    lguest: fix example launcher compilation for broken glibc headers.
    virtio-net: fix the race between channels setting and refill
    tools/lguest: real barriers.
    tools/lguest: fix missing rmb().
    virtio_balloon: leak_balloon(): only tell host if we got pages deflated
    virtio-pci: fix leaks of msix_affinity_masks
    Fix comment typo "CONFIG_PAE"

    Linus Torvalds
     

10 Jul, 2013

1 commit

  • virtio net called virtqueue_enable_cb on the RX path after napi_complete, so
    with NAPI_STATE_SCHED clear - outside the implicit napi lock.
    This violates the requirement to synchronize virtqueue_enable_cb wrt
    virtqueue_add_buf. In particular, the used event can move backwards,
    causing us to lose interrupts.
    In a debug build, this can trigger a panic within START_USE.

    Jason Wang reports that he can trigger the races artificially,
    by adding udelay() in virtqueue_enable_cb() after virtio_mb().

    However, we must call napi_complete to clear NAPI_STATE_SCHED before
    polling the virtqueue for used buffers, otherwise napi_schedule_prep in
    a callback will fail, causing us to lose RX events.

    To fix, call virtqueue_enable_cb_prepare with NAPI_STATE_SCHED
    set (under napi lock), later call virtqueue_poll with
    NAPI_STATE_SCHED clear (outside the lock).

    Reported-by: Jason Wang
    Tested-by: Jason Wang
    Acked-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
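
The ordering described in this commit can be sketched with a toy model. This is plain Python rather than kernel code: `ToyVirtqueue`, `finish_rx_poll`, the `napi` dict and the `device_adds` parameter are hypothetical names introduced only for illustration, and the two methods merely imitate what the message says about virtqueue_enable_cb_prepare and virtqueue_poll.

```python
from dataclasses import dataclass


@dataclass
class ToyVirtqueue:
    """Toy stand-in for a virtqueue; it only tracks used-buffer indices."""
    used_idx: int = 0        # advanced by the (simulated) device
    last_used_idx: int = 0   # advanced by the driver as it consumes buffers

    def enable_cb_prepare(self) -> int:
        # Imitates virtqueue_enable_cb_prepare(): re-arm callbacks and
        # return a snapshot of the driver's position in the ring.
        return self.last_used_idx

    def poll(self, snapshot: int) -> bool:
        # Imitates virtqueue_poll(): True if the device has produced
        # buffers past the snapshot. In this model it only reads state.
        return self.used_idx != snapshot


def finish_rx_poll(vq: ToyVirtqueue, napi: dict, device_adds: int = 0) -> bool:
    """Return True if polling must be rescheduled."""
    snapshot = vq.enable_cb_prepare()   # step 1: NAPI_STATE_SCHED still set
    napi["sched"] = False               # step 2: models napi_complete()
    vq.used_idx += device_adds          # the device may add buffers here
    if vq.poll(snapshot):               # step 3: checked outside the lock
        napi["sched"] = True            # reschedule rather than lose the event
        return True
    return False
```

Calling `finish_rx_poll(vq, napi, device_adds=2)` models the device adding buffers between napi_complete and the poll; the snapshot taken in step 1 is what lets step 3 notice them.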
     

04 Jul, 2013

1 commit

  • Commit 55257d72bd1c51f25106350f4983ec19f62ed1fa (virtio-net: fill only rx queues
    which are being used) tries to refill on demand when changing the number of
    channels by calling try_refill_recv() directly. This may race with:

    - the refill work, which may do the refill at the same time
    - the try_refill_recv() called in bh, since napi was not disabled

    This may lead the guest to complain while setting channels:

    virtio_net virtio0: input.1:id 0 is not a head!

    Solve this issue by scheduling a refill work which can guarantee the
    serialization of refill.

    Cc: Sasha Levin
    Cc: Rusty Russell
    Cc: Michael S. Tsirkin
    Signed-off-by: Jason Wang
    Signed-off-by: Rusty Russell

    Jason Wang
     

23 May, 2013

1 commit

  • Commit 55257d72bd1c51f25106350f4983ec19f62ed1fa (virtio-net: fill only rx
    queues which are being used) only does the napi enabling during open for
    curr_queue_pairs. This breaks multiqueue receiving, since napi for the new
    queues was still disabled after changing the number of queues.

    This patch fixes this by enabling napi for all possible queues during open.

    Cc: Sasha Levin
    Signed-off-by: Jason Wang
    Acked-by: Rusty Russell
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Jason Wang
     

12 May, 2013

1 commit

  • Since commit 82dc3c63c692b1e1d5937 ("net: introduce NAPI_POLL_WEIGHT")
    we warn drivers when they use napi weight higher than NAPI_POLL_WEIGHT,
    but virtio_net still uses 128 by default. This patch changes its default
    value to NAPI_POLL_WEIGHT.

    Cc: "Michael S. Tsirkin"
    Cc: Eric Dumazet
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Amerigo Wang
     

03 May, 2013

1 commit

  • Pull virtio & lguest updates from Rusty Russell:
    "Lots of virtio work which wasn't quite ready for last merge window.

    Plus I dived into lguest again, reworking the pagetable code so we can
    move the switcher page: our fixmaps sometimes take more than 2MB now..."

    Ugh. Annoying conflicts with the tcm_vhost -> vhost_scsi rename.
    Hopefully correctly resolved.

    * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (57 commits)
    caif_virtio: Remove bouncing email addresses
    lguest: improve code readability in lg_cpu_start.
    virtio-net: fill only rx queues which are being used
    lguest: map Switcher below fixmap.
    lguest: cache last cpu we ran on.
    lguest: map Switcher text whenever we allocate a new pagetable.
    lguest: don't share Switcher PTE pages between guests.
    lguest: expost switcher_pages array (as lg_switcher_pages).
    lguest: extract shadow PTE walking / allocating.
    lguest: make check_gpte et. al return bool.
    lguest: assume Switcher text is a single page.
    lguest: rename switcher_page to switcher_pages.
    lguest: remove RESERVE_MEM constant.
    lguest: check vaddr not pgd for Switcher protection.
    lguest: prepare to make SWITCHER_ADDR a variable.
    virtio: console: replace EMFILE with EBUSY for already-open port
    virtio-scsi: reset virtqueue affinity when doing cpu hotplug
    virtio-scsi: introduce multiqueue support
    virtio-scsi: push vq lock/unlock into virtscsi_vq_done
    virtio-scsi: pass struct virtio_scsi to virtqueue completion function
    ...

    Linus Torvalds
     

29 Apr, 2013

1 commit

  • Due to MQ support we may allocate a whole bunch of rx queues but
    never use them. With this patch we'll save the space used by
    the receive buffers until they are actually in use:

    sh-4.2# free -h
                 total       used       free     shared    buffers     cached
    Mem:          490M        35M       455M         0B         0B       4.1M
    -/+ buffers/cache:        31M       459M
    Swap:           0B         0B         0B
    sh-4.2# ethtool -L eth0 combined 8
    sh-4.2# free -h
                 total       used       free     shared    buffers     cached
    Mem:          490M       162M       327M         0B         0B       4.1M
    -/+ buffers/cache:       158M       331M
    Swap:           0B         0B         0B

    Signed-off-by: Sasha Levin
    Signed-off-by: Rusty Russell

    Sasha Levin
     

20 Apr, 2013

2 commits


12 Apr, 2013

1 commit

  • There's nothing that prevents passing the device features of virtio_net to its
    vlan device. So this patch simply passes them to the vlan device so that it can
    benefit from advanced features.

    Netperf shows better sending performance for vlan device since TSO can work on
    vlan now.

    before:
    netperf -H 192.168.5.2
    MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.5.2 ()
    port 0 AF_INET : demo
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

     87380  16384  16384    10.00    4162.35

    after:
    netperf -H 192.168.5.2
    MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.5.2 ()
    port 0 AF_INET : demo
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

     87380  16384  16384    10.00    9365.42

    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Signed-off-by: Jason Wang
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Jason Wang
     

22 Mar, 2013

1 commit


20 Mar, 2013

2 commits


27 Feb, 2013

1 commit

  • Pull virtio updates from Rusty Russell:
    "All trivial, thanks to the stuff which didn't quite make it time"

    * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    virtio_console: Initialize guest_connected=true for rproc_serial
    virtio: use module_virtio_driver.
    virtio: Add module driver macro for virtio drivers.
    virtio_console: Use virtio device index to generate port name
    virtio: make pci_device_id const
    virtio: make config_ops const
    virtio-mmio: fix wrong comment about register offset
    virtio_console: Let unconnected rproc device receive data.

    Linus Torvalds
     

14 Feb, 2013

1 commit

  • Patch cef401de7be8c4e (net: fix possible wrong checksum
    generation) fixed the wrong checksum calculation, but it broke TSO by
    defining a new GSO type without a netdev feature for that type.
    net_gso_ok() would not allow hardware checksum/segmentation
    offload of such packets without the feature.

    The following patch fixes both TSO and the wrong checksum. It uses the
    same logic that Eric Dumazet used: it introduces a new flag,
    SKBTX_SHARED_FRAG, set if at least one frag can be modified by
    the user. However, the SKBTX_SHARED_FRAG flag is kept in the skb shared
    info tx_flags rather than in gso_type.

    tx_flags is a better fit than gso_type since we can have an skb with a
    shared frag that is not a gso packet. This does not link SHARED_FRAG to
    GSO, so there is no need to define a netdev feature for it.

    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
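
A rough Python model of why the placement of the flag matters. The check below is modeled on the net_gso_ok() behaviour described in the message (every gso_type bit needs a matching feature bit); the bit positions and the shift value are illustrative assumptions, not values taken from the kernel headers.

```python
# Illustrative bit positions only; the kernel's real values may differ.
NETIF_F_GSO_SHIFT = 16
SKB_GSO_TCPV4 = 1 << 0
SKB_GSO_SHARED_FRAG = 1 << 5   # gso_type bit added by the earlier patch
SKBTX_SHARED_FRAG = 1 << 5     # tx_flags bit used by this patch instead


def net_gso_ok(features: int, gso_type: int) -> bool:
    """Every gso_type bit must have a matching device feature bit."""
    required = gso_type << NETIF_F_GSO_SHIFT
    return (features & required) == required


# A device that advertises TSO for IPv4 and nothing else.
device_features = SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT

# Earlier approach: the shared-frag marker lives in gso_type, and no
# feature bit exists for it, so offload is refused.
old_gso_type = SKB_GSO_TCPV4 | SKB_GSO_SHARED_FRAG
old_offload_ok = net_gso_ok(device_features, old_gso_type)

# This patch: the marker lives in tx_flags, so gso_type is untouched.
new_gso_type = SKB_GSO_TCPV4
new_tx_flags = SKBTX_SHARED_FRAG
new_offload_ok = net_gso_ok(device_features, new_gso_type)
```

In this model `old_offload_ok` is false and `new_offload_ok` is true, which mirrors the TSO breakage and its fix.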
     

13 Feb, 2013

1 commit


05 Feb, 2013

1 commit


30 Jan, 2013

1 commit


28 Jan, 2013

1 commit

  • Pravin Shelar mentioned that GSO could potentially generate
    wrong TX checksum if skb has fragments that are overwritten
    by the user between the checksum computation and transmit.

    He suggested to linearize skbs but this extra copy can be
    avoided for normal tcp skbs cooked by tcp_sendmsg().

    This patch introduces a new SKB_GSO_SHARED_FRAG flag, set
    in skb_shinfo(skb)->gso_type if at least one frag can be
    modified by the user.

    Typical sources of such possible overwrites are {vm}splice(),
    sendfile(), and macvtap/tun/virtio_net drivers.

    Tested:

    $ netperf -H 7.7.8.84
    MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
    7.7.8.84 () port 0 AF_INET
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

     87380  16384  16384    10.00    3959.52

    $ netperf -H 7.7.8.84 -t TCP_SENDFILE
    TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 ()
    port 0 AF_INET
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

     87380  16384  16384    10.00    3216.80

    Performance of the SENDFILE is impacted by the extra allocation and
    copy, and because we use order-0 pages, while the TCP_STREAM uses
    bigger pages.

    Reported-by: Pravin Shelar
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

27 Jan, 2013

3 commits

  • Add a cpu notifier to virtio-net, so that we can reset the
    virtqueue affinity when cpu hotplug happens. This improves
    performance by enabling or disabling the virtqueue
    affinity after cpu hotplug.

    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Jason Wang
    Cc: Eric Dumazet
    Cc: "David S. Miller"
    Cc: virtualization@lists.linux-foundation.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Wanlong Gao
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Wanlong Gao
     
  • Split out the clean affinity function to virtnet_clean_affinity().

    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Jason Wang
    Cc: Eric Dumazet
    Cc: "David S. Miller"
    Cc: virtualization@lists.linux-foundation.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Wanlong Gao
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Wanlong Gao
     
  • As Michael mentioned, set affinity and select queue will not work very
    well when CPU IDs are not consecutive, which can happen with hot unplug.
    Fix this bug by traversing the online CPUs and creating a per-cpu variable
    that maps each CPU to its preferred virtqueue.

    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Jason Wang
    Cc: Eric Dumazet
    Cc: "David S. Miller"
    Cc: virtualization@lists.linux-foundation.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Wanlong Gao
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Wanlong Gao
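
The mapping problem in the last commit above can be sketched as follows. This is a simplified Python model, not the driver code: `map_cpus_to_queues` is a hypothetical helper, and the modulo wrap for "more CPUs than queue pairs" is an assumption of the sketch.

```python
def map_cpus_to_queues(online_cpus, num_queue_pairs):
    """Assign queue indices by position in the online-CPU list, not by CPU ID."""
    mapping = {}
    for position, cpu in enumerate(sorted(online_cpus)):
        # The modulo wrap is an assumption of this sketch for the case
        # where there are more online CPUs than queue pairs.
        mapping[cpu] = position % num_queue_pairs
    return mapping


# CPU 1 has been hot-unplugged, so the online IDs are no longer consecutive.
online = [0, 2, 3]
by_traversal = map_cpus_to_queues(online, num_queue_pairs=3)
by_cpu_id = {cpu: cpu for cpu in online}   # naive: use the CPU ID as the index
```

With three queue pairs, the naive `by_cpu_id` mapping sends CPU 3 to a queue index that does not exist, while `by_traversal` keeps every CPU on a valid queue.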
     

22 Jan, 2013

2 commits


21 Dec, 2012

1 commit

  • Pull virtio update from Rusty Russell:
    "Some nice cleanups, and even a patch my wife did as a "live" demo for
    Latinoware 2012.

    There's a slightly non-trivial merge in virtio-net, as we cleaned up
    the virtio add_buf interface while DaveM accepted the mq virtio-net
    patches."

    * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (27 commits)
    virtio_console: Add support for remoteproc serial
    virtio_console: Merge struct buffer_token into struct port_buffer
    virtio: add drv_to_virtio to make code clearly
    virtio: use dev_to_virtio wrapper in virtio
    virtio-mmio: Fix irq parsing in command line parameter
    virtio_console: Free buffers from out-queue upon close
    virtio: Convert dev_printk(KERN_ to dev_(
    virtio_console: Use kmalloc instead of kzalloc
    virtio_console: Free buffer if splice fails
    virtio: tools: make it clear that virtqueue_add_buf() no longer returns > 0
    virtio: scsi: make it clear that virtqueue_add_buf() no longer returns > 0
    virtio: rpmsg: make it clear that virtqueue_add_buf() no longer returns > 0
    virtio: net: make it clear that virtqueue_add_buf() no longer returns > 0
    virtio: console: make it clear that virtqueue_add_buf() no longer returns > 0
    virtio: make virtqueue_add_buf() returning 0 on success, not capacity.
    virtio: console: don't rely on virtqueue_add_buf() returning capacity.
    virtio_net: don't rely on virtqueue_add_buf() returning capacity.
    virtio-net: remove unused skb_vnet_hdr->num_sg field
    virtio-net: correct capacity math on ring full
    virtio: move queue_index and num_free fields into core struct virtqueue.
    ...

    Linus Torvalds
     

18 Dec, 2012

4 commits


11 Dec, 2012

1 commit


09 Dec, 2012

3 commits

  • This patch implements the ethtool_{set|get}_channels methods of virtio-net to
    allow the user to change the number of queues on demand while the device is running.

    Signed-off-by: Jason Wang
    Signed-off-by: David S. Miller

    Jason Wang
     
  • This patch adds multiqueue (VIRTIO_NET_F_MQ) support to the virtio_net
    driver. A VIRTIO_NET_F_MQ capable device allows the driver to do packet
    transmission and reception through multiple queue pairs and does the packet
    steering to get better performance. By default, only one queue pair is used; the
    user can change the number of queue pairs with ethtool in the next patch.

    When multiple queue pairs are used and the number of queue pairs is equal to the
    number of vcpus, the driver does the following optimizations to implement per-cpu
    virtqueue pairs:

    - select the txq based on the smp processor id.
    - set the smp affinity hint to the cpu that owns the queue pairs.

    This can be used with the flow steering support of the device to guarantee that the
    packets of a single flow are handled by the same cpu.

    Signed-off-by: Krishna Kumar
    Signed-off-by: Jason Wang
    Signed-off-by: David S. Miller

    Jason Wang
     
  • To support multiqueue transmitq/receiveq, the first step is to separate the queue
    related structures from virtnet_info. This patch introduces the send_queue and
    receive_queue structures and uses pointers to them as the parameter in
    functions handling sending/receiving.

    Signed-off-by: Krishna Kumar
    Signed-off-by: Jason Wang
    Signed-off-by: David S. Miller

    Jason Wang
     

04 Dec, 2012

1 commit

  • CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
    markings will be going away.

    Remove use of __devinit, __devexit_p, __devinitdata, __devinitconst,
    and __devexit.

    Signed-off-by: Bill Pemberton
    Cc: Rusty Russell
    Cc: Michael S. Tsirkin
    Cc: virtualization@lists.linux-foundation.org
    Signed-off-by: Greg Kroah-Hartman

    Bill Pemberton
     

10 Nov, 2012

1 commit


03 Oct, 2012

1 commit

  • Pull networking changes from David Miller:

    1) GRE now works over ipv6, from Dmitry Kozlov.

    2) Make SCTP more network namespace aware, from Eric Biederman.

    3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.

    4) Make openvswitch network namespace aware, from Pravin B Shelar.

    5) IPV6 NAT implementation, from Patrick McHardy.

    6) Server side support for TCP Fast Open, from Jerry Chu and others.

    7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
    Borkmann.

    8) Increase the loopback default MTU to 64K, from Eric Dumazet.

    9) Use a per-task rather than per-socket page fragment allocator for
    outgoing networking traffic. This benefits processes that have very
    many mostly idle sockets, which is quite common.

    From Eric Dumazet.

    10) Use up to 32K for page fragment allocations, with fallbacks to
    smaller sizes when higher order page allocations fail. Benefits are
    a) fewer segments for driver to process b) fewer calls to page
    allocator c) less waste of space.

    From Eric Dumazet.

    11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.

    12) VXLAN device driver, one way to handle VLAN issues such as the
    limitation of 4096 VLAN IDs yet still have some level of isolation.
    From Stephen Hemminger.

    13) As usual there is a large boatload of driver changes, with the scale
    perhaps tilted towards the wireless side this time around.

    Fix up various fairly trivial conflicts, mostly caused by the user
    namespace changes.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
    hyperv: Add buffer for extended info after the RNDIS response message.
    hyperv: Report actual status in receive completion packet
    hyperv: Remove extra allocated space for recv_pkt_list elements
    hyperv: Fix page buffer handling in rndis_filter_send_request()
    hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
    hyperv: Fix the max_xfer_size in RNDIS initialization
    vxlan: put UDP socket in correct namespace
    vxlan: Depend on CONFIG_INET
    sfc: Fix the reported priorities of different filter types
    sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
    sfc: Fix loopback self-test with separate_tx_channels=1
    sfc: Fix MCDI structure field lookup
    sfc: Add parentheses around use of bitfield macro arguments
    sfc: Fix null function pointer in efx_sriov_channel_type
    vxlan: virtual extensible lan
    igmp: export symbol ip_mc_leave_group
    netlink: add attributes to fdb interface
    tg3: unconditionally select HWMON support when tg3 is enabled.
    Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
    gre: fix sparse warning
    ...

    Linus Torvalds
     

21 Aug, 2012

1 commit

  • system_nrt[_freezable]_wq are now spurious. Mark them deprecated and
    convert all users to system[_freezable]_wq.

    If you're cc'd and wondering what's going on: Now all workqueues are
    non-reentrant, so there's no reason to use system_nrt[_freezable]_wq.
    Please use system[_freezable]_wq instead.

    This patch doesn't make any functional difference.

    Signed-off-by: Tejun Heo
    Acked-By: Lai Jiangshan

    Cc: Jens Axboe
    Cc: David Airlie
    Cc: Jiri Kosina
    Cc: "David S. Miller"
    Cc: Rusty Russell
    Cc: "Paul E. McKenney"
    Cc: David Howells

    Tejun Heo
     

15 Aug, 2012

1 commit

  • I believe net/core/dev.c is a better place for netif_notify_peers(),
    because other net event notify functions also stay in this file.

    And rename it to netdev_notify_peers().

    Cc: David S. Miller
    Cc: Ian Campbell
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Amerigo Wang
     

23 Jul, 2012

1 commit

  • Fix race condition in several network drivers when reading stats on 32bit
    UP architectures. These drivers update their stats in a BH context and
    therefore should use u64_stats_fetch_begin_bh/u64_stats_fetch_retry_bh
    instead of u64_stats_fetch_begin/u64_stats_fetch_retry when reading the
    stats.

    Signed-off-by: Kevin Groeneveld
    Signed-off-by: David S. Miller

    Kevin Groeneveld
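
A toy Python model of the torn read this commit guards against. It assumes a 64-bit counter is read as two 32-bit halves and that a BH updating the counter can run between the two loads unless BHs are held off for the duration of the read. `Stats64`, `read_counter` and the `bh_update` callback are hypothetical names for illustration; nothing here is the kernel's u64_stats API.

```python
MASK32 = 0xFFFFFFFF


class Stats64:
    """A 64-bit counter stored as two 32-bit halves, as on a 32-bit CPU."""

    def __init__(self, value: int):
        self.lo = value & MASK32
        self.hi = (value >> 32) & MASK32

    def add(self, n: int) -> None:
        value = ((self.hi << 32) | self.lo) + n
        self.lo = value & MASK32
        self.hi = (value >> 32) & MASK32


def read_counter(stats: Stats64, bh_update, bh_disabled: bool) -> int:
    """Read both halves; bh_update models a BH that wants to run mid-read."""
    if bh_disabled:
        lo = stats.lo
        hi = stats.hi
        bh_update()          # deferred until the read section is over
    else:
        lo = stats.lo
        bh_update()          # fires between the two 32-bit loads
        hi = stats.hi
    return (hi << 32) | lo


unsafe = Stats64(0xFFFFFFFF)
torn = read_counter(unsafe, lambda: unsafe.add(1), bh_disabled=False)

safe = Stats64(0xFFFFFFFF)
consistent = read_counter(safe, lambda: safe.add(1), bh_disabled=True)
```

Here `torn` is neither the old nor the new counter value, while `consistent` is the old value read in full before the update lands.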
     

30 Jun, 2012

1 commit