23 Apr, 2014

1 commit

  • Execute "ethtool -L eth0 combined 0" in the guest. If multiqueue
    is enabled, virtnet_send_command() will return an -EINVAL error,
    because there is a validation in QEMU.

    But if multiqueue is disabled, virtnet_set_queues() will just
    return zero (success). We should return an error in this situation.

    Signed-off-by: Amos Kong
    Acked-by: Jason Wang
    Signed-off-by: David S. Miller

    Amos Kong
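
A minimal sketch of the kind of guest-side check the message above asks for. It is not the driver's code: `struct demo_net` and `demo_check_queue_pairs()` are hypothetical names, and the sketch assumes that a device without multiqueue exposes exactly one queue pair.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state relevant to this check. */
struct demo_net {
    unsigned int max_queue_pairs; /* assumed to be 1 without multiqueue */
};

/*
 * Validate a requested number of combined queue pairs in the guest,
 * so that the result does not depend on the host rejecting it.
 * Returns 0 on success or -EINVAL for an unusable request.
 */
int demo_check_queue_pairs(const struct demo_net *dev, unsigned int requested)
{
    assert(dev != NULL);

    if (requested == 0 || requested > dev->max_queue_pairs)
        return -EINVAL;
    return 0;
}
```

With this shape, a request for zero queue pairs fails the same way whether or not multiqueue is enabled.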
     

03 Apr, 2014

2 commits

  • Pull networking updates from David Miller:
    "Here is my initial pull request for the networking subsystem during
    this merge window:

    1) Support for ESN in AH (RFC 4302) from Fan Du.

    2) Add full kernel doc for ethtool command structures, from Ben
    Hutchings.

    3) Add BCM7xxx PHY driver, from Florian Fainelli.

    4) Export computed TCP rate information in netlink socket dumps, from
    Eric Dumazet.

    5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
    Dichtel.

    6) Convert many drivers to pci_enable_msix_range(), from Alexander
    Gordeev.

    7) Record SKB timestamps more efficiently, from Eric Dumazet.

    8) Switch to microsecond resolution for TCP round trip times, also
    from Eric Dumazet.

    9) Clean up and fix 6lowpan fragmentation handling by making use of
    the existing inet_frag api for its implementation.

    10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.

    11) Auto size SKB lengths when composing netlink messages based upon
    past message sizes used, from Eric Dumazet.

    12) qdisc dumps can take a long time, add a cond_resched(), from Eric
    Dumazet.

    13) Sanitize netpoll core and drivers wrt. SKB handling semantics.
    Get rid of never-used-in-tree netpoll RX handling. From Eric W
    Biederman.

    14) Support inter-address-family and namespace changing in VTI tunnel
    driver(s). From Steffen Klassert.

    15) Add Altera TSE driver, from Vince Bridgers.

    16) Optimizing csum_replace2() so that it doesn't adjust the checksum
    by checksumming the entire header, from Eric Dumazet.

    17) Expand BPF internal implementation for faster interpreting, more
    direct translations into JIT'd code, and much cleaner uses of BPF
    filtering in non-socket contexts. From Daniel Borkmann and Alexei
    Starovoitov"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
    netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
    net: Add a test to see if a skb is freeable in irq context
    qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
    net: ptp: move PTP classifier in its own file
    net: sxgbe: make "core_ops" static
    net: sxgbe: fix logical vs bitwise operation
    net: sxgbe: sxgbe_mdio_register() frees the bus
    Call efx_set_channels() before efx->type->dimension_resources()
    xen-netback: disable rogue vif in kthread context
    net/mlx4: Set proper build dependancy with vxlan
    be2net: fix build dependency on VxLAN
    mac802154: make csma/cca parameters per-wpan
    mac802154: allow only one WPAN to be up at any given time
    net: filter: minor: fix kdoc in __sk_run_filter
    netlink: don't compare the nul-termination in nla_strcmp
    can: c_can: Avoid led toggling for every packet.
    can: c_can: Simplify TX interrupt cleanup
    can: c_can: Store dlc private
    can: c_can: Reduce register access
    can: c_can: Make the code readable
    ...

    Linus Torvalds
     
  • Pull virtio updates from Rusty Russell:
    "Nothing exciting: virtio-blk users might see a bit of a boost from the
    doubling of the default queue length though"

    * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    virtio-blk: base queue-depth on virtqueue ringsize or module param
    Revert a02bbb1ccfe8: MAINTAINERS: add virtio-dev ML for virtio
    virtio: fail adding buffer on broken queues.
    virtio-rng: don't crash if virtqueue is broken.
    virtio_balloon: don't crash if virtqueue is broken.
    virtio_blk: don't crash, report error if virtqueue is broken.
    virtio_net: don't crash if virtqueue is broken.
    virtio_balloon: don't softlockup on huge balloon changes.
    virtio: Use pci_enable_msix_exact() instead of pci_enable_msix()
    MAINTAINERS: virtio-dev is subscribers only
    tools/virtio: add a missing )
    tools/virtio: fix missing kmemleak_ignore symbol
    tools/virtio: update internal copies of headers

    Linus Torvalds
     

30 Mar, 2014

1 commit


28 Mar, 2014

1 commit

  • Current error handling of virtqueue_kick() was wrong in two places:
    - The skb was freed immediately when virtqueue_kick() failed during
    xmit. This may lead to a double free since the skb was not detached
    from the virtqueue.
    - try_fill_recv() returned false when virtqueue_kick() failed. This
    leads to unnecessary rescheduling of the refill work.

    Actually, it's safe to just ignore the kick failure in those two
    places. So this patch fixes this by partially reverting commit
    67975901183799af8e93ec60e322f9e2a1940b9b.

    Fixes 67975901183799af8e93ec60e322f9e2a1940b9b
    (virtio_net: verify if virtqueue_kick() succeeded).

    Cc: Heinz Graalfs
    Cc: Rusty Russell
    Cc: Michael S. Tsirkin
    Signed-off-by: Jason Wang
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Jason Wang
     

25 Mar, 2014

1 commit


15 Mar, 2014

1 commit

  • Replace the bh safe variant with the hard irq safe variant.

    We need a hard irq safe variant to deal with netpoll transmitting
    packets from hard irq context, and we need it in most if not all of
    the places using the bh safe variant.

    Except on 32bit uni-processor the code is exactly the same so don't
    bother with a bh variant, just have a hard irq safe variant that
    everyone can use.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

13 Mar, 2014

1 commit


25 Feb, 2014

1 commit

  • We should also allocate big buffers when the guest can receive UFO
    packets, so that the big packets fit into the guest rx buffer.

    Fixes 5c5167515d80f78f6bb538492c423adcae31ad65
    (virtio-net: Allow UFO feature to be set and advertised.)

    Cc: Rusty Russell
    Cc: Michael S. Tsirkin
    Cc: Sridhar Samudrala
    Signed-off-by: Jason Wang
    Acked-by: Michael S. Tsirkin
    Acked-by: Rusty Russell
    Signed-off-by: David S. Miller

    Jason Wang
     

17 Jan, 2014

4 commits

  • Add initial support for per-rx queue sysfs attributes to virtio-net. If
    mergeable packet buffers are enabled, adds a read-only mergeable packet
    buffer size sysfs attribute for each RX queue.

    Suggested-by: Michael S. Tsirkin
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Michael Dalton
    Signed-off-by: David S. Miller

    Michael Dalton
     
  • Commit 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page frag
    allocators") changed the mergeable receive buffer size from PAGE_SIZE to
    MTU-size, introducing a single-stream regression for benchmarks with large
    average packet size. There is no single optimal buffer size for all
    workloads. For workloads with packet size <[...] >=
    PAGE_SIZE will use PAGE_SIZE buffers.

    These optimizations interact positively with recent commit
    ba275241030c ("virtio-net: coalesce rx frags when possible during rx"),
    which coalesces adjacent RX SKB fragments in virtio_net. The coalescing
    optimizations benefit buffers of any size.

    Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs
    between two QEMU VMs on a single physical machine. Each VM has two VCPUs
    with all offloads & vhost enabled. All VMs and vhost threads run in a
    single 4 CPU cgroup cpuset, using cgroups to ensure that other processes
    in the system will not be scheduled on the benchmark CPUs. Trunk includes
    SKB rx frag coalescing.

    net-next w/ virtio_net before 2613af0ed18a (PAGE_SIZE bufs): 14642.85Gb/s
    net-next (MTU-size bufs): 13170.01Gb/s
    net-next + auto-tune: 14555.94Gb/s

    Jason Wang also reported a throughput increase on mlx4 from 22Gb/s
    using MTU-sized buffers to about 26Gb/s using auto-tuning.

    Signed-off-by: Michael Dalton
    Signed-off-by: David S. Miller

    Michael Dalton
     
  • The virtio-net driver currently uses netdev_alloc_frag() for GFP_ATOMIC
    mergeable rx buffer allocations. This commit migrates virtio-net to use
    per-receive queue page frags for GFP_ATOMIC allocation. This change unifies
    mergeable rx buffer memory allocation, which now will use skb_refill_frag()
    for both atomic and GFP-WAIT buffer allocations.

    To address fragmentation concerns, if after buffer allocation there
    is too little space left in the page frag to allocate a subsequent
    buffer, the remaining space is added to the current allocated buffer
    so that the remaining space can be used to store packet data.

    Acked-by: Michael S. Tsirkin
    Signed-off-by: Michael Dalton
    Signed-off-by: David S. Miller

    Michael Dalton
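
The fragmentation rule described above can be sketched in plain C as follows. This is an illustrative model only: `struct demo_page_frag` and `demo_frag_alloc()` are hypothetical names, and the real driver works on kernel page frags rather than on bare offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a per-receive-queue page fragment. */
struct demo_page_frag {
    size_t size;   /* total bytes in the backing page(s) */
    size_t offset; /* bytes already handed out */
};

/*
 * Carve a buffer of at least buf_len bytes out of the frag.
 * Returns the length actually assigned to the buffer, or 0 when the
 * frag cannot hold buf_len and would have to be refilled first.
 */
size_t demo_frag_alloc(struct demo_page_frag *frag, size_t buf_len)
{
    size_t len, left;

    assert(frag != NULL && buf_len > 0 && frag->offset <= frag->size);

    if (frag->size - frag->offset < buf_len)
        return 0;

    len = buf_len;
    frag->offset += len;

    left = frag->size - frag->offset;
    if (left < buf_len) {
        /* Tail is too small for another buffer: give it to this one. */
        len += left;
        frag->offset += left;
    }
    return len;
}
```

For a 4096-byte frag and 1536-byte buffers, the first call returns 1536 and the second returns 2560, because the 1024-byte tail is folded into the second buffer instead of being wasted.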
     
  • It looks like there's no need for those two fields:

    - Unless there's a failure for the first refill try, rq->max should always
    be equal to the vring size.
    - rq->num is only used to determine whether we need to do the refill;
    we could check vq->num_free instead.
    - rq->num was required to be increased or decreased explicitly after each
    get/put, which results in a bad API.

    So this patch removes them both to make the code simpler.

    Cc: Rusty Russell
    Cc: Michael S. Tsirkin
    Signed-off-by: Jason Wang
    Acked-by: Rusty Russell
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Jason Wang
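
A small sketch of deciding on a refill from the free-slot count alone, as the message above suggests. The types and the "more than half the ring is free" threshold are assumptions made for illustration; the message itself does not state a threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of a virtqueue: only the two counters we need. */
struct demo_vq {
    unsigned int ring_size; /* total descriptors in the ring */
    unsigned int num_free;  /* descriptors currently unused */
};

/*
 * Decide whether the receive queue should be refilled, using only
 * state the queue already tracks. No separate per-queue counter has
 * to be adjusted on every get/put.
 * Assumption: refill once more than half of the ring is free.
 */
bool demo_needs_refill(const struct demo_vq *vq)
{
    assert(vq != NULL && vq->num_free <= vq->ring_size);

    return vq->num_free > vq->ring_size / 2;
}
```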
     

07 Jan, 2014

1 commit


03 Jan, 2014

1 commit

  • During restore, try_fill_recv() was called with neither the napi lock held
    nor napi disabled. This can lead to two try_fill_recv() calls running at the
    same time. Fix this by refilling before trying to enable napi.

    Fixes 0741bcb5584f9e2390ae6261573c4de8314999f2
    (virtio: net: Add freeze, restore handlers to support S4).

    Cc: Amit Shah
    Cc: Rusty Russell
    Cc: Michael S. Tsirkin
    Cc: Eric Dumazet
    Signed-off-by: Jason Wang
    Signed-off-by: David S. Miller

    Jason Wang
     

11 Dec, 2013

2 commits


10 Dec, 2013

1 commit


07 Dec, 2013

4 commits

  • When a packet with invalid length arrives, ensure that the packet
    is freed correctly if mergeable packet buffers and big packets
    (GUEST_TSO4) are both enabled.

    Signed-off-by: Michael Dalton
    Acked-by: Jason Wang
    Acked-by: Andrew Vagin
    Signed-off-by: David S. Miller

    Michael Dalton
     
  • free_netdev calls netif_napi_del too, but it's too late, because napi
    structures are placed on vi->rq. netif_napi_add() is called from
    virtnet_alloc_queues.

    general protection fault: 0000 [#1] SMP
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables virtio_balloon pcspkr virtio_net(-) i2c_pii
    CPU: 1 PID: 347 Comm: rmmod Not tainted 3.13.0-rc2+ #171
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    task: ffff8800b779c420 ti: ffff8800379e0000 task.ti: ffff8800379e0000
    RIP: 0010:[] [] __list_del_entry+0x29/0xd0
    RSP: 0018:ffff8800379e1dd0 EFLAGS: 00010a83
    RAX: 6b6b6b6b6b6b6b6b RBX: ffff8800379c2fd0 RCX: dead000000200200
    RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000001 RDI: ffff8800379c2fd0
    RBP: ffff8800379e1dd0 R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800379c2f90
    R13: ffff880037839160 R14: 0000000000000000 R15: 00000000013352f0
    FS: 00007f1400e34740(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007f464124c763 CR3: 00000000b68cf000 CR4: 00000000000006e0
    Stack:
    ffff8800379e1df0 ffffffff8155beab 6b6b6b6b6b6b6b2b ffff8800378391c0
    ffff8800379e1e18 ffffffff8156499b ffff880037839be0 ffff880037839d20
    ffff88003779d3f0 ffff8800379e1e38 ffffffffa003477c ffff88003779d388
    Call Trace:
    [] netif_napi_del+0x1b/0x80
    [] free_netdev+0x8b/0x110
    [] virtnet_remove+0x7c/0x90 [virtio_net]
    [] virtio_dev_remove+0x23/0x80
    [] __device_release_driver+0x7f/0xf0
    [] driver_detach+0xc0/0xd0
    [] bus_remove_driver+0x58/0xd0
    [] driver_unregister+0x2c/0x50
    [] unregister_virtio_driver+0xe/0x10
    [] virtio_net_driver_exit+0x10/0x6ce [virtio_net]
    [] SyS_delete_module+0x172/0x220
    [] ? trace_hardirqs_on+0xd/0x10
    [] ? __audit_syscall_entry+0x9c/0xf0
    [] system_call_fastpath+0x16/0x1b
    Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00
    RIP [] __list_del_entry+0x29/0xd0
    RSP
    ---[ end trace d5931cd3f87c9763 ]---

    Fixes: 986a4f4d452d (virtio_net: multiqueue support)
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Signed-off-by: Andrey Vagin
    Acked-by: Michael S. Tsirkin
    Acked-by: Jason Wang
    Signed-off-by: David S. Miller

    Andrey Vagin
     
  • free_unused_bufs must check vi->mergeable_rx_bufs before
    vi->big_packets, because we use this sequence in other places.
    Otherwise we allocate buffer of one type, then free it as another
    type.

    general protection fault: 0000 [#1] SMP
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables pcspkr virtio_balloon virtio_net(-) i2c_pii
    CPU: 0 PID: 400 Comm: rmmod Not tainted 3.13.0-rc2+ #170
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    task: ffff8800b6d2a210 ti: ffff8800aed32000 task.ti: ffff8800aed32000
    RIP: 0010:[] [] free_unused_bufs+0xc3/0x190 [virtio_net]
    RSP: 0018:ffff8800aed33dd8 EFLAGS: 00010202
    RAX: ffff8800b1fe2c00 RBX: ffff8800b66a7240 RCX: 6b6b6b6b6b6b6b6b
    RDX: 6b6b6b6b6b6b6b6b RSI: ffff8800b8419a68 RDI: ffff8800b66a1148
    RBP: ffff8800aed33e00 R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
    R13: ffff8800b66a1148 R14: 0000000000000000 R15: 000077ff80000000
    FS: 00007fc4f9c4e740(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007f63f432f000 CR3: 00000000b6538000 CR4: 00000000000006f0
    Stack:
    ffff8800b66a7240 ffff8800b66a7380 ffff8800377bd3f0 0000000000000000
    00000000023302f0 ffff8800aed33e18 ffffffffa00346e2 ffff8800b66a7240
    ffff8800aed33e38 ffffffffa003474d ffff8800377bd388 ffff8800377bd390
    Call Trace:
    [] remove_vq_common+0x22/0x40 [virtio_net]
    [] virtnet_remove+0x4d/0x90 [virtio_net]
    [] virtio_dev_remove+0x23/0x80
    [] __device_release_driver+0x7f/0xf0
    [] driver_detach+0xc0/0xd0
    [] bus_remove_driver+0x58/0xd0
    [] driver_unregister+0x2c/0x50
    [] unregister_virtio_driver+0xe/0x10
    [] virtio_net_driver_exit+0x10/0x7be [virtio_net]
    [] SyS_delete_module+0x172/0x220
    [] ? trace_hardirqs_on+0xd/0x10
    [] ? __audit_syscall_entry+0x9c/0xf0
    [] system_call_fastpath+0x16/0x1b
    Code: c0 74 55 0f 1f 44 00 00 80 7b 30 00 74 7a 48 8b 50 30 4c 89 e6 48 03 73 20 48 85 d2 0f 84 bb 00 00 00 66 0f
    RIP [] free_unused_bufs+0xc3/0x190 [virtio_net]
    RSP
    ---[ end trace edb570ea923cce9c ]---

    Fixes: 2613af0ed18a (virtio_net: migrate mergeable rx buffers to page frag allocators)
    Cc: Michael Dalton
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Signed-off-by: Andrey Vagin
    Acked-by: Michael S. Tsirkin
    Acked-by: Jason Wang
    Signed-off-by: David S. Miller

    Andrey Vagin
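
One way to keep the allocation and free paths from disagreeing is to route both through a single classification helper. The sketch below assumes that approach; `demo_classify_buf()` and its types are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer kinds, in the order the checks must be made. */
enum demo_buf_kind {
    DEMO_BUF_MERGEABLE,
    DEMO_BUF_BIG,
    DEMO_BUF_SMALL
};

/* Hypothetical subset of the device flags discussed above. */
struct demo_flags {
    bool mergeable_rx_bufs;
    bool big_packets;
};

/*
 * Classify receive buffers. mergeable_rx_bufs is tested before
 * big_packets, so a device with both flags set is always treated as
 * mergeable, on the allocation path and on the free path alike.
 */
enum demo_buf_kind demo_classify_buf(const struct demo_flags *flags)
{
    assert(flags != NULL);

    if (flags->mergeable_rx_bufs)
        return DEMO_BUF_MERGEABLE;
    if (flags->big_packets)
        return DEMO_BUF_BIG;
    return DEMO_BUF_SMALL;
}
```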
     
  • Several files refer to an old address for the Free Software Foundation
    in the file header comment. Resolve by replacing the address with
    the URL so that we do not have to keep
    updating the header comments anytime the address changes.

    CC: Jay Vosburgh
    CC: Veaceslav Falico
    CC: Andy Gospodarek
    CC: Haiyang Zhang
    CC: "K. Y. Srinivasan"
    CC: Paul Mackerras
    CC: Ian Campbell
    CC: Wei Liu
    CC: Rusty Russell
    CC: "Michael S. Tsirkin"
    Signed-off-by: Jeff Kirsher
    Acked-by: Wei Liu
    Signed-off-by: David S. Miller

    Jeff Kirsher
     

02 Dec, 2013

2 commits

  • receive mergeable now handles errors internally.
    Do same for big and small packet paths, otherwise
    the logic is too hard to follow.

    Signed-off-by: Michael S. Tsirkin
    Acked-by: Jason Wang
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
     
  • Eric Dumazet noticed that if we encounter an error
    when processing a mergeable buffer, we don't
    dequeue all of the buffers from this packet;
    the result is almost sure to be loss of networking.

    Jason Wang noticed that we also leak a page and that we don't decrement
    the rq buf count, so we won't repost buffers (a resource leak).

    Fix both issues.

    Cc: Rusty Russell
    Cc: Michael Dalton
    Reported-by: Eric Dumazet
    Reported-by: Jason Wang
    Signed-off-by: Michael S. Tsirkin
    Acked-by: Jason Wang
    Signed-off-by: David S. Miller

    Michael S. Tsirkin
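
The error path described above amounts to draining every buffer that still belongs to the failed packet. Below is a toy model of that loop; `struct demo_rxq`, `demo_get_buf()` and `demo_drain_packet()` are hypothetical and only count buffers instead of touching real pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical receive queue: only counts buffers, holds no data. */
struct demo_rxq {
    unsigned int pending;  /* buffers the device has made available */
    unsigned int released; /* buffers handed back for reuse */
};

/* Take one buffer off the queue; returns 0 when none is available. */
int demo_get_buf(struct demo_rxq *q)
{
    assert(q != NULL);

    if (q->pending == 0)
        return 0;
    q->pending--;
    return 1;
}

/*
 * Error path: the packet still owns `remaining` buffers on the queue.
 * Dequeue and release each one so that the next packet starts at a
 * clean boundary. Returns how many buffers were drained.
 */
unsigned int demo_drain_packet(struct demo_rxq *q, unsigned int remaining)
{
    unsigned int drained = 0;

    while (remaining-- > 0) {
        if (!demo_get_buf(q))
            break; /* fewer buffers present than the packet announced */
        q->released++;
        drained++;
    }
    return drained;
}
```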
     

01 Dec, 2013

1 commit


20 Nov, 2013

1 commit

  • Pull networking fixes from David Miller:
    "Mostly these are fixes for fallout due to merge window changes, as
    well as cures for problems that have been with us for a much longer
    period of time"

    1) Johannes Berg noticed two major deficiencies in our genetlink
    registration. Some genetlink protocols were passing in constant
    counts for their ops array rather than something like
    ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were
    using fixed IDs for their multicast groups.

    We have to retain these fixed IDs to keep existing userland tools
    working, but reserve them so that other multicast groups used by
    other protocols can not possibly conflict.

    In dealing with these two problems, we actually now use less state
    management for genetlink operations and multicast groups.

    2) When configuring interface hardware timestamping, fix several
    drivers that simply do not validate that the hwtstamp_config value
    is one the driver actually supports. From Ben Hutchings.

    3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.

    4) In dev_forward_skb(), set the skb->protocol in the right order
    relative to skb_scrub_packet(). From Alexei Starovoitov.

    5) Bridge erroneously fails to use the proper wrapper functions to make
    calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki
    Makita.

    6) When detaching a bridge port, make sure to flush all VLAN IDs to
    prevent them from leaking, also from Toshiaki Makita.

    7) Put in a compromise for TCP Small Queues so that deep queued devices
    that delay TX reclaim non-trivially don't have such a performance
    decrease. One particularly problematic area is 802.11 AMPDU in
    wireless. From Eric Dumazet.

    8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
    here. Fix from Eric Dumazet, reported by Dave Jones.

    9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.

    10) When computing mergeable buffer sizes, virtio-net fails to take the
    virtio-net header into account. From Michael Dalton.

    11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic
    bumping, this one has been with us for a while. From Eric Dumazet.

    12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
    Hugne.

    13) 6lowpan bit used for traffic classification was wrong, from Jukka
    Rissanen.

    14) macvlan has the same issue as normal vlans did wrt. propagating LRO
    disabling down to the real device, fix it the same way. From Michal
    Kubecek.

    15) CPSW driver needs to soft reset all slaves during suspend, from
    Daniel Mack.

    16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.

    17) The xen-netfront RX buffer refill timer isn't properly scheduled on
    partial RX allocation success, from Ma JieYue.

    18) When ipv6 ping protocol support was added, the AF_INET6 protocol
    initialization cleanup path on failure was borked a little. Fix
    from Vlad Yasevich.

    19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
    blocks we can do the wrong thing with the msg_name we write back to
    userspace. From Hannes Frederic Sowa. There is another fix in the
    works from Hannes which will prevent future problems of this nature.

    20) Fix route leak in VTI tunnel transmit, from Fan Du.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
    genetlink: make multicast groups const, prevent abuse
    genetlink: pass family to functions using groups
    genetlink: add and use genl_set_err()
    genetlink: remove family pointer from genl_multicast_group
    genetlink: remove genl_unregister_mc_group()
    hsr: don't call genl_unregister_mc_group()
    quota/genetlink: use proper genetlink multicast APIs
    drop_monitor/genetlink: use proper genetlink multicast APIs
    genetlink: only pass array to genl_register_family_with_ops()
    tcp: don't update snd_nxt, when a socket is switched from repair mode
    atm: idt77252: fix dev refcnt leak
    xfrm: Release dst if this dst is improper for vti tunnel
    netlink: fix documentation typo in netlink_set_err()
    be2net: Delete secondary unicast MAC addresses during be_close
    be2net: Fix unconditional enabling of Rx interface options
    net, virtio_net: replace the magic value
    ping: prevent NULL pointer dereference on write to msg_name
    bnx2x: Prevent "timeout waiting for state X"
    bnx2x: prevent CFC attention
    bnx2x: Prevent panic during DMAE timeout
    ...

    Linus Torvalds
     

19 Nov, 2013

1 commit


15 Nov, 2013

2 commits

  • Pull virtio updates from Rusty Russell:
    "Nothing really exciting: some groundwork for changing virtio endian,
    and some robustness fixes for broken virtio devices, plus minor
    tweaks"

    * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    virtio_scsi: verify if queue is broken after virtqueue_get_buf()
    x86, asmlinkage, lguest: Pass in globals into assembler statement
    virtio: mmio: fix signature checking for BE guests
    virtio_ring: adapt to notify() returning bool
    virtio_net: verify if queue is broken after virtqueue_get_buf()
    virtio_console: verify if queue is broken after virtqueue_get_buf()
    virtio_blk: verify if queue is broken after virtqueue_get_buf()
    virtio_ring: add new function virtqueue_is_broken()
    virtio_test: verify if virtqueue_kick() succeeded
    virtio_net: verify if virtqueue_kick() succeeded
    virtio_ring: let virtqueue_{kick()/notify()} return a bool
    virtio_ring: change host notification API
    virtio_config: remove virtio_config_val
    virtio: use size-based config accessors.
    virtio_config: introduce size-based accessors.
    virtio_ring: plug kmemleak false positive.
    virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM

    Linus Torvalds
     
  • Commit 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page
    frag allocators") changed the mergeable receive buffer size from PAGE_SIZE
    to MTU-size. However, the merge buffer size does not take into account the
    size of the virtio-net header. Consequently, packets that are MTU-size
    will take two buffers instead of one (to store the virtio-net header),
    substantially decreasing the throughput of MTU-size traffic due to TCP
    window / SKB truesize effects.

    This commit changes the mergeable buffer size to include the virtio-net
    header. The buffer size is cacheline-aligned because skb_page_frag_refill
    will not automatically align the requested size.

    Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs
    between two QEMU VMs on a single physical machine. Each VM has two VCPUs and
    vhost enabled. All VMs and vhost threads run in a single 4 CPU cgroup
    cpuset, using cgroups to ensure that other processes in the system will not
    be scheduled on the benchmark CPUs. Transmit offloads and mergeable receive
    buffers are enabled, but guest_tso4 / guest_csum are explicitly disabled to
    force MTU-sized packets on the receiver.

    net-next trunk before 2613af0ed18a (PAGE_SIZE buf): 3861.08Gb/s
    net-next trunk (MTU 1500- packet uses two buf due to size bug): 4076.62Gb/s
    net-next trunk (MTU 1480- packet fits in one buf): 6301.34Gb/s
    net-next trunk w/ size fix (MTU 1500 - packet fits in one buf): 6445.44Gb/s

    Suggested-by: Eric Northup
    Signed-off-by: Michael Dalton
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Michael Dalton
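
The size computation described above can be illustrated with a short calculation. The constants are assumptions chosen for the example (14-byte Ethernet header, 4-byte VLAN tag, 12-byte mergeable virtio-net header, 64-byte cache line), and the `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration; not read from any real header. */
#define DEMO_ETH_HLEN    14 /* Ethernet header */
#define DEMO_VLAN_HLEN    4 /* one VLAN tag */
#define DEMO_MRG_HDR_LEN 12 /* mergeable virtio-net header */
#define DEMO_CACHE_LINE  64 /* cache line size in bytes */

/* Round n up to the next multiple of align (align is a power of two). */
size_t demo_align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    return (n + align - 1) & ~(align - 1);
}

/*
 * Buffer length for one MTU-sized packet: link headers, payload and
 * the virtio-net header, rounded up to a whole cache line.
 */
size_t demo_merge_buf_len(size_t mtu)
{
    size_t len = DEMO_ETH_HLEN + DEMO_VLAN_HLEN + mtu + DEMO_MRG_HDR_LEN;

    return demo_align_up(len, DEMO_CACHE_LINE);
}
```

Under these assumptions an MTU of 1500 needs 1530 bytes, which rounds up to 1536, so a full-size packet and its header fit in a single buffer.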
     

14 Nov, 2013

1 commit

  • Pull core locking changes from Ingo Molnar:
    "The biggest changes:

    - add lockdep support for seqcount/seqlocks structures, this
    unearthed both bugs and required extra annotation.

    - move the various kernel locking primitives to the new
    kernel/locking/ directory"

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
    block: Use u64_stats_init() to initialize seqcounts
    locking/lockdep: Mark __lockdep_count_forward_deps() as static
    lockdep/proc: Fix lock-time avg computation
    locking/doc: Update references to kernel/mutex.c
    ipv6: Fix possible ipv6 seqlock deadlock
    cpuset: Fix potential deadlock w/ set_mems_allowed
    seqcount: Add lockdep functionality to seqcount/seqlock structures
    net: Explicitly initialize u64_stats_sync structures for lockdep
    locking: Move the percpu-rwsem code to kernel/locking/
    locking: Move the lglocks code to kernel/locking/
    locking: Move the rwsem code to kernel/locking/
    locking: Move the rtmutex code to kernel/locking/
    locking: Move the semaphore core to kernel/locking/
    locking: Move the spinlock code to kernel/locking/
    locking: Move the lockdep code to kernel/locking/
    locking: Move the mutex code to kernel/locking/
    hung_task debugging: Add tracepoint to report the hang
    x86/locking/kconfig: Update paravirt spinlock Kconfig description
    lockstat: Report avg wait and hold times
    lockdep, x86/alternatives: Drop ancient lockdep fixup message
    ...

    Linus Torvalds
     

06 Nov, 2013

2 commits

  • In order to enable lockdep on seqcount/seqlock structures, we
    must explicitly initialize any locks.

    The u64_stats_sync structure uses a seqcount, and thus we need
    to introduce a u64_stats_init() function and use it to initialize
    the structure.

    This unfortunately adds a lot of fairly trivial initialization code
    to a number of drivers. But the benefit of ensuring correctness makes
    this worthwhile.

    Because these changes are required for lockdep to be enabled, and the
    changes are quite trivial, I've not yet split this patch out into 30-some
    separate patches, as I figured it would be better to get the various
    maintainers thoughts on how to best merge this change along with
    the seqcount lockdep enablement.

    Feedback would be appreciated!

    Signed-off-by: John Stultz
    Acked-by: Julian Anastasov
    Signed-off-by: Peter Zijlstra
    Cc: Alexey Kuznetsov
    Cc: "David S. Miller"
    Cc: Eric Dumazet
    Cc: Hideaki YOSHIFUJI
    Cc: James Morris
    Cc: Jesse Gross
    Cc: Mathieu Desnoyers
    Cc: "Michael S. Tsirkin"
    Cc: Mirko Lindner
    Cc: Patrick McHardy
    Cc: Roger Luethi
    Cc: Rusty Russell
    Cc: Simon Horman
    Cc: Stephen Hemminger
    Cc: Steven Rostedt
    Cc: Thomas Petazzoni
    Cc: Wensong Zhang
    Cc: netdev@vger.kernel.org
    Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
     
  • We used to use a percpu structure vq_index to record the cpu to queue
    mapping. This is suboptimal since it duplicates the work of XPS and
    loses all other XPS functionality, such as allowing users to configure
    their own transmission steering strategy.

    So this patch switches to XPS and suggests a default mapping when
    the number of cpus is equal to the number of queues. With XPS support,
    there's no need to keep the per-cpu vq_index and .ndo_select_queue(),
    so they are removed as well.

    Cc: Rusty Russell
    Cc: Michael S. Tsirkin
    Acked-by: Rusty Russell
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Jason Wang
    Signed-off-by: David S. Miller

    Jason Wang
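
A sketch of the default mapping mentioned above, assuming the suggestion is the identity mapping (cpu i transmits on queue i). `demo_default_xps()` is a hypothetical helper; it does not call any kernel XPS interface.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Suggest a cpu-to-queue mapping. A default is only offered when
 * there is exactly one queue per cpu; otherwise the table is left
 * untouched and steering stays with the stack or the administrator.
 * Returns the number of entries written to queue_for_cpu.
 */
unsigned int demo_default_xps(unsigned int ncpus, unsigned int nqueues,
                              int *queue_for_cpu)
{
    unsigned int cpu;

    assert(queue_for_cpu != NULL);

    if (ncpus != nqueues)
        return 0;

    for (cpu = 0; cpu < ncpus; cpu++)
        queue_for_cpu[cpu] = (int)cpu; /* assumed identity mapping */
    return ncpus;
}
```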
     

05 Nov, 2013

2 commits

  • Commit 2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate mergeable
    rx buffers to page frag allocators) tried to increase the payload/truesize for
    MTU-sized traffic. But this introduces extra overhead for received GSO packets
    because of the frag list. This commit tries to reduce this issue by
    coalescing the rx frags when possible during rx. Test results show
    about a 15% improvement on full size GSO packet receiving (even better than
    before commit 2613af0ed18a11d5c566a81f9a6510b73180660a).

    Before this commit:
    ./netperf -H 192.168.100.4
    MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4
    () port 0 AF_INET : demo
    Recv Send Send
    Socket Socket Message Elapsed
    Size Size Size Time Throughput
    bytes bytes bytes secs. 10^6bits/sec

    87380 16384 16384 10.00 20303.87

    After this commit:
    ./netperf -H 192.168.100.4
    MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4
    () port 0 AF_INET : demo
    Recv Send Send
    Socket Socket Message Elapsed
    Size Size Size Time Throughput
    bytes bytes bytes secs. 10^6bits/sec

    87380 16384 16384 10.00 23841.26

    Cc: Rusty Russell
    Cc: Michael S. Tsirkin
    Cc: Michael Dalton
    Cc: Eric Dumazet
    Acked-by: Michael S. Tsirkin
    Acked-by: Eric Dumazet
    Signed-off-by: Jason Wang
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Jason Wang
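
The coalescing idea can be sketched as follows: when a new receive fragment starts exactly where the previous one ends on the same page, the previous fragment is extended instead of a new slot being used. The types, the 17-slot limit and `demo_add_rx_frag()` are assumptions for illustration, not the driver's data structures.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_FRAGS 17 /* assumed per-packet fragment limit */

/* Hypothetical fragment descriptor: a byte range within a page. */
struct demo_frag {
    int page_id;
    size_t offset;
    size_t size;
};

/* Hypothetical packet holding a list of fragments. */
struct demo_skb {
    struct demo_frag frags[DEMO_MAX_FRAGS];
    int nr_frags;
};

/*
 * Attach a received fragment to the packet.
 * Returns 1 if it was merged into the last fragment, 0 if it took a
 * new slot, and -1 if no slot was left.
 */
int demo_add_rx_frag(struct demo_skb *skb, int page_id,
                     size_t offset, size_t size)
{
    assert(skb != NULL);
    assert(skb->nr_frags >= 0 && skb->nr_frags <= DEMO_MAX_FRAGS);

    if (skb->nr_frags > 0) {
        struct demo_frag *last = &skb->frags[skb->nr_frags - 1];

        /* Same page and directly adjacent: just grow the last frag. */
        if (last->page_id == page_id &&
            last->offset + last->size == offset) {
            last->size += size;
            return 1;
        }
    }

    if (skb->nr_frags == DEMO_MAX_FRAGS)
        return -1;

    skb->frags[skb->nr_frags++] =
        (struct demo_frag){ page_id, offset, size };
    return 0;
}
```

Two back-to-back 1536-byte buffers from the same page end up as one 3072-byte fragment, which keeps the fragment list of a large GSO packet short.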
     
  • Conflicts:
    drivers/net/ethernet/emulex/benet/be.h
    drivers/net/netconsole.c
    net/bridge/br_private.h

    Three mostly trivial conflicts.

    The net/bridge/br_private.h conflict was a function signature (argument
    addition) change overlapping with the extern removals from Joe Perches.

    In drivers/net/netconsole.c we had one change adjusting a printk message
    whilst another changed "printk(KERN_INFO" into "pr_info(".

    Lastly, the emulex change was a new inline function addition overlapping
    with Joe Perches's extern removals.

    Signed-off-by: David S. Miller

    David S. Miller
     

30 Oct, 2013

1 commit

  • Commit 3ab098df35f8b98b6553edc2e40234af512ba877 (virtio-net: don't respond to
    cpu hotplug notifier if we're not ready) tries to bypass the cpu hotplug
    notifier by checking config_enable and doing nothing if it is false. To do so
    it needs to hold the config_lock mutex, which may happen in an atomic
    context and leads to the following warnings:

    [ 622.944441] CPU0 attaching NULL sched-domain.
    [ 622.944446] CPU1 attaching NULL sched-domain.
    [ 622.944485] CPU0 attaching NULL sched-domain.
    [ 622.950795] BUG: sleeping function called from invalid context at kernel/mutex.c:616
    [ 622.950796] in_atomic(): 1, irqs_disabled(): 1, pid: 10, name: migration/1
    [ 622.950796] no locks held by migration/1/10.
    [ 622.950798] CPU: 1 PID: 10 Comm: migration/1 Not tainted 3.12.0-rc5-wl-01249-gb91e82d #317
    [ 622.950799] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    [ 622.950802] 0000000000000000 ffff88001d42dba0 ffffffff81a32f22 ffff88001bfb9c70
    [ 622.950803] ffff88001d42dbb0 ffffffff810edb02 ffff88001d42dc38 ffffffff81a396ed
    [ 622.950805] 0000000000000046 ffff88001d42dbe8 ffffffff810e861d 0000000000000000
    [ 622.950805] Call Trace:
    [ 622.950810] [] dump_stack+0x54/0x74
    [ 622.950815] [] __might_sleep+0x112/0x114
    [ 622.950817] [] mutex_lock_nested+0x3c/0x3c6
    [ 622.950818] [] ? up+0x39/0x3e
    [ 622.950821] [] ? acpi_os_signal_semaphore+0x21/0x2d
    [ 622.950824] [] ? acpi_ut_release_mutex+0x5e/0x62
    [ 622.950828] [] virtnet_cpu_callback+0x33/0x87
    [ 622.950830] [] notifier_call_chain+0x3c/0x5e
    [ 622.950832] [] __raw_notifier_call_chain+0xe/0x10
    [ 622.950835] [] __cpu_notify+0x20/0x37
    [ 622.950836] [] cpu_notify+0x13/0x15
    [ 622.950838] [] take_cpu_down+0x27/0x3a
    [ 622.950841] [] stop_machine_cpu_stop+0x93/0xf1
    [ 622.950842] [] cpu_stopper_thread+0xa0/0x12f
    [ 622.950844] [] ? cpu_stopper_thread+0x12f/0x12f
    [ 622.950847] [] ? lock_release_holdtime.part.7+0xa3/0xa8
    [ 622.950848] [] ? cpu_stop_should_run+0x3f/0x47
    [ 622.950850] [] smpboot_thread_fn+0x1c5/0x1e3
    [ 622.950852] [] ? lg_global_unlock+0x67/0x67
    [ 622.950854] [] kthread+0xd8/0xe0
    [ 622.950857] [] ? wait_for_common+0x12f/0x164
    [ 622.950859] [] ? kthread_create_on_node+0x124/0x124
    [ 622.950861] [] ret_from_fork+0x7c/0xb0
    [ 622.950862] [] ? kthread_create_on_node+0x124/0x124
    [ 622.950876] smpboot: CPU 1 is now offline
    [ 623.194556] SMP alternatives: lockdep: fixing up alternatives
    [ 623.194559] smpboot: Booting Node 0 Processor 1 APIC 0x1
    ...

    A correct fix is to unregister the hotcpu notifier during restore and register a
    new one in resume.

    Reported-by: Fengguang Wu
    Tested-by: Fengguang Wu
    Cc: Wanlong Gao
    Cc: Rusty Russell
    Cc: Michael S. Tsirkin
    Signed-off-by: Jason Wang
    Acked-by: Michael S. Tsirkin
    Reviewed-by: Wanlong Gao
    Signed-off-by: David S. Miller

    Jason Wang
     

29 Oct, 2013

3 commits


18 Oct, 2013

2 commits

  • We used to schedule the refill work unconditionally after changing the
    number of queues. This may lead to an issue if the device is not
    up. Since we only try to cancel the work in ndo_stop(), the refill
    work may still run after the device is removed. Fix this by only
    scheduling the work when the device is up.

    The bug was introduced by commit 9b9cd8024a2882e896c65222aa421d461354e3f2
    (virtio-net: fix the race between channels setting and refill).

    Cc: Rusty Russell
    Cc: Michael S. Tsirkin
    Signed-off-by: Jason Wang
    Signed-off-by: David S. Miller

    Jason Wang
     
  • We're trying to re-configure the affinity unconditionally in the cpu hotplug
    callback. This may lead to an issue when resuming from s3/s4 since

    - virt queues haven't been allocated at that time.
    - it's unnecessary since the thaw method will re-configure the affinity.

    Fix this issue by checking config_enable and doing nothing if we're not ready.

    The bug was introduced by commit 8de4b2f3ae90c8fc0f17eeaab87d5a951b66ee17
    (virtio-net: reset virtqueue affinity when doing cpu hotplug).

    Cc: Rusty Russell
    Cc: Michael S. Tsirkin
    Cc: Wanlong Gao
    Acked-by: Michael S. Tsirkin
    Reviewed-by: Wanlong Gao
    Signed-off-by: Jason Wang
    Signed-off-by: David S. Miller

    Jason Wang