19 Jul, 2013

2 commits

  • Pull networking fixes from David Miller:
    "A couple interesting SKB fragment handling fixes, plus the usual small
    bits here and there:

    1) Fix 64-bit divide build failure on 32-bit platforms in mlx5, from
    Tim Gardner.

    2) Get rid of a stupid reimplementation of "%*phC" in our sysfs MAC
    address printing helper.

    3) Fix NETIF_F_SG capability advertisement in hyperv driver, if the
    device can't do checksumming offloads then it shouldn't say it can
    do SG either. From Haiyang Zhang.

    4) bgmac needs to depend on PHYLIB, from Hauke Mehrtens.

    5) Don't leak DMA mappings on mapping failures, from Neil Horman.

    6) We need to reset the transport header of SKBs in ipv4 before we
    attempt to perform early socket demux, just like ipv6 does. From
    Eric Dumazet.

    7) Add missing locking on vxlan device removal, from Stephen
    Hemminger.

    8) xen-netfront has to make two passes over an SKB to prepare it for
    transfer. One pass calculates the number of slots needed, the
    second massages the SKB and fills the slots. Unfortunately, the
    first pass doesn't calculate the number of slots properly so we
    can end up trying to build a MAX_SKB_FRAGS + 1 SKB which doesn't
    work out so well. Fix from Jan Beulich with help and discussion
    with several others.

    9) Fix a similar problem in tun and macvtap, which have to split up
    scatter-gather elements at PAGE_SIZE boundaries. Don't do
    zerocopy if it would result in a > MAX_SKB_FRAGS skb. Fixes from
    Jason Wang.

    10) On receive, once we've decoded the VLAN state completely, clear
    skb->vlan_tci. Otherwise demuxed tunnels underneath can trigger
    the VLAN code again, corrupting the packet. Fix from Eric
    Dumazet"
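
    The check described in item 9 can be sketched as follows. This is a
    userspace illustration only: the 4 KiB page size, the fragment limit of 17
    and the names sketch_iov_pages() and sketch_can_zerocopy() are assumptions
    made for the example, not the kernel's own helpers or values.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE     4096u  /* assumed page size */
#define SKETCH_MAX_SKB_FRAGS 17u    /* assumed fragment limit */

struct sketch_iovec {
    uintptr_t base;  /* start address of the user buffer */
    size_t    len;   /* length in bytes */
};

/* Count the pages touched when every segment is split at page boundaries. */
static size_t sketch_iov_pages(const struct sketch_iovec *iov, size_t nr_segs)
{
    size_t pages = 0;

    for (size_t i = 0; i < nr_segs; i++) {
        size_t off = iov[i].base % SKETCH_PAGE_SIZE;

        if (iov[i].len == 0)
            continue;
        /* Round the span [off, off + len) up to whole pages. */
        pages += (off + iov[i].len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
    }
    return pages;
}

/* Zerocopy is only attempted when each page can get its own fragment. */
static int sketch_can_zerocopy(const struct sketch_iovec *iov, size_t nr_segs)
{
    return sketch_iov_pages(iov, nr_segs) <= SKETCH_MAX_SKB_FRAGS;
}
```

    A two-byte buffer that straddles a page boundary already counts as two
    pages, which is why the number of iovec segments alone is not a safe bound.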

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
    vlan: fix a race in egress prio management
    vlan: mask vlan prio bits
    macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
    tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
    pkt_sched: sch_qfq: remove a source of high packet delay/jitter
    xen-netfront: pull on receive skb may need to happen earlier
    vxlan: add necessary locking on device removal
    hyperv: Fix the NETIF_F_SG flag setting in netvsc
    net: Fix sysfs_format_mac() code duplication.
    be2net: Fix to avoid hardware workaround when not needed
    macvtap: do not assume 802.1Q when send vlan packets
    macvtap: fix the missing ret value of TUNSETQUEUE
    ipv4: set transport header earlier
    mlx5 core: Fix __udivdi3 when compiling for 32 bit arches
    bgmac: add dependency to phylib
    net/irda: fixed style issues in irlan_eth
    ethtool: fixed trailing statements in ethtool
    ndisc: bool initializations should use true and false
    atl1e: unmap partially mapped skb on dma error and free skb

    Linus Torvalds
     
  • In commit 48cc32d38a52d0b68f91a171a8d00531edc6a46e
    ("vlan: don't deliver frames for unknown vlans to protocols")
    Florian made sure we set pkt_type to PACKET_OTHERHOST
    if the vlan id is set and we could not find a vlan device for this
    particular id.

    But we also have a problem if prio bits are set.

    Steinar reported an issue on a router receiving IPv6 frames with a
    vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
    because skb->vlan_tci is set.

    The forwarded frame is completely corrupted: we can see (8100:4000)
    being inserted in the middle of the IPv6 source address:

    16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 >
    9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64
    0x0000: 0000 0029 8000 c7c3 7103 0001 a0ae e651
    0x0010: 0000 0000 ccce 0b00 0000 0000 1011 1213
    0x0020: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
    0x0030: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233

    It seems we are not really ready to properly cope with this right now.

    We can probably do better in future kernels :
    vlan_get_ingress_priority() should be a netdev property instead of
    a per vlan_dev one.

    For stable kernels, let's clear vlan_tci to fix the bugs.
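
    To see why a tag of 4000 (hexadecimal) means "id 0, prio 2", the 16-bit
    TCI can be split as in the sketch below. The masks follow the 802.1Q
    layout (3 priority bits, 1 CFI/DEI bit, 12 id bits); the sketch_ names are
    made up for this illustration.

```c
#include <stdint.h>

#define SKETCH_VLAN_PRIO_MASK  0xe000u  /* top 3 bits: priority */
#define SKETCH_VLAN_PRIO_SHIFT 13
#define SKETCH_VLAN_VID_MASK   0x0fffu  /* low 12 bits: VLAN id */

/* Extract the priority bits from a tag control information word. */
static unsigned sketch_vlan_prio(uint16_t tci)
{
    return (tci & SKETCH_VLAN_PRIO_MASK) >> SKETCH_VLAN_PRIO_SHIFT;
}

/* Extract the VLAN id from a tag control information word. */
static unsigned sketch_vlan_id(uint16_t tci)
{
    return tci & SKETCH_VLAN_VID_MASK;
}
```

    For tci = 0x4000 the id is 0 but the word as a whole is non-zero, so code
    that only tests whether vlan_tci is set still treats the frame as tagged.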

    Reported-by: Steinar H. Gunderson
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

17 Jul, 2013

1 commit


15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the net/* uses of the __cpuinit macros
    from all C files.

    [1] https://lkml.org/lkml/2013/5/20/589

    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

13 Jul, 2013

1 commit


12 Jul, 2013

1 commit

  • This change makes it so that the GRE and VXLAN tunnels can make use of Tx
    checksum offload support provided by some drivers via the hw_enc_features.
    Without this fix enabling GSO means sacrificing Tx checksum offload and
    this actually leads to a performance regression as shown below:

                Utilization
                Send
    Throughput  local        GSO
    10^6bits/s  % S          state
       6276.51  8.39         enabled
       7123.52  8.42         disabled

    To resolve this it was necessary to address two items. First
    netif_skb_features needed to be updated so that it would correctly handle
    the Trans Ether Bridging protocol without impacting the need to check for
    Q-in-Q tagging. To do this it was necessary to update harmonize_features
    so that it used skb_network_protocol instead of just using the outer
    protocol.

    Second it was necessary to update the GRE and UDP tunnel segmentation
    offloads so that they would reset the encapsulation bit and inner header
    offsets after the offload was complete.

    As a result of this change I have seen the following results on an interface
    with Tx checksum enabled for encapsulated frames:

                Utilization
                Send
    Throughput  local        GSO
    10^6bits/s  % S          state
       7123.52  8.42         disabled
       8321.75  5.43         enabled

    v2: Instead of replacing reference to skb->protocol with
    skb_network_protocol just replace the protocol reference in
    harmonize_features to allow for double VLAN tag checks.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

11 Jul, 2013

2 commits


10 Jul, 2013

1 commit

  • Pull networking updates from David Miller:
    "This is a re-do of the net-next pull request for the current merge
    window. The only difference from the one I made the other day is that
    this has Eliezer's interface renames and the timeout handling changes
    made based upon your feedback, as well as a few bug fixes that have
    trickled in.

    Highlights:

    1) Low latency device polling, eliminating the cost of interrupt
    handling and context switches. Allows direct polling of a network
    device from socket operations, such as recvmsg() and poll().

    Currently ixgbe, mlx4, and bnx2x support this feature.

    Full high level description, performance numbers, and design in
    commit 0a4db187a999 ("Merge branch 'll_poll'")

    From Eliezer Tamir.

    2) With the routing cache removed, ip_check_mc_rcu() gets exercised
    more than ever before in the case where we have lots of multicast
    addresses. Use a hash table instead of a simple linked list, from
    Eric Dumazet.

    3) Add driver for Atheros QCA98xx 802.11ac wireless devices, from
    Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
    Marek Puzyniak, Michal Kazior, and Sujith Manoharan.

    4) Support reporting the TUN device persist flag to userspace, from
    Pavel Emelyanov.

    5) Allow controlling network device VF link state using netlink, from
    Rony Efraim.

    6) Support GRE tunneling in openvswitch, from Pravin B Shelar.

    7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
    Daniel Borkmann and Eric Dumazet.

    8) Allow controlling of TCP quickack behavior on a per-route basis,
    from Cong Wang.

    9) Several bug fixes and improvements to vxlan from Stephen
    Hemminger, Pravin B Shelar, and Mike Rapoport. In particular,
    support receiving on multiple UDP ports.

    10) Major cleanups, particularly in the area of debugging and cookie
    lifetime handling, to the SCTP protocol code. From Daniel
    Borkmann.

    11) Allow packets to cross network namespaces when traversing tunnel
    devices. From Nicolas Dichtel.

    12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
    manner akin to how we monitor real network traffic via ptype_all.
    From Daniel Borkmann.

    13) Several bug fixes and improvements for the new alx device driver,
    from Johannes Berg.

    14) Fix scalability issues in the netem packet scheduler's time queue,
    by using an rbtree. From Eric Dumazet.

    15) Several bug fixes in TCP loss recovery handling, from Yuchung
    Cheng.

    16) Add support for GSO segmentation of MPLS packets, from Simon
    Horman.

    17) Make network notifiers have a real data type for the opaque
    pointer that's passed into them. Use this to properly handle
    network device flag changes in arp_netdev_event(). From Jiri
    Pirko and Timo Teräs.

    18) Convert several drivers over to module_pci_driver(), from Peter
    Huewe.

    19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a
    O(1) calculation instead. From Eric Dumazet.

    20) Support setting of explicit tunnel peer addresses in ipv6, just
    like ipv4. From Nicolas Dichtel.

    21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.

    22) Prevent a single high rate flow from overrunning an individual cpu
    during RX packet processing via selective flow shedding. From
    Willem de Bruijn.

    23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
    Dumazet.

    24) Don't just drop GSO packets which are above the TBF scheduler's
    burst limit, chop them up so they are in-bounds instead. Also
    from Eric Dumazet.

    25) VLAN offloads are missed when configured on top of a bridge, fix
    from Vlad Yasevich.

    26) Support IPV6 in ping sockets. From Lorenzo Colitti.

    27) Receive flow steering targets should be updated at poll() time
    too, from David Majnemer.

    28) Fix several corner case regressions in PMTU/redirect handling due
    to the routing cache removal, from Timo Teräs.

    29) We have to be mindful of ipv4 mapped ipv6 sockets in
    udp_v6_push_pending_frames(). From Hannes Frederic Sowa.

    30) Fix L2TP sequence number handling bugs, from James Chapman."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
    drivers/net: caif: fix wrong rtnl_is_locked() usage
    drivers/net: enic: release rtnl_lock on error-path
    vhost-net: fix use-after-free in vhost_net_flush
    net: mv643xx_eth: do not use port number as platform device id
    net: sctp: confirm route during forward progress
    virtio_net: fix race in RX VQ processing
    virtio: support unlocked queue poll
    net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
    Documentation: Fix references to defunct linux-net@vger.kernel.org
    net/fs: change busy poll time accounting
    net: rename low latency sockets functions to busy poll
    bridge: fix some kernel warning in multicast timer
    sfc: Fix memory leak when discarding scattered packets
    sit: fix tunnel update via netlink
    dt:net:stmmac: Add dt specific phy reset callback support.
    dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
    dt:net:stmmac: Allocate platform data only if its NULL.
    net:stmmac: fix memleak in the open method
    ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
    net: ipv6: fix wrong ping_v6_sendmsg return value
    ...

    Linus Torvalds
     

09 Jul, 2013

1 commit

  • Rename functions in include/net/ll_poll.h to busy wait.
    Clarify documentation about expected power use increase.
    Rename POLL_LL to POLL_BUSY_LOOP.
    Add need_resched() testing to poll/select busy loops.

    Note, that in select and poll can_busy_poll is dynamic and is
    updated continuously to reflect the existence of supported
    sockets with valid queue information.

    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
     

04 Jul, 2013

2 commits

  • inner_protocol was added to struct sk_buff in
    0d89d2035fe063461a5ddb609b2c12e7fb006e44 ("MPLS: Add limited GSO support"),
    which is scheduled to be included in v3.11.

    That patch did not update __copy_skb_header to copy the inner_protocol.

    Signed-off-by: Joe Stringer
    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Joe Stringer
     
  • Conflicts:
    drivers/net/ethernet/freescale/fec_main.c
    drivers/net/ethernet/renesas/sh_eth.c
    net/ipv4/gre.c

    The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list)
    and the splitting of the gre.c code into separate files.

    The FEC conflict was two sets of changes adding ethtool support code
    in an "!CONFIG_M5272" CPP protected block.

    Finally the sh_eth.c conflict was between one commit adding bits set
    in the .eesr_err_check mask whilst another commit removed the
    .tx_error_check member and assignments.

    Signed-off-by: David S. Miller

    David S. Miller
     

03 Jul, 2013

2 commits

  • The dev_forward_skb() assignment of pkt_type should be done
    after the call to eth_type_trans().

    IP-encapsulated packets can be handled by the local host, but skb->pkt_type
    can be PACKET_OTHERHOST when a packet comes via veth into an IP tunnel device.
    In that case, the packet is dropped by ip_rcv().
    Although this example uses gretap, l2tp-eth has the same issue.
    For the l2tp-eth case, add a dummy device for the IP address and use the
    ip l2tp command.

    netns A | root netns | netns B
    vethveth=bridge=gretap gretap=bridge=vethveth

    arp packet ->
    pkt_type
    BROADCAST------------>ip_rcv()------------------------>

    ip route get 172.17.107.3
    > local 172.17.107.3 dev lo src 172.17.107.3
    > cache
    ip route get 172.17.107.4
    > local 172.17.107.4 dev lo src 172.17.107.4
    > cache
    ip link add vetha type veth peer name vetha-peer
    ip link add vethb type veth peer name vethb-peer
    brctl addbr bra
    brctl addbr brb
    brctl addif bra tapa
    brctl addif bra vetha-peer
    brctl addif brb tapb
    brctl addif brb vethb-peer
    brctl show
    > bridge name bridge id STP enabled interfaces
    > bra 8000.6ea21e758ff1 no tapa
    > vetha-peer
    > brb 8000.420020eb92d5 no tapb
    > vethb-peer
    ip link set vetha-peer up
    ip link set vethb-peer up
    ip link set bra up
    ip link set brb up
    ip netns add a
    ip netns add b
    ip link set vetha netns a
    ip link set vethb netns b
    ip netns exec a ip address add 10.0.0.3/24 dev vetha
    ip netns exec b ip address add 10.0.0.4/24 dev vethb
    ip netns exec a ip link set vetha up
    ip netns exec b ip link set vethb up
    ip netns exec a arping -I vetha 10.0.0.4
    ARPING 10.0.0.4 from 10.0.0.3 vetha
    ^CSent 2 probes (2 broadcast(s))
    Received 0 response(s)

    Cc: Jason Wang
    Cc: "Michael S. Tsirkin"
    Cc: Eric Dumazet
    Cc: Patrick McHardy
    Cc: Hong Zhiguo
    Cc: Rami Rosen
    Cc: Tom Parkin
    Cc: Cong Wang
    Cc: Pravin B Shelar
    Cc: Jesse Gross
    Cc: dev@openvswitch.org
    Signed-off-by: Isaku Yamahata
    Signed-off-by: David S. Miller

    Isaku Yamahata
     
  • Pull char/misc updates from Greg KH:
    "Here's the big char/misc driver tree merge for 3.11-rc1

    A variety of different driver patches here. All of these have been in
    linux-next for a while, and the networking patches were acked-by David
    Miller, as it made sense for those patches to come through this tree"

    * tag 'char-misc-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (102 commits)
    Revert "char: misc: assign file->private_data in all cases"
    drivers: uio_pdrv_genirq: Use of_match_ptr() macro
    mei: check whether hw start has succeeded
    mei: check if the hardware reset succeeded
    mei: mei_cl_connect: don't multiply the timeout twice
    mei: do not override a client writing state when buffering
    mei: move mei_cl_irq_write_complete to client.c
    UIO: Fix concurrency issue
    drivers: uio_dmem_genirq: Use of_match_ptr() macro
    char: misc: assign file->private_data in all cases
    drivers: hv: allocate synic structures before hv_synic_init()
    drivers: hv: check interrupt mask before read_index
    vme: vme_tsi148.c: fix error return code in tsi148_probe()
    FMC: fix error handling in probe() function
    fmc: avoid readl/writel namespace conflict
    FMC: NULL dereference on allocation failure
    UIO: fix uio_pdrv_genirq with device tree but no interrupt
    UIO: allow binding uio_pdrv_genirq.c to devices using command line option
    FMC: add a char-device mezzanine driver
    FMC: add a driver to write mezzanine EEPROM
    ...

    Linus Torvalds
     

02 Jul, 2013

3 commits

  • As the patch "bnx2x: remove zeroing of dump data buffer" showed,
    it is too easy to implement .get_dump_data incorrectly in a driver.

    Let's make sure drivers cannot get confused by userspace requesting
    a too big dump.

    Also WARN if the driver sets dump->len to something weird and make
    sure the length reported to userspace is the actual length of data
    copied to userspace.

    Signed-off-by: Michal Schmidt
    Reviewed-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Michal Schmidt
     
  • Stephen Hemminger says:

    ====================
    Here is current updates for vxlan in net-next. It includes Mike's changes
    to handle multiple destinations and lots of little cosmetic stuff.

    This is a fresh vxlan-next repository which was forked from net-next.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • There is a race in neighbour code, because neigh_destroy() uses
    skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
    while other parts of the code assume neighbour rwlock is what
    protects arp_queue.

    Convert all skb_queue_purge() calls to the __skb_queue_purge() variant.

    Use __skb_queue_head_init() instead of skb_queue_head_init()
    to make clear we do not use arp_queue.lock

    And hold neigh->lock in neigh_destroy() to close the race.

    Reported-by: Joe Jin
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

28 Jun, 2013

1 commit


27 Jun, 2013

1 commit

  • When the kernel (compiled with CONFIG_PREEMPT=n) is performing the
    rename of a network interface, it can end up waiting for a workqueue
    to complete. If userland is able to invoke a SIOCGIFNAME ioctl or a
    SO_BINDTODEVICE getsockopt in between, the kernel will deadlock due to
    the fact that read_seqcount_begin() will spin forever waiting for the
    writer process (the one doing the interface rename) to update the
    devnet_rename_seq sequence.

    This patch fixes the problem by adding a helper (netdev_get_name())
    and using it in the code handling the SIOCGIFNAME ioctl and
    SO_BINDTODEVICE getsockopt.

    The netdev_get_name() helper uses raw_seqcount_begin() to avoid
    spinning forever, waiting for devnet_rename_seq->sequence to become
    even. cond_resched() is used in the contended case, before retrying
    the access to give the writer process a chance to finish.

    The use of raw_seqcount_begin() will incur some unneeded work in the
    reader process in the contended case, but this is better than
    deadlocking the system.
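
    The difference between spinning and retrying can be modelled in a few
    lines. This is a single-threaded sketch with made-up names (struct
    sketch_dev, sketch_try_get_name()); it leaves out the memory barriers and
    RCU protection that the real reader needs and only shows the
    sequence-number logic.

```c
#include <string.h>

#define SKETCH_IFNAMSIZ 16

struct sketch_dev {
    unsigned seq;                  /* odd while a rename is in progress */
    char     name[SKETCH_IFNAMSIZ];
};

/*
 * One read attempt. Returns 0 on success and -1 when the caller should
 * yield (the patch uses cond_resched()) and try again.
 */
static int sketch_try_get_name(const struct sketch_dev *dev, char *out)
{
    /* Round down to an even value instead of waiting for one. */
    unsigned start = dev->seq & ~1u;

    /* This copy may be wasted work if a writer is active. */
    memcpy(out, dev->name, SKETCH_IFNAMSIZ);

    /* An odd or changed sequence means the copy cannot be trusted. */
    return dev->seq == start ? 0 : -1;
}
```

    Because an odd sequence never equals its rounded-down start value, a
    reader that arrives during a rename always fails the final comparison and
    goes back to the retry path rather than looping inside the begin step.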

    Signed-off-by: Nicolas Schichan
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Nicolas Schichan
     

26 Jun, 2013

4 commits

  • Stephen Hemminger
     
  • select/poll busy-poll support.

    Split sysctl value into two separate ones, one for read and one for poll.
    Update Documentation/sysctl/net.txt.

    Add a new poll flag POLL_LL. When this flag is set, sock_poll will call
    sk_poll_ll if possible. sock_poll sets this flag in its return value
    to indicate to select/poll when a socket that can busy poll is found.

    When poll/select have nothing to report, call the low-level
    sock_poll again until we are out of time or we find something.

    Once the system call finds something, it stops setting POLL_LL, so it can
    return the result to the user ASAP.

    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
     
  • commit 68c331631143 ("v4 GRE: Add TCP segmentation offload for GRE")
    added a possible skb leak, because it frees only the head of segment
    list, in case a skb_linearize() call fails.

    This patch adds a kfree_skb_list() helper to fix the bug.
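
    The idea behind such a helper is to walk the whole chain rather than
    release only its head. A userspace sketch of the pattern, with
    hypothetical names (struct sketch_seg, sketch_push(), sketch_free_list())
    and malloc()/free() standing in for skb allocation:

```c
#include <stdlib.h>

struct sketch_seg {
    struct sketch_seg *next;
};

/* Prepend a freshly allocated segment; returns the new head. */
static struct sketch_seg *sketch_push(struct sketch_seg *head)
{
    struct sketch_seg *seg = malloc(sizeof(*seg));

    if (!seg)
        return head;
    seg->next = head;
    return seg;
}

/* Free every segment in the list and return how many were released. */
static unsigned sketch_free_list(struct sketch_seg *segs)
{
    unsigned freed = 0;

    while (segs) {
        struct sketch_seg *next = segs->next;  /* read before freeing */

        free(segs);
        segs = next;
        freed++;
    }
    return freed;
}
```

    Calling free() on the head alone would leave every later segment
    unreachable, which is the kind of leak the message describes.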

    Signed-off-by: Eric Dumazet
    Cc: Pravin B Shelar
    Cc: Daniel Borkmann
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This is required for multiple default destinations management in VXLAN

    Signed-off-by: Mike Rapoport
    Signed-off-by: Stephen Hemminger

    Mike Rapoport
     

24 Jun, 2013

2 commits

  • Callers of skb_seq_read() are currently forced to call skb_abort_seq_read()
    even when consuming all the data because the last call to skb_seq_read (the
    one that returns 0 to indicate the end) fails to unmap the last fragment page.

    With this patch callers will be allowed to traverse the SKB data by calling
    skb_prepare_seq_read() once and repeatedly calling skb_seq_read() as originally
    intended (and documented in the original commit 677e90eda), that is, only call
    skb_abort_seq_read() if the sequential read is actually aborted.
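
    The intended calling pattern can be illustrated with a small userspace
    analogue. All sketch_ names are hypothetical, fragments are assumed to be
    non-empty, and the "mapped" flag stands in for the page mapping that the
    real reader has to release.

```c
#include <stddef.h>

struct sketch_frag {
    const unsigned char *data;
    size_t               len;
};

struct sketch_seq_state {
    const struct sketch_frag *frags;
    size_t nr_frags;
    size_t next;    /* index of the next fragment to hand out */
    int    mapped;  /* 1 while a fragment is held for the caller */
};

static void sketch_prepare_seq_read(struct sketch_seq_state *st,
                                    const struct sketch_frag *frags, size_t nr)
{
    st->frags = frags;
    st->nr_frags = nr;
    st->next = 0;
    st->mapped = 0;
}

/* Returns the length of the next chunk, or 0 once everything was read. */
static size_t sketch_seq_read(struct sketch_seq_state *st,
                              const unsigned char **data)
{
    st->mapped = 0;  /* release the previous fragment, including the last */
    if (st->next == st->nr_frags)
        return 0;
    *data = st->frags[st->next].data;
    st->mapped = 1;
    return st->frags[st->next++].len;
}

/* Only needed when the caller stops before sketch_seq_read() returns 0. */
static void sketch_abort_seq_read(struct sketch_seq_state *st)
{
    st->mapped = 0;
}
```

    A caller that loops until the read function returns 0 ends with nothing
    mapped and does not need the abort call; a caller that breaks out early
    still does.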

    Signed-off-by: Wedson Almeida Filho
    Signed-off-by: David S. Miller

    Wedson Almeida Filho
     
  • netif_alloc_netdev_queues() uses kcalloc() to allocate memory
    for the "struct netdev_queue *_tx" array.

    For a large number of tx queues, kcalloc() might fail, so this
    patch does a fallback to vzalloc().

    As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT
    to kzalloc() flags to do this fallback only when really needed.

    Signed-off-by: Eric Dumazet
    Acked-by: Michael S. Tsirkin
    Signed-off-by: David S. Miller

    Eric Dumazet
     

20 Jun, 2013

4 commits

  • thresh and interval are global resources,
    only init net can change them.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • Although we don't export the /proc/sys/net/ipv[4,6]/neigh/default/
    directory to namespaces other than init_net, a command such as
    "ip ntable change name arp_cache locktime 129" can still change the locktime
    of the default neigh_parms.

    This patch prevents namespaces other than init_net from finding
    neigh_table.parms, so they can no longer influence init_net.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • neigh_table.parms always exists and is initialized, so kmemdup
    can use it to create a new neigh_parms; lookup_neigh_parms
    here would return neigh_table.parms too.

    Signed-off-by: Gao feng
    Signed-off-by: David S. Miller

    Gao feng
     
  • Conflicts:
    drivers/net/wireless/ath/ath9k/Kconfig
    drivers/net/xen-netback/netback.c
    net/batman-adv/bat_iv_ogm.c
    net/wireless/nl80211.c

    The ath9k Kconfig conflict was a change of a Kconfig option name right
    next to the deletion of another option.

    The xen-netback conflict was overlapping changes involving the
    handling of the notify list in xen_netbk_rx_action().

    Batman conflict resolution provided by Antonio Quartulli, basically
    keep everything in both conflict hunks.

    The nl80211 conflict is a little more involved. In 'net' we added a
    dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
    Linus reported. Meanwhile in 'net-next' the handlers were converted
    to use pre and post doit handlers which use a flag to determine
    whether to hold the RTNL mutex around the operation.

    However, the dump handlers do not use this logic. Instead they have
    to explicitly do the locking. There were apparent bugs in the
    conversion of nl80211_dump_wiphy() in that we were not dropping the
    RTNL mutex in all the return paths, and it seems we very much should
    be doing so. So I fixed that whilst handling the overlapping changes.

    To simplify the initial returns, I take the RTNL mutex after we try
    to allocate 'tb'.

    Signed-off-by: David S. Miller

    David S. Miller
     

18 Jun, 2013

4 commits

  • As part of the push to add 802.1ad service provider tagging support to the
    kernel the VLAN features flags were renamed. Unfortunately the kernel name
    for the VLAN hardware acceleration features that the kernel shows user space
    was included in the rename, which broke ethtool (txvlan and rxvlan options
    do not work). This patch restores the original names, i.e. the original ABI.
    If we wanted to make clear to users that we are referring to CTAGs we can
    always change ethtool's short_name and long_name for these features (for
    example something along the lines of txvlan -> txvlan-ctag, tx-vlan-offload ->
    tx-vlan-ctag-offload).

    Cc: Patrick McHardy
    Cc: David S. Miller
    Cc: netdev@vger.kernel.org
    Signed-off-by: Fernando Luis Vazquez Cao
    Reviewed-by: Ben Hutchings
    Signed-off-by: David S. Miller

    Fernando Luis Vazquez Cao
     
  • Add a socket option for low latency polling.
    This allows overriding the global sysctl value with a per-socket one.
    Unexport sysctl_net_ll_poll since for now it's not needed in modules.

    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
     
  • There is no reason for sysctl_net_ll_poll to be an unsigned long.
    Change it into an unsigned int.
    Fix the proc handler.

    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
     
  • We want the fixes in here.

    Greg Kroah-Hartman
     

14 Jun, 2013

2 commits

  • Add netlink directives and ndo entry to allow for controlling
    VF link, which can be in one of three states:

    Auto - VF link state reflects the PF link state (default)

    Up - VF link state is up, traffic from VF to VF works even if
    the actual PF link is down

    Down - VF link state is down, no traffic from/to this VF, can be of
    use while configuring the VF

    Signed-off-by: Rony Efraim
    Signed-off-by: Or Gerlitz
    Signed-off-by: David S. Miller

    Rony Efraim
     
  • Caught by sparse:
    - __rcu: missing annotation to sd->flow_limit
    - __user: direct access in cpumask_scnprintf

    Also
    - add endline character when printing bitmap if room in buffer
    - avoid bucket overflow by reducing FLOW_LIMIT_HISTORY

    The last item warrants some explanation. The hashtable buckets are
    subject to overflow if FLOW_LIMIT_HISTORY is larger than or equal
    to bucket size, since all packets may end up in a single bucket. The
    current (rather arbitrary) history value of 256 happens to match the
    buffer size (u8).

    As a result, with a single flow, the first 128 packets are accepted
    (correct), the second 128 packets dropped (correct) and then the
    history[] array has filled, so that each subsequent new packet
    causes an increment in the bucket for new_flow plus a decrement
    for old_flow: a steady state.

    This is fine if packets are dropped, as the steady state goes away
    as soon as a mix of traffic reappears. But, because the 256th packet
    overflowed the bucket to 0: no packets are dropped.

    Instead of explicitly adding an overflow check, this patch changes
    FLOW_LIMIT_HISTORY to never be able to overflow a single bucket.
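
    The overflow can be reproduced with a small model of the accounting
    described above. This sketch is written from the description in this
    message rather than copied from the kernel source; the bucket count, the
    fixed bucket index and the function name are assumptions, and 128 is used
    below only as an example of a history length that cannot overflow a u8.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_HISTORY 256u
#define SKETCH_NUM_BUCKETS 16u

/*
 * Feed `packets` packets of one flow through the limiter and return the
 * number of drops. history_len must be a power of two no larger than 256.
 */
static unsigned sketch_single_flow_drops(unsigned history_len, unsigned packets)
{
    uint8_t buckets[SKETCH_NUM_BUCKETS];
    uint8_t history[SKETCH_MAX_HISTORY];
    unsigned head = 0, drops = 0;
    const uint8_t new_flow = 5;  /* every packet lands in the same bucket */

    memset(buckets, 0, sizeof(buckets));
    memset(history, 0, sizeof(history));

    for (unsigned i = 0; i < packets; i++) {
        uint8_t old_flow = history[head];

        /* Record this packet and forget the oldest one. */
        history[head] = new_flow;
        head = (head + 1) & (history_len - 1);
        if (buckets[old_flow])
            buckets[old_flow]--;

        /* uint8_t arithmetic: incrementing 255 wraps around to 0. */
        if (++buckets[new_flow] > (history_len >> 1))
            drops++;
    }
    return drops;
}
```

    With a history of 256 the model stops dropping once the counter wraps,
    whereas with a history of 128 a single flow keeps being dropped for as
    long as it is the only traffic.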

    Reported-by: Fengguang Wu
    (first item)

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

13 Jun, 2013

2 commits

  • Reduce the uses of this unnecessary typedef.

    Done via perl script:

    $ git grep --name-only -w ctl_table net | \
    xargs perl -p -i -e '\
    sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
    s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

    Reflow the modified lines that now exceed 80 columns.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • Since team functionality relies heavily on a userspace daemon, we need to
    deliver events to userspace via Netlink as quickly as possible. So make all
    team port device link events urgent.

    Signed-off-by: Flavio Leitner
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Flavio Leitner
     

12 Jun, 2013

1 commit

  • We currently allow for numa-node aware skb allocation only within the
    fill_packet_ipv4() path, but not in fill_packet_ipv6(). Consolidate that
    code to a common allocation helper to enable numa-node aware skb
    allocation for ipv6, and use it in both paths. This also makes both
    functions a bit more readable.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

11 Jun, 2013

2 commits

  • struct gnet_stats_rate_est contains u32 fields, so the bytes per second
    field can wrap at 34360Mbit.

    Add a new gnet_stats_rate_est64 structure to get 64bit bps/pps fields,
    and switch the kernel to use this structure natively.

    This structure is dumped to user space as a new attribute :

    TCA_STATS_RATE_EST64

    Old tc command will now display the capped bps (to 34360Mbit), instead
    of wrapped values, and updated tc command will display correct
    information.
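
    The 34360Mbit figure is simply the largest u32 byte rate expressed in
    bits. A short sketch of wrapping versus capping, using hypothetical helper
    names:

```c
#include <stdint.h>

/* What a plain truncation to 32 bits reports: the value wraps. */
static uint32_t sketch_rate_wrapped(uint64_t bytes_per_sec)
{
    return (uint32_t)bytes_per_sec;
}

/* What a capped export reports: the value saturates at the u32 maximum. */
static uint32_t sketch_rate_capped(uint64_t bytes_per_sec)
{
    return bytes_per_sec > UINT32_MAX ? UINT32_MAX : (uint32_t)bytes_per_sec;
}

/* Convert bytes per second to Mbit/s (10^6 bits), rounded to nearest. */
static uint64_t sketch_bytes_to_mbit(uint64_t bytes_per_sec)
{
    return (bytes_per_sec * 8 + 500000) / 1000000;
}
```

    A rate of 5,000,000,000 bytes per second wraps to 705,032,704 when
    truncated, while the capped value stays at 4,294,967,295 bytes per second,
    which is 34360 Mbit.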

    Old tc command output, after patch :

    eric:~# tc -s -d qd sh dev lo
    qdisc pfifo 8001: root refcnt 2 limit 1000p
    Sent 80868245400 bytes 1978837 pkt (dropped 0, overlimits 0 requeues 0)
    rate 34360Mbit 189696pps backlog 0b 0p requeues 0

    This patch carefully reorganizes "struct Qdisc" layout to get optimal
    performance on SMP.

    Signed-off-by: Eric Dumazet
    Cc: Ben Hutchings
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Since commit 1a37e412a022 ("net: Use 16bits for *_headers fields of struct
    skbuff"), skb->*_header are relative to skb->head,
    so copy_skb_header() should not call skb_headers_offset_update() now,
    and we should pass correct parameter to skb_headers_offset_update() in
    pskb_expand_head() and skb_copy_expand().
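
    Why head-relative offsets only need adjusting when the headroom changes
    can be shown with a userspace model. The structure and function names are
    invented for this sketch, and it assumes the "correct parameter" is the
    number of headroom bytes added in front of the existing contents.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct sketch_buf {
    unsigned char *head;              /* start of the allocation */
    size_t         size;
    uint16_t       data;              /* offsets are relative to head */
    uint16_t       transport_header;
};

/* Add `nhead` bytes of headroom; returns 0 on success, -1 on failure. */
static int sketch_expand_head(struct sketch_buf *b, uint16_t nhead)
{
    unsigned char *n = malloc(b->size + nhead);

    if (!n)
        return -1;
    memcpy(n + nhead, b->head, b->size);
    free(b->head);
    b->head = n;
    b->size += nhead;

    /* The contents moved nhead bytes away from head: shift each offset. */
    b->data += nhead;
    b->transport_header += nhead;
    return 0;
}
```

    A copy that keeps the same headroom leaves every offset valid as it is,
    because nothing moved relative to head; only the pointer-based scheme
    needed an adjustment in that case.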

    Signed-off-by: Weiping Pan
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Peter Pan(潘卫平)