01 Nov, 2013

1 commit

  • This patch adds documentation about the broadcast manager. It's based on Brian
    Thorne's initial patch http://marc.info/?l=linux-can&m=138119382015496&w=2 and
    Daniele Venzano's work http://brownhat.org/docs/socketcan.html .

    Signed-off-by: Brian Thorne
    Cc: Daniele Venzano
    Cc: Andre Naujoks
    Signed-off-by: Oliver Hartkopp
    Signed-off-by: Marc Kleine-Budde

    Oliver Hartkopp
     

31 Oct, 2013

4 commits

  • After reading the function rt6_check_neigh(), we can see that
    RT6_NUD_FAIL_SOFT can be returned only when
    IS_ENABLED(CONFIG_IPV6_ROUTER_PREF) is false. So in the function
    find_match(), there is no need to evaluate
    !IS_ENABLED(CONFIG_IPV6_ROUTER_PREF).

    Signed-off-by: Duan Jiong
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Duan Jiong
     
  • This change is inspired by checkpatch.

    Signed-off-by: Weilong Chen
    Signed-off-by: David S. Miller

    Chen Weilong
     
  • Copying whole packets with skb_copy_from_linear_data_offset is a pretty
    bad idea. CPU was spending time in __copy_user_common and network
    performance was lower. With the new solution iperf-measured speed
    increased from 116Mb/s to 134Mb/s.

    Signed-off-by: Rafał Miłecki
    Signed-off-by: David S. Miller

    Rafał Miłecki
     
  • The message dispatching part of tipc_recv_msg() is wrapped in layers of
    while/if/if/switch, causing out-of-control indentation that does not
    look very good. We reduce two indentation levels by separating the
    message dispatching from the blocks that check link state and
    sequence numbers, allowing longer function and arg names to be
    consistently indented without wrapping. Additionally we also rename
    "cont" label to "discard" and add one new label called "unlock_discard"
    to make code clearer. In all, these are cosmetic changes that do not
    alter the operation of TIPC in any way.

    Signed-off-by: Ying Xue
    Reviewed-by: Erik Hugne
    Cc: David Laight
    Cc: Andreas Bofjäll
    Signed-off-by: David S. Miller

    Ying Xue
     

30 Oct, 2013

14 commits

  • Checking if MAC address is valid using is_valid_ether_addr() is already done in
    of_get_mac_address(). While at it, reorganize checking so it matches checks in
    other drivers.

    Signed-off-by: Luka Perkov
    CC: Alexey Brodkin
    CC: David Miller
    Signed-off-by: David S. Miller

    Luka Perkov
     
  • Checking if MAC address is valid using is_valid_ether_addr() is already done in
    of_get_mac_address().

    Signed-off-by: Luka Perkov
    Acked-by: Thomas Petazzoni
    CC: David Miller
    Signed-off-by: David S. Miller

    Luka Perkov
     
  • Checking if MAC address is valid using is_valid_ether_addr() is already done in
    of_get_mac_address().

    Signed-off-by: Luka Perkov
    Acked-by: David Daney
    CC: David Miller
    Signed-off-by: David S. Miller

    Luka Perkov
     
  • Fast Open currently has a fallback feature to address SYN-data being
    dropped, but it requires the middle-box to pass on a regular SYN retry
    after SYN-data. This is implemented in commit aab487435 ("net-tcp:
    Fast Open client - detecting SYN-data drops").

    However, some NAT boxes will drop all subsequent packets after the first
    SYN-data and blackhole the entire connection. An example is in
    commit 356d7d8 "netfilter: nf_conntrack: fix tcp_in_window for Fast
    Open".

    The sender should note such incidents and temporarily fall back to the
    regular TCP handshake on subsequent attempts as well: after the
    second SYN times out, the original Fast Open SYN is most likely lost.
    When such an event recurs, Fast Open is disabled for a period that grows
    exponentially with the number of recurrences.

    Signed-off-by: Yuchung Cheng
    Signed-off-by: Neal Cardwell
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Yuchung Cheng
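
    The exponential back-off described in this commit message can be sketched
    in plain userspace C. This is only an illustration, not the kernel code:
    the helper name, the 60-second base period, the cap on the exponent and
    the rule of tolerating a single recorded loss are all assumptions made
    for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed base back-off period in seconds (illustrative, not from the commit). */
#define TFO_BASE_BACKOFF_SEC 60ULL
/* Cap the exponent so the shift below cannot overflow. */
#define TFO_MAX_SHIFT 16U

/*
 * Hypothetical helper: decide whether a new connection may attempt
 * Fast Open, given how many SYN-data losses were recorded (syn_loss)
 * and when the last one happened (last_loss_sec).
 */
static bool tfo_attempt_allowed(unsigned int syn_loss,
                                unsigned long long now_sec,
                                unsigned long long last_loss_sec)
{
    unsigned int shift;
    unsigned long long backoff;

    assert(now_sec >= last_loss_sec);

    /* Assumption: a single recorded loss does not disable Fast Open yet. */
    if (syn_loss <= 1)
        return true;

    shift = syn_loss < TFO_MAX_SHIFT ? syn_loss : TFO_MAX_SHIFT;
    backoff = TFO_BASE_BACKOFF_SEC << shift; /* doubles with each recurrence */

    /* Fast Open stays disabled until the back-off period has elapsed. */
    return now_sec - last_loss_sec >= backoff;
}
```

    With these assumed constants, two recorded losses disable Fast Open for
    240 seconds and three for 480 seconds, so a path that keeps blackholing
    SYN-data is probed less and less often.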
     
  • Jeff Kirsher says:

    ====================
    This series contains updates to vxlan, net, ixgbe, ixgbevf, and i40e.

    Joseph provides a single patch against vxlan which removes the burden
    from the NIC drivers to check if the vxlan driver is enabled in the
    kernel and also makes available the vxlan headrooms to the drivers.

    Jacob provides the majority of the patches, with patches against net, ixgbe
    and ixgbevf. His net patch adds a might_sleep() call to napi_disable so
    that every use of napi_disable during atomic context will be visible.
    Then Jacob provides a patch to fix the qv_lock_napi call in
    ixgbe_napi_disable_all. The other ixgbe patches clean up the
    ixgbe_check_minimum_link function to correctly show that there is some
    minor loss of encoding, even though we don't calculate it, and remove
    the unnecessary duplication of the PCIe bandwidth display. Lastly, Jacob
    provides 4 patches against ixgbevf that add ixgbevf_rx_skb in line with
    how ixgbe handles the variations on how packets can be received, and add
    support for tracking how many packets were cleaned during busy poll
    as part of the extended statistics.

    Wei Yongjun provides a fix for i40e to return -ENOMEM in the memory
    allocation error handling case instead of returning 0, as done
    elsewhere in this function.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Amend the documentation in the mvmdio driver to note the fact
    that it is now used by both the mvneta and mv643xx_eth drivers.

    Signed-off-by: Leigh Brown
    Signed-off-by: David S. Miller

    Leigh Brown
     
  • Make only a single call to mutex_unlock in orion_mdio_write.

    Signed-off-by: Leigh Brown
    Signed-off-by: David S. Miller

    Leigh Brown
     
  • Replace manual poll of MVMDIO_SMI_READ_VALID with a call to
    orion_mdio_wait_ready. This ensures a consistent timeout,
    eliminates a busy loop, and allows for use of interrupts on
    systems that support them.

    Signed-off-by: Leigh Brown
    Signed-off-by: David S. Miller

    Leigh Brown
     
  • Amend orion_mdio_wait_ready so that the same timeout is used when
    polling or using wait_event_timeout. Set the timeout to 1ms.

    Replace udelay with usleep_range to avoid a busy loop, and set the
    polling interval range as 45us to 55us, so that the first sleep
    will be enough in almost all cases.

    Generate the same log message at timeout when polling or using
    wait_event_timeout.

    Signed-off-by: Leigh Brown
    Signed-off-by: David S. Miller

    Leigh Brown
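
    The polling behaviour this commit message describes (a fixed 1ms budget,
    sleeping roughly 45us to 55us between checks instead of busy-waiting) can
    be sketched as follows. This is a userspace illustration only: the
    fake_smi device and its helpers are hypothetical stand-ins, and the sleep
    is merely recorded rather than performed so the sketch runs anywhere.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define WAIT_TIMEOUT_US  1000U /* 1 ms total budget, as in the commit message */
#define POLL_INTERVAL_US 45U   /* lower bound of the 45us-55us sleep range */

/* Hypothetical stand-in for a device whose busy bit clears after N polls. */
struct fake_smi {
    int polls_until_ready; /* becomes ready once this many polls were made */
    unsigned int slept_us; /* total time the waiter asked to sleep */
};

static bool fake_smi_is_ready(struct fake_smi *smi)
{
    if (smi->polls_until_ready > 0)
        smi->polls_until_ready--;
    return smi->polls_until_ready == 0;
}

/* Records the requested sleep instead of sleeping (stand-in for usleep_range). */
static void fake_smi_sleep(struct fake_smi *smi, unsigned int us)
{
    smi->slept_us += us;
}

/* Poll until the device is ready or the budget is used up. */
static int wait_ready(struct fake_smi *smi)
{
    unsigned int waited_us = 0;

    assert(smi != NULL);

    for (;;) {
        if (fake_smi_is_ready(smi))
            return 0;
        if (waited_us >= WAIT_TIMEOUT_US)
            return -ETIMEDOUT; /* same outcome whatever the wait method */
        fake_smi_sleep(smi, POLL_INTERVAL_US);
        waited_us += POLL_INTERVAL_US;
    }
}
```

    Checking readiness before checking the budget means a device that becomes
    ready during the final sleep is still reported as ready rather than timed
    out.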
     
  • The function need not be public, so make it static.

    Signed-off-by: Gavin Shan
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • The interface type, which is tracked by "struct be_adapter::if_type",
    isn't currently used. So we can safely remove it, according to
    Sathya's comments.

    Signed-off-by: Gavin Shan
    Signed-off-by: David S. Miller

    Gavin Shan
     
  • Use a more current logging style.

    Convert printks to pr_<level>.

    Consolidate multiple printks into a single printk to avoid
    any possible dmesg interleaving. Add a default "event" msg
    in case the listed types are ever expanded.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     
  • This work contains a lightweight BPF-based traffic classifier that can
    serve as a flexible alternative to ematch-based tree classification,
    especially now that the BPF filter engine can also be JITed in the kernel.
    Naturally, tc actions and policies are supported as well with cls_bpf.
    Multiple BPF programs/filters can be attached for a class, or they can just
    as well be written within a single BPF program; it is really up to the user
    how they wish to run/optimize the code, e.g. also for inversion of verdicts.
    The notion of a BPF program's return/exit codes is being kept as follows:

    0: No match
    -1: Select classid given in "tc filter ..." command
    else: flowid, overwrite the default one

    As a minimal usage example with iproute2, we use a 3-band prio root qdisc
    on a router with an sfq as each leaf, and assign ssh and icmp bpf-based
    filters to band 1, http traffic to band 2 and the rest to band 3. For the
    first two bands we load the bytecode from a file; for the third we load it
    inline as an example:

    echo 1 > /proc/sys/net/core/bpf_jit_enable

    tc qdisc del dev em1 root
    tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

    tc qdisc add dev em1 parent 1:1 sfq perturb 16
    tc qdisc add dev em1 parent 1:2 sfq perturb 16
    tc qdisc add dev em1 parent 1:3 sfq perturb 16

    tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
    tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
    tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
    tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3

    BPF programs can be easily created and passed to tc, either as inline
    'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
    compile opcodes, for example:

    1) People familiar with tcpdump-like filters:

    tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf

    2) People that want to low-level program their filters or use BPF
    extensions that lack support by libpcap's compiler:

    bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf

    ssh.ops example code:
    ldh [12]
    jne #0x800, drop
    ldb [23]
    jneq #6, drop
    ldh [20]
    jset #0x1fff, drop
    ldxb 4 * ([14] & 0xf)
    ldh [%x + 14]
    jeq #0x16, pass
    ldh [%x + 16]
    jne #0x16, drop
    pass: ret #-1
    drop: ret #0

    It was chosen to load bytecode into tc, since the reverse operation,
    tc filter list dev em1, is then able to show the exact commands again.
    Possible follow-up work could also include a small expression compiler
    for iproute2. Tested with the help of bmon. This idea came up during
    the Netfilter Workshop 2013 in Copenhagen. Also thanks to feedback from
    Eric Dumazet!

    Signed-off-by: Daniel Borkmann
    Cc: Thomas Graf
    Signed-off-by: David S. Miller

    Daniel Borkmann
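
    The return-code convention listed in this commit message (0 for no match,
    -1 for the classid given on the tc command line, anything else as the
    flowid) can be sketched as a small decision function. The function name
    and signature are hypothetical and chosen for illustration; this is not
    the cls_bpf source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a BPF program's 32-bit return value to a
 * classification result. Returns false when the filter did not match;
 * otherwise stores the selected classid and returns true.
 */
static bool resolve_classid(uint32_t bpf_ret, uint32_t configured_classid,
                            uint32_t *classid)
{
    assert(classid != NULL);

    if (bpf_ret == 0)
        return false; /* 0: no match, the next filter gets a chance */

    /* -1 (all bits set): use the classid from "tc filter ... flowid". */
    if (bpf_ret == UINT32_MAX)
        *classid = configured_classid;
    else
        *classid = bpf_ret; /* anything else overrides the default */

    return true;
}
```

    In the ssh.ops example, "ret #-1" therefore selects the flowid passed on
    the command line, while "ret #0" leaves the packet to the next filter.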
     
  • This cleans up the code a bit and will be useful when allocating buffers in
    other places (like the RX path, to avoid skb_copy_from_linear_data_offset).

    Signed-off-by: Rafał Miłecki
    Signed-off-by: David S. Miller

    Rafał Miłecki
     

29 Oct, 2013

21 commits

  • Fix to return -ENOMEM in the memory alloc error handling
    case instead of 0, as done elsewhere in this function.

    Signed-off-by: Wei Yongjun
    Tested-by: Kavindya Deegala
    Signed-off-by: Jeff Kirsher

    Wei Yongjun
     
  • This patch removes the need to keep a zero_base variable in the adapter
    structure. Now we just use two different macros to set the non-zero and
    zero base. This adds to readability and shortens some of the structure
    initialization under 80 columns. The gathering of status for ethtool was
    slightly modified to again better fit into 80 columns and become a bit
    more readable.

    Signed-off-by: Alexander Duyck
    Signed-off-by: Don Skidmore
    Tested-by: Phil Schmitt
    Signed-off-by: Jeff Kirsher

    Don Skidmore
     
  • This patch adds the extended statistics similar to the ixgbe driver. These
    statistics keep track of how often the busy polling yields, as well as how many
    packets are cleaned or missed by the polling routine.

    Signed-off-by: Jacob Keller
    Tested-by: Phil Schmitt
    Signed-off-by: Jeff Kirsher

    Jacob Keller
     
  • This patch enables CONFIG_NET_RX_BUSY_POLL support in the VF code. This enables
    sockets which have enabled the SO_BUSY_POLL socket option to use the
    ndo_busy_poll_recv operation which could result in lower latency, at the cost
    of higher CPU utilization, and increased power usage. This support is similar
    to how the ixgbe driver works.

    Signed-off-by: Jacob Keller
    Tested-by: Phil Schmitt
    Signed-off-by: Jeff Kirsher

    Jacob Keller
     
  • Rather than return true/false indicating whether there was budget left, return
    the total packets cleaned. This currently has no use, but will be used in a
    following patch which enables CONFIG_NET_RX_BUSY_POLL support in order to track
    how many packets were cleaned during the busy poll as part of the extended
    statistics.

    Signed-off-by: Jacob Keller
    Tested-by: Phil Schmitt
    Signed-off-by: Jeff Kirsher

    Jacob Keller
     
  • This patch adds ixgbevf_rx_skb in line with how ixgbe handles the variations on
    how packets can be received. It will be extended in a following patch for
    CONFIG_NET_RX_BUSY_POLL support.

    Signed-off-by: Jacob Keller
    Tested-by: Phil Schmitt
    Signed-off-by: Jeff Kirsher

    Jacob Keller
     
  • This patch removes the unnecessary display of PCIe bandwidth twice. Since the
    ixgbe_check_minimum_link does a better job, and ensures accurate detection on
    even complex chains, this older check is no longer necessary.

    Signed-off-by: Jacob Keller
    Tested-by: Phil Schmitt
    Signed-off-by: Jeff Kirsher

    Jacob Keller
     
  • This patch updates the ixgbe_check_minimum_link function to correctly show that
    there is some minor loss of encoding, even though we don't calculate it in the
    max GT/s equation. It is small enough to not bother, but is better to report it
    than not.

    Signed-off-by: Jacob Keller
    Tested-by: Phil Schmitt
    Signed-off-by: Jeff Kirsher

    Jacob Keller
     
  • ixgbe_napi_disable_all calls napi_disable on each queue, however the busy
    polling code introduced a local_bh_disable()d context around the napi_disable.
    The original author did not realize that napi_disable might sleep, which would
    cause a sleep while atomic BUG. In addition, on a single processor system, the
    ixgbe_qv_lock_napi loop shouldn't have to mdelay. This patch adds an
    ixgbe_qv_disable along with a new IXGBE_QV_STATE_DISABLED bit, which it uses to
    indicate to the poll and napi routines that the q_vector has been disabled. Now
    the ixgbe_napi_disable_all function will wait until all pending work has been
    finished and prevent any future work from being started.

    Signed-off-by: Jacob Keller
    Cc: Eliezer Tamir
    Cc: Alexander Duyck
    Cc: Hyong-Youb Kim
    Cc: Amir Vadai
    Cc: Dmitry Kravkov
    Tested-by: Phil Schmitt
    Signed-off-by: Jeff Kirsher

    Jacob Keller
     
  • napi_disable uses an msleep() call to wait for outstanding napi work to be
    finished after setting the disable bit. It does not sleep when there
    is no outstanding work. This resulted in a rare bug in the ixgbe_down
    operation where a napi_disable call took place inside of a local_bh_disable()d context.
    In order to enable easier detection of future sleep while atomic BUGs, this
    patch adds a might_sleep() call, so that every use of napi_disable during
    atomic context will be visible.

    Signed-off-by: Jacob Keller
    Cc: Eliezer Tamir
    Cc: Alexander Duyck
    Cc: Hyong-Youb Kim
    Cc: Amir Vadai
    Cc: Dmitry Kravkov
    Tested-by: Phil Schmitt
    Signed-off-by: Jeff Kirsher

    Jacob Keller
     
  • This patch removes the burden from the NIC drivers to check if the
    vxlan driver is enabled in the kernel and also makes available
    the vxlan headrooms to them.

    Signed-off-by: Joseph Gasparakis
    Tested-by: Kavindya Deegala
    Signed-off-by: Jeff Kirsher

    Joseph Gasparakis
     
  • Signed-off-by: Zhi Yong Wu
    Signed-off-by: David S. Miller

    Zhi Yong Wu
     
  • Signed-off-by: Zhi Yong Wu
    Signed-off-by: David S. Miller

    Zhi Yong Wu
     
  • Signed-off-by: Zhi Yong Wu
    Signed-off-by: David S. Miller

    Zhi Yong Wu
     
  • drivers/net/vxlan.c: In function ‘vxlan_sock_add’:
    drivers/net/vxlan.c:2298:11: warning: ‘sock’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    drivers/net/vxlan.c:2275:17: note: ‘sock’ was declared here
    LD drivers/net/built-in.o

    Signed-off-by: Zhi Yong Wu
    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Zhi Yong Wu
     
  • UFO as well as UDP_CORK do not respect IP_PMTUDISC_DO and
    IP_PMTUDISC_PROBE well enough.

    UFO enabled packet delivery just appends all frags to the cork and hands
    it over to the network card. So we just deliver non-DF udp fragments
    (DF-flag may get overwritten by hardware or virtual UFO enabled
    interface).

    UDP_CORK does enqueue the data until the cork is disengaged. At this
    point it sets the correct IP_DF and local_df flags and hands it over to
    ip_fragment which in this case will generate an icmp error which gets
    appended to the error socket queue. This is not reflected in the syscall
    error (of course, if UFO is enabled this also won't happen).

    Improve this by checking the pmtudisc flags before appending data to the
    socket and if we still can fit all data in one packet when IP_PMTUDISC_DO
    or IP_PMTUDISC_PROBE is set, only then proceed.

    We use (mtu-fragheaderlen) to check for the maximum length because we
    ensure not to generate a fragment and non-fragmented data does not need
    to have its length aligned on 64 bit boundaries. Also the passed in
    ip_options are already aligned correctly.

    Maybe, we can relax some other checks around ip_fragment. This needs
    more research.

    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
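
    The length check described in this commit message, comparing against
    (mtu - fragheaderlen) before data is appended, might look roughly like
    the sketch below. It is a simplified userspace illustration: the function
    name and parameters are hypothetical, "no_frag" stands for IP_PMTUDISC_DO
    or IP_PMTUDISC_PROBE being set, and the 0xFFFF ceiling for the
    fragmentable case is an assumption based on the maximum IPv4 packet size.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_IP_PACKET 0xFFFFU /* largest IPv4 datagram, header included */

/*
 * Hypothetical helper: may "len" more bytes be appended to a corked
 * datagram that already holds "queued" bytes? Both counts exclude the
 * IP header (fragheaderlen) but include the transport header.
 */
static int check_append_len(unsigned int mtu, unsigned int fragheaderlen,
                            unsigned int queued, unsigned int len,
                            bool no_frag)
{
    unsigned int limit;

    assert(fragheaderlen <= mtu && mtu <= MAX_IP_PACKET);

    /* When fragmentation is forbidden, everything must fit in one MTU. */
    limit = (no_frag ? mtu : MAX_IP_PACKET) - fragheaderlen;

    if (queued > limit || len > limit - queued)
        return -EMSGSIZE; /* reported to the caller instead of an ICMP error */

    return 0;
}
```

    With a 1500-byte MTU and a 20-byte IP header the limit is 1480 bytes when
    fragmentation is forbidden, and the error surfaces at append time rather
    than later on the error queue.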
     
  • The virtio_net driver's mergeable receive buffer allocator
    uses 4KB packet buffers. For MTU-sized traffic, SKB truesize
    is > 4KB but only ~1500 bytes of the buffer is used to store
    packet data, reducing the effective TCP window size
    substantially. This patch addresses the performance concerns
    with mergeable receive buffers by allocating MTU-sized packet
    buffers using page frag allocators. If more than MAX_SKB_FRAGS
    buffers are needed, the SKB frag_list is used.

    Signed-off-by: Michael Dalton
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Michael Dalton
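
    The effect of buffer size on usable receive memory can be illustrated
    with simple arithmetic. The sketch assumes the socket's receive budget is
    charged per buffer by truesize while only the payload carries data; the
    function name and the sample figures (a 64KB budget, 4096 and 1536 bytes
    of truesize) are illustrative assumptions, not measurements from the
    patch.

```c
#include <assert.h>

/*
 * Hypothetical helper: how many payload bytes fit into a receive budget
 * of "rcvbuf" bytes when each packet is charged "truesize" bytes but
 * carries only "payload" bytes of data.
 */
static unsigned long payload_capacity(unsigned long rcvbuf,
                                      unsigned long truesize,
                                      unsigned long payload)
{
    assert(truesize > 0 && payload <= truesize);

    /* Whole packets that fit in the budget, times the data each one holds. */
    return (rcvbuf / truesize) * payload;
}
```

    Under these assumptions a 64KB budget holds 24000 bytes of 1500-byte
    payloads in 4096-byte buffers, but 63000 bytes in 1536-byte buffers.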
     
  • The code for privacy extensions is very mature, and making it
    configurable only gives marginal memory/code savings in exchange
    for obfuscation and hard to read code via CPP ifdef'ery.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Alexander Aring says:

    ====================
    6lowpan: trivial changes

    This patch series includes some trivial changes to prepare the 6lowpan stack
    for an upcoming patch series which mainly fixes fragmentation according to
    RFC 4944 and UDP handling (which is currently broken).

    Changes since v3:
    - really fix indentation in patch 3/5

    Changes since v2:
    - change indentation in patch 3/5
    - fix typo in 5/5 unecessary -> unnecessary
    - add missing 6lowpan tag in cover-letter
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: Alexander Aring
    Reviewed-by: Werner Almesberger
    Signed-off-by: David S. Miller

    Alexander Aring
     
  • This patch removes the assignment of skb->dev. We don't need it here because
    we use the netdev_alloc_skb_ip_align function which already sets the
    skb->dev.

    Signed-off-by: Alexander Aring
    Reviewed-by: Werner Almesberger
    Signed-off-by: David S. Miller

    Alexander Aring