25 Jul, 2016

1 commit

  • Following the work that has been done on offloading classifiers like u32
    and flower, hardware offloading of the match-all classifier is now
    possible, provided the interface supports tc offloading.

    To control the offloading, two tc flags have been introduced: skip_sw and
    skip_hw. Typical usage:

    tc filter add dev eth25 parent ffff: \
    matchall skip_sw \
    action mirred egress mirror \
    dev eth27
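
    The software-only counterpart uses the other flag with the same syntax:

    tc filter add dev eth25 parent ffff: \
    matchall skip_hw \
    action mirred egress mirror \
    dev eth27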

    Signed-off-by: Yotam Gigi
    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Yotam Gigi
     

24 Jul, 2016

1 commit


20 Jul, 2016

1 commit


17 Jul, 2016

1 commit

  • macsec can't cope with mtu-sized frames which need vlan tag insertion, and
    vlan devices set their default mtu equal to the underlying device's.
    By default, vlan-over-macsec devices therefore use an invalid mtu,
    dropping all large packets.
    This patch adds a netif helper to check if an upper vlan device
    needs mtu reduction. The helper is used during vlan device
    initialization to set a valid default and during mtu updates to
    forbid invalid, too big, mtu values.
    The helper currently only checks if the lower dev is a macsec device;
    if we get more users, we only need to update the helper (possibly
    reserving an additional IFF bit).
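
    A minimal sketch of such a helper, based on the description above (names
    taken from the macsec/vlan code paths; treat them as illustrative):

    static inline bool netif_reduces_vlan_mtu(struct net_device *dev)
    {
            /* TODO: reserve an additional IFF bit if more users show up */
            return netif_is_macsec(dev);
    }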

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

06 Jul, 2016

2 commits


05 Jul, 2016

1 commit

  • Add functions that iterate over lower devices and find a port device.
    As a dependency, add the netdev_for_each_all_lower_dev and
    netdev_for_each_all_lower_dev_rcu macros with the
    netdev_all_lower_get_next and netdev_all_lower_get_next_rcu helpers.

    Also, add functions to return mlxsw struct according to lower device
    found and mlxsw_port struct with a reference to lower device.
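
    A hedged usage sketch of the new iteration macro; the port check and priv
    lookup below stand in for the mlxsw-specific functions this commit adds:

    struct net_device *lower;
    struct list_head *iter;

    netdev_for_each_all_lower_dev(dev, lower, iter) {
            /* return the first lower device that is one of our ports */
            if (mlxsw_sp_port_dev_check(lower))
                    return netdev_priv(lower);
    }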

    Signed-off-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Jiri Pirko
     

01 Jul, 2016

1 commit

  • This patch introduces a new event - NETDEV_CHANGE_TX_QUEUE_LEN - which
    is triggered when tx_queue_len is changed. It can be used by net devices
    that want to do some processing at that time. An example is tun, which
    may want to resize its tx array when tx_queue_len is changed.
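
    A minimal sketch of a notifier handler reacting to the new event; the
    resize callback is a hypothetical placeholder:

    static int tun_device_event(struct notifier_block *nb,
                                unsigned long event, void *ptr)
    {
            struct net_device *dev = netdev_notifier_info_to_dev(ptr);

            switch (event) {
            case NETDEV_CHANGE_TX_QUEUE_LEN:
                    /* e.g. resize a tx ring to the new dev->tx_queue_len */
                    if (tun_queue_resize(dev))
                            return NOTIFY_BAD;
                    break;
            }
            return NOTIFY_DONE;
    }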

    Cc: John Fastabend
    Signed-off-by: Jason Wang
    Acked-by: John Fastabend
    Signed-off-by: David S. Miller

    Jason Wang
     

18 Jun, 2016

2 commits

  • Now that we have all the drivers using udp_tunnel_get_rx_ports,
    ndo_add_udp_enc_rx_port, and ndo_del_udp_enc_rx_port we can drop the
    function calls that were specific to VXLAN and GENEVE.

    Signed-off-by: Alexander Duyck
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This patch merges the notifiers for VXLAN and GENEVE into a single UDP
    tunnel notifier. The idea is that we will want to only have to make one
    notifier call to receive the list of ports for VXLAN and GENEVE tunnels
    that need to be offloaded.

    In addition we add a new set of ndo functions named ndo_udp_tunnel_add and
    ndo_udp_tunnel_del that are meant to allow us to track the tunnel meta-data
    such as port and address family as tunnels are added and removed. The
    tunnel meta-data is now transported in a structure named udp_tunnel_info
    which for now carries the type, address family, and port number. In the
    future this could be updated so that we can include a tuple of values
    including things such as the destination IP address and other fields.

    I also ended up going with a naming scheme that consisted of using the
    prefix udp_tunnel on function names. I applied this to the notifier and
    ndo ops as well so that it hopefully points to the fact that these are
    primarily used in the udp_tunnel functions.
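
    The meta-data structure and ndo hooks, sketched from the description
    above (field order illustrative):

    struct udp_tunnel_info {
            unsigned short type;    /* UDP_TUNNEL_TYPE_VXLAN / ..._GENEVE */
            sa_family_t sa_family;  /* address family of the listening socket */
            __be16 port;            /* UDP port number */
    };

    void (*ndo_udp_tunnel_add)(struct net_device *dev,
                               struct udp_tunnel_info *ti);
    void (*ndo_udp_tunnel_del)(struct net_device *dev,
                               struct udp_tunnel_info *ti);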

    Signed-off-by: Alexander Duyck
    Acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Alexander Duyck
     

16 Jun, 2016

2 commits

  • This patch introduces a neighbour discovery ops callback structure. The
    idea is to separate the handling for 6LoWPAN into the 6lowpan module.

    These callbacks offer 6lowpan-specific handling, such as 802.15.4 short
    address handling or RFC6775 (Neighbor Discovery Optimization for IPv6
    over 6LoWPANs).
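
    A rough sketch of such an ops structure; only a subset of hooks is shown
    and the exact signatures are assumptions here:

    struct ndisc_ops {
            int  (*is_useropt)(u8 nd_opt_type);
            int  (*parse_options)(const struct net_device *dev,
                                  struct nd_opt_hdr *nd_opt,
                                  struct ndisc_options *ndopts);
            void (*update)(const struct net_device *dev, struct neighbour *n,
                           u32 flags, u8 icmp6_type,
                           const struct ndisc_options *ndopts);
            /* ... plus hooks to size/fill link-layer address options ... */
    };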

    Cc: David S. Miller
    Cc: Alexey Kuznetsov
    Cc: James Morris
    Cc: Hideaki YOSHIFUJI
    Cc: Patrick McHardy
    Acked-by: YOSHIFUJI Hideaki
    Signed-off-by: Alexander Aring
    Signed-off-by: David S. Miller

    Alexander Aring
     
  • This patch will introduce 6lowpan neighbour private data. Like the
    interface private data, we handle private data for generic 6lowpan and
    for link-layer specific 6lowpan.

    The first use case is to save the short address for an 802.15.4
    6lowpan neighbour.
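
    For the 802.15.4 case this amounts to roughly the following (a sketch;
    the struct name is an assumption):

    struct lowpan_802154_neigh {
            __le16 short_addr;      /* 802.15.4 short address, if known */
    };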

    Cc: David S. Miller
    Reviewed-by: Stefan Schmidt
    Acked-by: YOSHIFUJI Hideaki
    Signed-off-by: Alexander Aring
    Signed-off-by: David S. Miller

    Alexander Aring
     

13 Jun, 2016

1 commit

  • sch_atm returns NET_XMIT_POLICED when a TC_ACT_SHOT classification
    occurs.

    But all other schedulers that use tc_classify
    (htb, hfsc, drr, fq_codel ...) return NET_XMIT_SUCCESS | __BYPASS
    in this case, so just do that in atm.

    BATMAN uses it as an intermediate return value to signal
    forwarding vs. buffering, but it did not return POLICED to
    callers outside of BATMAN.

    Reviewed-by: Sven Eckelmann
    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

11 Jun, 2016

1 commit

  • Respect the stack's xmit_recursion limit for calls into dev_queue_xmit().
    Currently, they are not handled by the limiter when attached to clsact's
    egress parent, for example, and a buggy program redirecting the packet to
    the same device again could eventually overflow the stack. It would be
    good if we could notify an admin to give him a chance to react. We reuse
    xmit_recursion instead of having one private to eBPF, so that the stack's
    current recursion depth will be taken into account as well. Follow-up to
    commit 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper") and
    27b29f63058d ("bpf: add bpf_redirect() helper").
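
    A sketch of the check in the redirect path, reusing the stack's per-cpu
    xmit_recursion counter (the error handling here is illustrative):

    if (unlikely(__this_cpu_read(xmit_recursion) > XMIT_RECURSION_LIMIT)) {
            net_crit_ratelimited("bpf: recursion limit reached on datapath, buggy bpf program?\n");
            kfree_skb(skb);
            return -ENETDOWN;
    }

    __this_cpu_inc(xmit_recursion);
    ret = dev_queue_xmit(skb);
    __this_cpu_dec(xmit_recursion);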

    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

10 Jun, 2016

1 commit


09 Jun, 2016

1 commit

  • "make htmldocs" complains otherwise:

    .//net/core/gen_stats.c:168: warning: No description found for parameter 'running'
    .//include/linux/netdevice.h:1867: warning: No description found for parameter 'qdisc_running_key'
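
    The fix is simply to add the missing kernel-doc lines, e.g. (wording
    illustrative):

    /**
     * ...
     * @running: seqcount protecting the qdisc running state
     */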

    Fixes: f9eb8aea2a1e ("net_sched: transform qdisc running bit into a seqcount")
    Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
    Signed-off-by: Eric Dumazet
    Reported-by: kbuild test robot
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Jun, 2016

1 commit

  • Instead of using a single bit (__QDISC___STATE_RUNNING)
    in sch->__state, use a seqcount.

    This adds lockdep support, but more importantly it will allow us
    to sample qdisc/class statistics without having to grab qdisc root lock.
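
    Roughly, the busy test/set pair becomes a seqcount begin/end; a sketch of
    the resulting helpers:

    static inline bool qdisc_is_running(const struct Qdisc *qdisc)
    {
            return raw_read_seqcount(&qdisc->running) & 1;
    }

    static inline bool qdisc_run_begin(struct Qdisc *qdisc)
    {
            if (qdisc_is_running(qdisc))
                    return false;
            write_seqcount_begin(&qdisc->running);
            return true;
    }

    static inline void qdisc_run_end(struct Qdisc *qdisc)
    {
            write_seqcount_end(&qdisc->running);
    }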

    Signed-off-by: Eric Dumazet
    Cc: Cong Wang
    Cc: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Eric Dumazet
     

04 Jun, 2016

1 commit

  • SCTP has this peculiarity that its packets cannot simply be segmented to
    (P)MTU. Its chunks must be contained in IP segments, padding respected.
    So we can't just generate a big skb, set gso_size to the fragmentation
    point and deliver it to the IP layer.

    This patch takes a different approach. SCTP will now build a skb as it
    would be if it was received using GRO. That is, there will be a cover
    skb with protocol headers and children ones containing the actual
    segments, already segmented in a way that respects SCTP RFCs.

    With that, we can tell skb_segment() to just split based on frag_list,
    trusting that its sizes are already in accordance; see the sketch after
    the changelog below.

    This way SCTP can benefit from GSO and, instead of passing several
    packets through the stack, it can pass a single large packet.

    v2:
    - Added support for receiving GSO frames, as requested by Dave Miller.
    - Clear skb->cb if packet is GSO (otherwise it's not used by SCTP).
    - Added heuristics similar to what we have in TCP for not generating
      single GSO packets that fill cwnd.
    v3:
    - consider sctphdr size in skb_gso_transport_seglen()
    - rebased due to 5c7cdf339af5 ("gso: Remove arbitrary checks for
      unsupported GSO")
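
    A minimal sketch of that frag_list-based marking (GSO_BY_FRAGS is the
    sentinel this series uses to mean "split along frag_list"):

    /* cover skb carries the headers; frag_list children hold the segments */
    skb_shinfo(skb)->gso_size = GSO_BY_FRAGS;
    skb_shinfo(skb)->gso_type = SKB_GSO_SCTP;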

    Signed-off-by: Marcelo Ricardo Leitner
    Tested-by: Xin Long
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

21 May, 2016

1 commit

  • This patch defines two new GSO definitions SKB_GSO_IPXIP4 and
    SKB_GSO_IPXIP6 along with corresponding NETIF_F_GSO_IPXIP4 and
    NETIF_F_GSO_IPXIP6. These are used to describe IP-in-IP
    tunnels and what the outer protocol is. The inner protocol
    can be deduced from other GSO types (e.g. SKB_GSO_TCPV4 and
    SKB_GSO_TCPV6). The GSO types of SKB_GSO_IPIP and SKB_GSO_SIT
    are removed (these are both instances of SKB_GSO_IPXIP4).
    SKB_GSO_IPXIP6 will be used when support for GSO with IP
    encapsulation over IPv6 is added.
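
    In a tunnel's transmit path, marking a packet would then look roughly
    like this (a sketch):

    skb_shinfo(skb)->gso_type |= SKB_GSO_IPXIP4;  /* IPv4 outer header */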

    Signed-off-by: Tom Herbert
    Acked-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Tom Herbert
     

17 May, 2016

1 commit


12 May, 2016

1 commit

  • Currently the VRF driver uses the rx_handler to switch the skb device
    to the VRF device. Switching the dev prior to the ip / ipv6 layer
    means the VRF driver has to duplicate IP/IPv6 processing which adds
    overhead and makes features such as retaining the ingress device index
    more complicated than necessary.

    This patch moves the hook to the L3 layer just after the first NF_HOOK
    for PRE_ROUTING. This location makes exposing the original ingress device
    trivial (next patch) and allows adding other NF_HOOKs to the VRF driver
    in the future.

    dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb
    with the switched device through the packet taps to maintain current
    behavior (tcpdump can be used on either the vrf device or the enslaved
    devices).
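
    A sketch of where the new hook sits, assuming the l3mdev receive helper
    this series introduces:

    /* early in ip_rcv_finish(), i.e. just after the PRE_ROUTING NF_HOOK */
    skb = l3mdev_ip_rcv(skb);
    if (!skb)
            return NET_RX_SUCCESS;  /* skb was consumed by the l3mdev */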

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

05 May, 2016

2 commits

  • Previous patches removed all direct accesses to dev->trans_start,
    so change the netif_trans_update helper to update trans_start of
    netdev queue 0 instead, and then remove trans_start from struct
    net_device.

    AFAICS a lot of the netif_trans_update() invocations are now useless
    because they occur in ndo_start_xmit and the driver doesn't set LLTX
    (i.e. the stack already took care of the update).

    As I can't test any of them, it seems better to just leave them alone.

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     
  • trans_start exists twice:
    - as member of net_device (legacy)
    - as member of netdev_queue

    In order to get rid of the legacy case, add a helper for the
    dev->trans_start update (this patch), then convert spots that do

    dev->trans_start = jiffies

    to use this helper (next patch).

    This would then allow us to change the helper so that it updates the
    trans_start of netdev queue 0 instead.
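
    After the follow-up switch to queue 0, the helper would look roughly
    like this (a sketch):

    static inline void netif_trans_update(struct net_device *dev)
    {
            struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);

            if (txq->trans_start != jiffies)
                    txq->trans_start = jiffies;
    }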

    Signed-off-by: Florian Westphal
    Signed-off-by: David S. Miller

    Florian Westphal
     

04 May, 2016

1 commit


03 May, 2016

1 commit

  • - trans_timeout is incremented when a tx queue times out (tx watchdog).
    - tx_maxrate is set via sysfs

    Moving tx_maxrate to the read-mostly part shrinks the struct by 64 bytes.
    While at it, also move trans_timeout (it is out of place in the
    'write-mostly' part).

    Signed-off-by: Florian Westphal
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Florian Westphal
     

30 Apr, 2016

1 commit


29 Apr, 2016

1 commit

  • Fix casting in net_gso_ok. Otherwise the shift in
    gso_type << NETIF_F_GSO_SHIFT may hit the 32nd bit and make it look like
    INT_MIN, which is then sign-extended when promoted from signed int to
    uint64, giving 0xffffffff80000000 and resulting in wrong behavior when
    it is and'ed with the feature itself, as in:

    This test app:
    #include <stdio.h>
    #include <stdint.h>

    int main(int argc, char **argv)
    {
            uint64_t feature1;
            uint64_t feature2;
            int gso_type = 1 << 15;

            feature1 = gso_type << 16;
            feature2 = (uint64_t)gso_type << 16;
            printf("%lx %lx\n", feature1, feature2);

            return 0;
    }

    Gives:
    ffffffff80000000 80000000

    So that this:
    return (features & feature) == feature;
    actually tests more bits than expected, including invalid ones.

    Fix is to promote it earlier.
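
    A sketch of that earlier promotion, matching net_gso_ok's use of
    netdev_features_t:

    feature = (netdev_features_t)gso_type << NETIF_F_GSO_SHIFT;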

    Issue noted while rebasing the SCTP GSO patch but posting separately as
    someone else may experience this meanwhile.

    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller

    Marcelo Ricardo Leitner
     

28 Apr, 2016

1 commit

  • sd->input_queue_head is incremented for each processed packet
    in process_backlog(), and read from other cpus performing
    Out Of Order avoidance in get_rps_cpu().

    Moving this field into a separate cache line keeps it mostly
    hot for the cpu running process_backlog(), as other cpus will
    only read it.

    In a stress test, process_backlog() was consuming 6.80 % of cpu cycles,
    and the patch reduced the cost to 0.65 %
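
    In struct softnet_data terms that is roughly (a sketch):

    /* written only by the owning cpu, read by others: give it its own line */
    unsigned int input_queue_head ____cacheline_aligned_in_smp;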

    Signed-off-by: Eric Dumazet
    Acked-by: Tom Herbert
    Signed-off-by: David S. Miller

    Eric Dumazet
     

27 Apr, 2016

1 commit


22 Apr, 2016

2 commits

  • Equivalent to "vxlan: break dependency with netdev drivers", don't
    autoload geneve module in case the driver is loaded. Instead make the
    coupling weaker by using netdevice notifiers as proxy.

    Cc: Jesse Gross
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     
  • Currently all drivers depend on and autoload the vxlan module because of
    how vxlan_get_rx_port is linked into them. Remove this dependency:

    By using a new event type in the netdevice notifier call chain we proxy
    the request from the drivers to flush and re-set up the vxlan ports, not
    directly via a function call but via the already existing netdevice
    notifier call chain.

    I added a separate new event type, NETDEV_OFFLOAD_PUSH_VXLAN, to do so.
    We don't need to save those ids, as the event type field is an unsigned
    long and using specialized event types for this purpose seemed to be a
    more elegant way. This also comes in beneficial if in future we want to
    add offloading knobs for vxlan.
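
    Driver and vxlan sides then meet only through the notifier; roughly (the
    push helper name is an assumption):

    /* NIC driver: ask the vxlan module to re-push its ports */
    rtnl_lock();
    call_netdevice_notifiers(NETDEV_OFFLOAD_PUSH_VXLAN, dev);
    rtnl_unlock();

    /* vxlan's netdevice notifier: */
    if (event == NETDEV_OFFLOAD_PUSH_VXLAN)
            vxlan_push_rx_ports(dev);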

    Cc: Jesse Gross
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

15 Apr, 2016

3 commits

  • This patch adds support for something I am referring to as GSO partial.
    The basic idea is that we can support a broader range of devices for
    segmentation if we use fixed outer headers and have the hardware only
    really deal with segmenting the inner header. The idea behind the naming
    is due to the fact that everything before csum_start will be fixed headers,
    and everything after will be the region that is handled by hardware.

    With the current implementation it allows us to add support for the
    following GSO types with an inner TSO_MANGLEID or TSO6 offload:
    NETIF_F_GSO_GRE
    NETIF_F_GSO_GRE_CSUM
    NETIF_F_GSO_IPIP
    NETIF_F_GSO_SIT
    NETIF_F_UDP_TUNNEL
    NETIF_F_UDP_TUNNEL_CSUM

    In the case of hardware that already supports tunneling we may be able to
    extend this further to support TSO_TCPV4 without TSO_MANGLEID if the
    hardware can support updating inner IPv4 headers.
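
    On the driver side this is advertised through a dedicated feature set;
    roughly (a sketch, flag selection illustrative):

    /* tunnel offloads usable only while the outer headers stay fixed */
    dev->gso_partial_features = NETIF_F_GSO_GRE_CSUM |
                                NETIF_F_GSO_UDP_TUNNEL_CSUM;
    dev->features |= NETIF_F_GSO_PARTIAL | dev->gso_partial_features;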

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This patch does two things.

    First it allows TCP to aggregate TCP frames with a fixed IPv4 ID field. As
    a result we should now be able to aggregate flows that were converted from
    IPv6 to IPv4. In addition this allows us more flexibility for future
    implementations of segmentation as we may be able to use a fixed IP ID when
    segmenting the flow.

    The second thing this does is that it places limitations on the outer IPv4
    ID header in the case of tunneled frames. Specifically it forces the IP ID
    to be incrementing by 1 unless the DF bit is set in the outer IPv4 header.
    This way we can avoid creating overlapping series of IP IDs that could
    possibly be fragmented if the frame goes through GRO and is then
    resegmented via GSO.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This patch adds support for TSO using IPv4 headers with a fixed IP ID
    field. This is meant to allow us to do a lossless GRO in the case of TCP
    flows that use a fixed IP ID such as those that convert IPv6 header to IPv4
    headers.

    In addition I am adding a feature that for now I am referring to as TSO
    with IP ID mangling. Basically when this flag is enabled the device has the
    option to either output the flow with incrementing IP IDs or with a fixed
    IP ID regardless of what the original IP ID ordering was. This is useful
    in cases where the DF bit is set and we do not care if the original IP ID
    value is maintained.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

14 Apr, 2016

2 commits

  • After introduction of ndo_features_check(), we believe that very
    specific checks for rare features should not be done in core
    networking stack.

    No driver uses gso_min_segs yet, so we revert this feature and save a
    few instructions per tx packet in the fast path.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Sometimes gcc mysteriously doesn't inline
    very small functions we expect to be inlined. See
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
    Arguably, gcc should do better, but gcc people aren't willing
    to invest time into it, asking to use __always_inline instead.

    With this .config:
    http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
    the following functions get deinlined many times.

    netif_tx_stop_queue: 207 copies, 590 calls:
    55 push %rbp
    48 89 e5 mov %rsp,%rbp
    f0 80 8f e0 01 00 00 01 lock orb $0x1,0x1e0(%rdi)
    5d pop %rbp
    c3 retq

    netif_tx_start_queue: 47 copies, 111 calls
    55 push %rbp
    48 89 e5 mov %rsp,%rbp
    f0 80 a7 e0 01 00 00 fe lock andb $0xfe,0x1e0(%rdi)
    5d pop %rbp
    c3 retq

    sock_hold: 39 copies, 124 calls
    55 push %rbp
    48 89 e5 mov %rsp,%rbp
    f0 ff 87 80 00 00 00 lock incl 0x80(%rdi)
    5d pop %rbp
    c3 retq

    __sock_put: 6 copies, 13 calls
    55 push %rbp
    48 89 e5 mov %rsp,%rbp
    f0 ff 8f 80 00 00 00 lock decl 0x80(%rdi)
    5d pop %rbp
    c3 retq

    This patch fixes this via s/inline/__always_inline/.

    Code size decrease after the patch is ~2.5k:

    text data bss dec hex filename
    56719876 56364551 36196352 149280779 8e5d80b vmlinux_before
    56717440 56364551 36196352 149278343 8e5ce87 vmlinux
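
    The change itself is mechanical; e.g. for the first helper above:

    static __always_inline void netif_tx_stop_queue(struct netdev_queue *dev_queue)
    {
            set_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state);
    }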

    Signed-off-by: Denys Vlasenko
    CC: David S. Miller
    CC: linux-kernel@vger.kernel.org
    CC: netdev@vger.kernel.org
    CC: netfilter-devel@vger.kernel.org
    Signed-off-by: David S. Miller

    Denys Vlasenko
     

10 Apr, 2016

1 commit


08 Apr, 2016

2 commits

  • This patch fixes an issue I found in which we were dropping frames if we
    had enabled checksums on GRE headers that were encapsulated by either FOU
    or GUE. Without this patch I was barely able to get 1 Gb/s of throughput.
    With this patch applied I am now at least getting around 6 Gb/s.

    The issue is due to the fact that with FOU or GUE applied we do not provide
    a transport offset pointing to the GRE header, nor do we offload it in
    software as the GRE header is completely skipped by GSO and treated like a
    VXLAN or GENEVE type header. As such we need to prevent the stack from
    generating it and also prevent GRE from generating it via any interface we
    create.

    Fixes: c3483384ee511 ("gro: Allow tunnel stacking in the case of FOU/GUE")
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • Now that the UDP encapsulation GRO functions have been moved to the UDP
    socket, we no longer need the udp_offload infrastructure, so remove it.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

24 Mar, 2016

1 commit

  • Pull networking bugfixes from David Miller:
    "Several bug fixes rolling in, some for changes introduced in this
    merge window, and some for problems that have existed for some time:

    1) Fix prepare_to_wait() handling in AF_VSOCK, from Claudio Imbrenda.

    2) The new DST_CACHE should be a silent config option, from Dave
    Jones.

    3) inet_current_timestamp() unintentionally truncates timestamps to
    16-bit, from Deepa Dinamani.

    4) Missing reference to netns in ppp, from Guillaume Nault.

    5) Free memory reference in hv_netvsc driver, from Haiyang Zhang.

    6) Missing kernel doc documentation for function arguments in various
    spots around the networking, from Luis de Bethencourt.

    7) UDP stopped receiving broadcast packets properly, due to
    overzealous multicast checks, fix from Paolo Abeni"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits)
    net: ping: make ping_v6_sendmsg static
    hv_netvsc: Fix the order of num_sc_offered decrement
    net: Fix typos and whitespace.
    hv_netvsc: Fix the array sizes to be max supported channels
    hv_netvsc: Fix accessing freed memory in netvsc_change_mtu()
    ppp: take reference on channels netns
    net: Reset encap_level to avoid resetting features on inner IP headers
    net: mediatek: fix checking for NULL instead of IS_ERR() in .probe
    net: phy: at803x: Request 'reset' GPIO only for AT8030 PHY
    at803x: fix reset handling
    AF_VSOCK: Shrink the area influenced by prepare_to_wait
    Revert "vsock: Fix blocking ops call in prepare_to_wait"
    macb: fix PHY reset
    ipv4: initialize flowi4_flags before calling fib_lookup()
    fsl/fman: Workaround for Errata A-007273
    ipv4: fix broadcast packets reception
    net: hns: bug fix about the overflow of mss
    net: hns: adds limitation for debug port mtu
    net: hns: fix the bug about mtu setting
    net: hns: fixes a bug of RSS
    ...

    Linus Torvalds