28 Dec, 2016

1 commit

  • The networking stack accelerates VLAN tag handling by
    keeping the topmost VLAN header in the skb. This works as
    long as the packet remains in the OVS datapath. But during
    an OVS upcall, the VLAN header is pushed onto the packet.
    When such a packet is sent back to the OVS datapath, the core
    networking stack might not handle it correctly. The following
    patch avoids this issue by accelerating the VLAN tag
    during flow key extraction. This simplifies the datapath by
    bringing uniform packet processing to packets from
    all code paths.

    Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets").
    CC: Jarno Rajahalme
    CC: Jiri Benc
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    pravin shelar
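    A minimal userspace sketch of what "accelerating" the tag during key
    extraction means, assuming a simplified packet model: if the frame still
    carries an in-band 802.1Q header (as after an upcall), the TCI is moved into
    out-of-band metadata and the 4-byte header is removed. struct pkt and
    accel_vlan_tag() are hypothetical stand-ins for struct sk_buff and the
    kernel helpers, not the actual patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN    6
#define ETH_P_8021Q 0x8100
#define VLAN_HLEN   4

/* Hypothetical stand-in for an sk_buff: frame bytes plus VLAN metadata. */
struct pkt {
	uint8_t data[128];
	size_t len;
	bool vlan_present;	/* tag is held out of band */
	uint16_t vlan_tci;
};

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* If the frame carries an in-band 802.1Q header, move the TCI into the
 * metadata and strip the 4 bytes, so every code path sees one layout. */
static bool accel_vlan_tag(struct pkt *p)
{
	const size_t type_off = 2 * ETH_ALEN;

	if (p->vlan_present || p->len < type_off + VLAN_HLEN + 2)
		return false;
	if (get_be16(p->data + type_off) != ETH_P_8021Q)
		return false;

	p->vlan_tci = get_be16(p->data + type_off + 2);
	p->vlan_present = true;
	/* Shift the inner EtherType and payload over the removed tag. */
	memmove(p->data + type_off, p->data + type_off + VLAN_HLEN,
		p->len - type_off - VLAN_HLEN);
	p->len -= VLAN_HLEN;
	return true;
}
```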
     

21 Dec, 2016

1 commit

  • Add a break statement to prevent fall-through from
    OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL. Without the break,
    actions setting Ethernet addresses fail to validate, with log messages
    complaining about invalid tunnel attributes.

    Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
    Signed-off-by: Jarno Rajahalme
    Acked-by: Pravin B Shelar
    Acked-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jarno Rajahalme
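    The bug class is plain C switch fall-through. A hypothetical validator
    (the attribute names and length rules are invented for illustration and
    are not the OVS netlink code) shows how a missing break lets an Ethernet
    attribute run into the tunnel checks:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute IDs and lengths, for illustration only. */
enum key_attr { KEY_ATTR_ETHERNET, KEY_ATTR_TUNNEL, KEY_ATTR_IPV4 };

#define ETH_KEY_LEN     12
#define TUN_KEY_MIN_LEN 16

static bool validate_set(enum key_attr type, size_t len)
{
	switch (type) {
	case KEY_ATTR_ETHERNET:
		if (len != ETH_KEY_LEN)
			return false;
		break;	/* without this, a 12-byte Ethernet key fails the tunnel check */
	case KEY_ATTR_TUNNEL:
		if (len < TUN_KEY_MIN_LEN)
			return false;
		break;
	default:
		return false;
	}
	return true;
}
```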
     

04 Dec, 2016

1 commit

  • A couple of conflicts were resolved here:

    1) In the MACB driver, a bug fix to properly initialize the
    RX tail pointer overlapped with some changes
    to support variable sized rings.

    2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
    overlapping with a reorganization of the driver to support
    ACPI, OF, as well as PCI variants of the chip.

    3) In 'net' we had several probe error path bug fixes to the
    stmmac driver, while in 'net-next' a lot of this code was cleaned up
    and reorganized.

    4) The cls_flower classifier obtained a helper function in
    'net-next' called __fl_delete() and this overlapped with
    Daniel Borkmann's bug fix to use RCU for object destruction
    in 'net'. It also overlapped with Jiri's change to guard
    the rhashtable_remove_fast() call with a check against
    tc_skip_sw().

    5) In mlx4, a revert bug fix in 'net' overlapped with some
    unrelated changes in 'net-next'.

    6) In geneve, a stale header pointer after pskb_expand_head()
    bug fix in 'net' overlapped with a large reorganization of
    the same code in 'net-next'. Since the 'net-next' code no
    longer had the bug in question, there was nothing to do
    other than to simply take the 'net-next' hunks.

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Dec, 2016

1 commit

  • If nf_ct_frag6_gather() returns an error other than -EINPROGRESS, it
    means that we still have a reference to the skb. We should free it
    before returning from handle_fragments, as stated in the comment above.

    Fixes: daaa7d647f81 ("netfilter: ipv6: avoid nf_iterate recursion")
    CC: Florian Westphal
    CC: Pravin B Shelar
    CC: Joe Stringer
    Signed-off-by: Daniele Di Proietto
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Daniele Di Proietto
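    The ownership rule can be sketched with a hypothetical reference-counted
    buffer in place of the skb: on -EINPROGRESS the reassembly queue keeps the
    packet, while on any other error the caller still owns it and must drop its
    reference. struct buf, buf_put() and handle_gather_result() are
    illustrative names, not kernel APIs.

```c
#include <errno.h>

/* Hypothetical reference-counted buffer standing in for an skb. */
struct buf {
	int refs;
};

static void buf_put(struct buf *b)
{
	b->refs--;	/* a real implementation would free at zero */
}

/* Hypothetical post-processing of a defragmentation result:
 * 0            - a reassembled packet is handed back to the caller,
 * -EINPROGRESS - the fragment was queued; the queue now owns it,
 * other errors - the caller still holds the reference and must drop it. */
static int handle_gather_result(struct buf *b, int err)
{
	if (err) {
		if (err != -EINPROGRESS)
			buf_put(b);
		return err;
	}
	return 0;
}
```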
     

18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into a zero-based array and
    is thus an unsigned entity. Using a negative value is an out-of-bounds
    access by definition.

    2)
    On x86_64, unsigned 32-bit data that are mixed with pointers
    via array indexing, or added to or subtracted from pointers as
    offsets, are preferred to signed 32-bit data.

    An "int" used as an array index needs to be sign-extended
    to 64 bits before being used.

    void f(long *p, int i)
    {
            g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is a 3-byte instruction, which isn't necessary if the variable is
    unsigned, because x86_64 zero-extends 32-bit values by default.

    Now, the net_generic() function, as you may have guessed, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
            ...
            ptr = ng->ptr[id - 1];
            ...
    }

    And this function is used a lot, so those sign extensions add up.

    The patch shaves ~1730 bytes off an allyesconfig kernel (built without
    all the junk that messes with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately, some functions actually grow bigger.
    This is a seemingly random artefact of code generation, with the register
    allocator being used differently. gcc decides that some variable
    needs to live in one of the new r8+ registers, and every access then
    requires a REX prefix. Or it is shifted into r12, so the [r12+0] addressing
    mode has to be used, which is longer than [r8].

    However, the overall balance is in the negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function                        old     new   delta
    nfsd4_lock                     3886    3959     +73
    tipc_link_build_proto_msg      1096    1140     +44
    mac80211_hwsim_new_radio       2776    2808     +32
    tipc_mon_rcv                   1032    1058     +26
    svcauth_gss_legacy_init        1413    1429     +16
    tipc_bcbase_select_primary      379     392     +13
    nfsd4_exchange_id              1247    1260     +13
    nfsd4_setclientid_confirm       782     793     +11
    ...
    put_client_renew_locked         494     480     -14
    ip_set_sockfn_get               730     716     -14
    geneve_sock_add                 829     813     -16
    nfsd4_sequence_done             721     703     -18
    nlmclnt_lookup_host             708     686     -22
    nfsd4_lockt                    1085    1063     -22
    nfs_get_client                 1077    1050     -27
    tcf_bpf_init                   1106    1076     -30
    nfsd4_encode_fattr             5997    5930     -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
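    Besides code size, reason 1) has a practical upside: with an unsigned id,
    the "id - 1" computation from net_generic() turns id == 0 into a huge
    value, so a single comparison catches both the zero and the too-large case.
    A small sketch, assuming a hypothetical fixed-size table (the bounds check
    is part of this sketch only):

```c
#include <stddef.h>

/* Hypothetical, simplified per-namespace pointer table; ids are 1-based. */
struct generic_table {
	unsigned int len;
	void *ptr[8];
};

static void *table_lookup(const struct generic_table *t, unsigned int id)
{
	/* id == 0 wraps to UINT_MAX here, so one unsigned comparison
	 * rejects both a zero id and one beyond the table. */
	if (id - 1 >= t->len)
		return NULL;
	return t->ptr[id - 1];
}
```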
     

14 Nov, 2016

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains a second batch of Netfilter updates for
    your net-next tree. This includes a rework of the core hook
    infrastructure that improves Netfilter performance by ~15% according to
    synthetic benchmarks. Then, a large batch with ipset updates, including
    a new hash:ipmac set type, via Jozsef Kadlecsik. This also includes a
    couple of assorted updates.

    Regarding the core hook infrastructure rework to improve performance,
    using this simple drop-all packets ruleset from ingress:

    nft add table netdev x
    nft add chain netdev x y { type filter hook ingress device eth0 priority 0\; }
    nft add rule netdev x y drop

    And generating traffic through Jesper Brouer's
    samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh script using -i
    option. perf report shows nf_tables calls in its top 10:

    17.30% kpktgend_0 [nf_tables] [k] nft_do_chain
    15.75% kpktgend_0 [kernel.vmlinux] [k] __netif_receive_skb_core
    10.39% kpktgend_0 [nf_tables_netdev] [k] nft_do_chain_netdev

    I measured an improvement of ~15% in performance with this
    patchset, i.e. roughly 2.5Mpps more. I used my old laptop, an Intel(R)
    Core(TM) i5-3320M CPU @ 2.60GHz 4-cores.

    More specifically, this rework contains these patches, in strict order:

    1) Remove compile-time debugging from core.

    2) Remove obsolete comments that predate the rcu era. These days it is
    well known that a Netfilter hook always runs under rcu_read_lock().

    3) Remove threshold handling; this is also only used by br_netfilter.
    We already have specific code to handle this from br_netfilter,
    so remove this code from the core path.

    4) Deprecate NF_STOP, as this is only used by br_netfilter.

    5) Place the nf_hook_state pointer into the xt_action_param structure, so
    this structure fits into a single cacheline according to pahole.
    This also implicitly affects nftables, since it also relies on the
    xt_action_param structure.

    6) Move state->hook_entries into nf_queue entry. The hook_entries
    pointer is only required by nf_queue(), so we can store this in the
    queue entry instead.

    7) Use a switch() statement to handle verdict cases.

    8) Remove hook_entries field from nf_hook_state structure, this is only
    required by nf_queue, so store it in nf_queue_entry structure.

    9) Merge nf_iterate() into nf_hook_slow(), which results in a much
    simpler and more readable function.

    10) Handle NF_REPEAT outside the core; so far the only client is
    nf_conntrack_in(), and we can restart packet processing there with a
    simple goto when the TCP tracker requires it.
    This update required a second pass to fix fallout, a fix from
    Arnd Bergmann.

    11) Set random seed from nft_hash when no seed is specified from
    userspace.

    12) Simplify nf_tables expression registration in a much smarter way
    that saves lots of boilerplate code, by Liping Zhang.

    13) Simplify layer 4 protocol conntrack tracker registration, from
    Davide Caratti.

    14) Missing CONFIG_NF_SOCKET_IPV4 dependency for udp4_lib_lookup, due
    to recent generalization of the socket infrastructure, from Arnd
    Bergmann.

    15) Then, the ipset batch from Jozsef, which he describes as follows:

    * Cleanup: Remove extra whitespaces in ip_set.h
    * Cleanup: Mark some of the helpers arguments as const in ip_set.h
    * Cleanup: Group counter helper functions together in ip_set.h
    * struct ip_set_skbinfo is introduced instead of open coded fields
    in skbinfo get/init helper functions.
    * Use kmalloc() in comment extension helper instead of kzalloc()
    because it is unnecessary to zero out the area just before
    explicit initialization.
    * Cleanup: Split extensions into separate files.
    * Cleanup: Separate memsize calculation code into dedicated function.
    * Cleanup: group ip_set_put_extensions() and ip_set_get_extensions()
    together.
    * Add element count to hash headers by Eric B Munson.
    * Add element count to all set types header for uniform output
    across all set types.
    * Count non-static extension memory into memsize calculation for
    userspace.
    * Cleanup: Remove redundant mtype_expire() arguments, because
    they can be obtained from other parameters.
    * Cleanup: Simplify mtype_expire() for hash types by removing
    one level of indentation.
    * Make NLEN compile time constant for hash types.
    * Make sure element data size is a multiple of u32 for the hash set
    types.
    * Optimize hash creation routine, exit as early as possible.
    * Make struct htype per ipset family so nets array becomes fixed size
    and thus simplifies the struct htype allocation.
    * Collapse same condition body into a single one.
    * Fix reported memory size for hash:* types, base hash bucket structure
    was not taken into account.
    * hash:ipmac type support added to ipset by Tomasz Chilinski.
    * Use setup_timer() and mod_timer() instead of init_timer()
    by Muhammad Falak R Wani, individually for the set type families.

    16) Remove useless connlabel field in struct netns_ct, patch from
    Florian Westphal.

    17) xt_find_table_lock() doesn't return ERR_PTR() anymore, so simplify
    {ip,ip6,arp}tables code that uses this.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
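    Items 7) and 9) describe a single loop over the hook entries with a
    switch() on each verdict. A simplified sketch of that shape, assuming
    hypothetical verdict values, hook type and return convention (1 = continue,
    0 = queued, -EPERM = dropped); it is not the actual nf_hook_slow() code:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical verdicts; the real ones are NF_DROP, NF_ACCEPT, ... */
enum verdict { V_DROP, V_ACCEPT, V_QUEUE };

typedef enum verdict (*hook_fn)(void *pkt);

/* Sample hooks for trying the loop out. */
static enum verdict accept_all(void *pkt) { (void)pkt; return V_ACCEPT; }
static enum verdict drop_all(void *pkt) { (void)pkt; return V_DROP; }
static enum verdict queue_all(void *pkt) { (void)pkt; return V_QUEUE; }

/* Walk the hooks in order, stopping at the first non-ACCEPT verdict. */
static int run_hooks(hook_fn *hooks, size_t n, void *pkt)
{
	for (size_t i = 0; i < n; i++) {
		switch (hooks[i](pkt)) {
		case V_ACCEPT:
			break;		/* leave the switch, try the next hook */
		case V_DROP:
			return -EPERM;
		case V_QUEUE:
			return 0;
		}
	}
	return 1;
}
```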
     

13 Nov, 2016

8 commits

  • Allow ARPHRD_NONE interfaces to be added to ovs bridge.

    Based on previous versions by Lorand Jakab and Simon Horman.

    Signed-off-by: Lorand Jakab
    Signed-off-by: Simon Horman
    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • Pushing an Ethernet header in front of another Ethernet header is not
    allowed.

    Popping an Ethernet header is not allowed if there's a VLAN tag. This
    preserves the invariant that an L3 packet never has a VLAN tag.

    Based on previous versions by Lorand Jakab and Simon Horman.

    Signed-off-by: Lorand Jakab
    Signed-off-by: Simon Horman
    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
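    The two rules above amount to simple checks on the packet's current
    layer-2 state. A sketch with a hypothetical struct flow_state (not the OVS
    data structures):

```c
#include <stdbool.h>

/* Hypothetical per-packet state, for illustration only. */
enum mac_proto { MAC_PROTO_NONE, MAC_PROTO_ETHERNET };

struct flow_state {
	enum mac_proto mac_proto;
	bool vlan_present;
};

/* No Ethernet header in front of another Ethernet header. */
static bool can_push_eth(const struct flow_state *s)
{
	return s->mac_proto == MAC_PROTO_NONE;
}

/* Popping is refused while a VLAN tag is present, so an L3 packet
 * never carries a VLAN tag. */
static bool can_pop_eth(const struct flow_state *s)
{
	return s->mac_proto == MAC_PROTO_ETHERNET && !s->vlan_present;
}
```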
     
  • Extend the ovs flow netlink protocol to support L3 packets. Packets without
    OVS_KEY_ATTR_ETHERNET attribute specify L3 packets; for those, the
    OVS_KEY_ATTR_ETHERTYPE attribute is mandatory.

    Push/pop vlan actions are only supported for Ethernet packets.

    Based on previous versions by Lorand Jakab and Simon Horman.

    Signed-off-by: Lorand Jakab
    Signed-off-by: Simon Horman
    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
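    The netlink rules above can be expressed as checks over which key
    attributes are present. In this sketch the presence bits and function names
    are hypothetical; the kernel tracks attribute presence differently:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical presence bits for the attributes named above. */
#define HAS_ETHERNET  (1u << 0)	/* OVS_KEY_ATTR_ETHERNET */
#define HAS_ETHERTYPE (1u << 1)	/* OVS_KEY_ATTR_ETHERTYPE */

/* A key without the Ethernet attribute describes an L3 packet, and then
 * the EtherType attribute is mandatory. Other rules are not modeled. */
static bool key_attrs_valid(uint32_t present)
{
	if (!(present & HAS_ETHERNET))
		return (present & HAS_ETHERTYPE) != 0;
	return true;
}

/* Push/pop VLAN actions are accepted only for Ethernet packets. */
static bool vlan_action_allowed(uint32_t present)
{
	return (present & HAS_ETHERNET) != 0;
}
```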
     
  • Support receiving L3 packets (packets without an Ethernet header),
    extracting their flow key, and sending them.

    Note that even after this patch, non-Ethernet interfaces are still not
    allowed to be added to bridges. Similarly, the netlink interface for sending
    and receiving L3 packets to/from user space is not in place yet.

    Based on previous versions by Lorand Jakab and Simon Horman.

    Signed-off-by: Lorand Jakab
    Signed-off-by: Simon Horman
    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • Update Ethernet header only if there is one.

    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • We'll need it to alter packets sent to ARPHRD_NONE interfaces.

    Change do_output() to use the actual L2 header size of the packet when
    deciding on the minimum cutlen. The assumption here is that what matters is
    not the output interface hard_header_len but rather the L2 header of the
    particular packet. For example, ARPHRD_NONE tunnels that encapsulate
    Ethernet should get at least the Ethernet header.

    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
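    A sketch of the minimum-length rule, assuming the truncation request is
    expressed as cutlen bytes to remove and mac_len is the L2 header length of
    this particular packet (0 for an L3 packet). The helper name is
    hypothetical:

```c
#include <stddef.h>

/* Length after removing cutlen bytes, never going below the packet's
 * own L2 header (and never growing the packet). */
static size_t trimmed_len(size_t len, size_t cutlen, size_t mac_len)
{
	if (cutlen == 0)
		return len;
	if (len > cutlen && len - cutlen > mac_len)
		return len - cutlen;
	return mac_len < len ? mac_len : len;
}
```

    An Ethernet-encapsulating ARPHRD_NONE tunnel would thus still keep at least
    the 14-byte Ethernet header of the packet, regardless of the output
    device's own header length.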
     
  • Use a hole in the structure. We support only Ethernet so far and will add
    support for L2-less packets shortly. We could use a bool to indicate
    whether the Ethernet header is present or not, but the approach with the
    mac_proto field is more generic and occupies the same number of bytes in the
    struct, while allowing later extensibility. It also makes the code in the
    next patches more self-explanatory.

    It would be nice to use ARPHRD_ constants, but those are u16, which would be
    wasteful. Thus define our own constants.

    Another upside of this is that we can overload this new field to also denote
    whether the flow key is valid. This has the advantage that on
    refragmentation, we don't have to reparse the packet but can rely on the
    stored eth.type. This is especially important for the next patches in this
    series - instead of adding another branch for L2-less packets before calling
    ovs_fragment, we can just remove all those branches completely.

    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
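    A sketch of the layout idea: a one-byte mac_proto field whose high bit is
    overloaded as a "key invalid" flag. The constant values and helper names
    here are assumptions for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; the series defines its own small constants
 * instead of the u16 ARPHRD_* ones. */
#define MAC_PROTO_NONE     0
#define MAC_PROTO_ETHERNET 1
#define KEY_INVALID        0x80	/* overloaded "flow key not valid" bit */

struct flow_key_hdr {
	uint8_t mac_proto;	/* fits in a 1-byte hole */
	uint16_t eth_type;	/* kept for refragmentation without reparsing */
};

static uint8_t key_mac_proto(const struct flow_key_hdr *k)
{
	return k->mac_proto & (uint8_t)~KEY_INVALID;
}

static bool key_is_valid(const struct flow_key_hdr *k)
{
	return !(k->mac_proto & KEY_INVALID);
}

static void key_invalidate(struct flow_key_hdr *k)
{
	k->mac_proto |= KEY_INVALID;	/* protocol bits are preserved */
}
```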
     
  • On tx, use hard_header_len while deciding whether to refragment or drop the
    packet. That way, all combinations are calculated correctly:

    * L2 packet going to L2 interface (the L2 header len is subtracted),
    * L2 packet going to L3 interface (the L2 header is included in the packet
    length),
    * L3 packet going to L3 interface.

    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
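    The three cases can be checked with a small helper that subtracts the
    output device's hard_header_len (14 for an Ethernet device, 0 for an
    ARPHRD_NONE device) before comparing with the MTU. This is a hypothetical
    sketch, not the OVS function:

```c
#include <stdbool.h>

/* Length compared against the MTU: the device's own L2 header does not
 * count, but an L2 header sent to an L3 device does. */
static int payload_len(int skb_len, int hard_header_len)
{
	int len = skb_len - hard_header_len;

	return len > 0 ? len : 0;
}

static bool fits_mtu(int skb_len, int hard_header_len, int mtu)
{
	return payload_len(skb_len, hard_header_len) <= mtu;
}
```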
     

03 Nov, 2016

1 commit


28 Oct, 2016

3 commits

  • Now genl_register_family() is the only thing (other than the
    users themselves, perhaps, but I didn't find any doing that)
    writing to the family struct.

    In all families that I found, genl_register_family() is only
    called from __init functions (some indirectly, in which case
    I've added __init annotations to clarify things), so all can
    actually be marked __ro_after_init.

    This protects the data structure from accidental corruption.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Instead of providing macros/inline functions to initialize
    the families, make all users initialize them statically and
    get rid of the macros.

    This reduces the kernel code size by about 1.6k on x86-64
    (with allyesconfig).

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Static family IDs have never really been used; the only
    use case was the workaround I introduced for those users
    that assumed their family ID was also their multicast
    group ID.

    Additionally, because static family IDs would never be
    reserved by the generic netlink code, using a relatively
    low ID would only work for built-in families that can be
    registered immediately after generic netlink is started,
    which is basically only the control family (apart from
    the workaround code, which I also had to add code for so
    it would reserve those IDs).

    Thus, anything other than GENL_ID_GENERATE is flawed and
    luckily not used except in the cases I mentioned. Move
    those workarounds into a few lines of code, and then get
    rid of GENL_ID_GENERATE entirely, making it more robust.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
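    Taken together, the three changes suggest a pattern: the family descriptor
    is fully initialized at compile time, and registration is the only code
    that writes to it, filling in a dynamically assigned ID. A userspace sketch
    with a hypothetical, trimmed-down struct and ID counter (the real struct
    genl_family has many more fields, and in the kernel the object would also
    be marked __ro_after_init):

```c
/* Hypothetical, trimmed-down family descriptor. */
struct family_sketch {
	const char *name;
	unsigned int version;
	int id;		/* filled in by registration, never chosen statically */
};

/* Statically initialized; no init macro or runtime setup needed.
 * Field values are illustrative. */
static struct family_sketch dp_family = {
	.name = "ovs_datapath",
	.version = 2,
};

static int next_family_id = 16;	/* hypothetical first free ID */

/* The only writer of the descriptor after it is defined. */
static int register_family(struct family_sketch *f)
{
	f->id = next_family_id++;
	return 0;
}
```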
     

21 Oct, 2016

1 commit

  • geneve:
    - Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
    - This one isn't quite as straightforward as the others; it could use
    some closer inspection and testing

    macvlan:
    - set min/max_mtu

    tun:
    - set min/max_mtu, remove tun_net_change_mtu

    vxlan:
    - Merge __vxlan_change_mtu back into vxlan_change_mtu
    - Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
    change_mtu function
    - This one is also not as straightforward and could use closer inspection
    and testing from vxlan folks

    bridge:
    - set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
    change_mtu function

    openvswitch:
    - set min/max_mtu, remove internal_dev_change_mtu
    - note: max_mtu wasn't checked previously, it's been set to 65535, which
    is the largest possible size supported

    sch_teql:
    - set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)

    macsec:
    - min_mtu = 0, max_mtu = 65535

    macvlan:
    - min_mtu = 0, max_mtu = 65535

    ntb_netdev:
    - min_mtu = 0, max_mtu = 65535

    veth:
    - min_mtu = 68, max_mtu = 65535

    8021q:
    - min_mtu = 0, max_mtu = 65535

    CC: netdev@vger.kernel.org
    CC: Nicolas Dichtel
    CC: Hannes Frederic Sowa
    CC: Tom Herbert
    CC: Daniel Borkmann
    CC: Alexander Duyck
    CC: Paolo Abeni
    CC: Jiri Benc
    CC: WANG Cong
    CC: Roopa Prabhu
    CC: Pravin B Shelar
    CC: Sabrina Dubroca
    CC: Patrick McHardy
    CC: Stephen Hemminger
    CC: Pravin Shelar
    CC: Maxim Krasnyansky
    Signed-off-by: Jarod Wilson
    Signed-off-by: David S. Miller

    Jarod Wilson
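    With drivers only declaring min_mtu/max_mtu, the range check can live in
    one place in the core. A sketch of that check with a hypothetical device
    struct (treating max_mtu == 0 as "no upper bound" is an assumption of this
    sketch):

```c
#include <errno.h>

/* Hypothetical device with the per-device MTU bounds drivers now set. */
struct dev_sketch {
	int mtu;
	int min_mtu;
	int max_mtu;	/* 0 is treated as "no upper bound" here */
};

/* Central range check; drivers no longer need their own
 * change_mtu callbacks just to validate the value. */
static int set_mtu(struct dev_sketch *d, int new_mtu)
{
	if (new_mtu < 0 || new_mtu < d->min_mtu)
		return -EINVAL;
	if (d->max_mtu > 0 && new_mtu > d->max_mtu)
		return -EINVAL;
	d->mtu = new_mtu;
	return 0;
}
```

    For example, a device configured like veth above (min_mtu = 68,
    max_mtu = 65535) rejects 67 and 65536 but accepts 9000.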
     

20 Oct, 2016

2 commits


14 Oct, 2016

1 commit


13 Oct, 2016

3 commits

  • The internal device does support 802.1AD offloading since 018c1dda5ff1
    ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink
    attributes").

    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Acked-by: Eric Garver
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • When the packet has its vlan tag in skb->vlan_tci, the length of the VLAN
    header is not counted in skb->len. It doesn't make sense to subtract it.

    Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Acked-by: Eric Garver
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • This code is called whenever flow key is being extracted from the packet.
    The packet may be as likely vlan tagged as not.

    Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Acked-by: Eric Garver
    Signed-off-by: David S. Miller

    Jiri Benc
     

12 Oct, 2016

1 commit

  • If MPLS headers were pushed to a defragmented packet, refragmentation no
    longer works correctly after 48d2ab609b6b ("net: mpls: Fixups for GSO"). The
    network header has to be shifted past the MPLS headers for
    fragmentation and restored afterwards.

    Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO")
    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
     

03 Oct, 2016

2 commits

  • skb_mpls_header is equivalent to mpls_hdr now. Use the existing helper
    instead.

    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • After the 48d2ab609b6b ("net: mpls: Fixups for GSO"), MPLS handling in
    openvswitch was changed to have network header pointing to the start of the
    MPLS headers and inner_network_header pointing after the MPLS headers.

    However, key_extract was missed by the mentioned commit, causing incorrect
    headers to be set when an MPLS packet just enters the bridge or after it is
    recirculated.

    Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO")
    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
     

21 Sep, 2016

2 commits


19 Sep, 2016

2 commits

  • Instead of using flow stats per NUMA node, use them per CPU. When using
    megaflows, the stats lock can be a scalability bottleneck.

    On an E5-2690 12-core system, typical throughput went from ~4Mpps to
    ~15Mpps when forwarding between two 40GbE ports with a single flow
    configured on the datapath.

    This has been tested on a system with possible CPUs 0-7,16-23. After
    module removal, there was no corruption in the slab cache.

    Signed-off-by: Thadeu Lima de Souza Cascardo
    Cc: pravin shelar
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Thadeu Lima de Souza Cascardo
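    The per-CPU layout can be sketched as one statistics slot per CPU: the fast
    path only touches the slot of the CPU it runs on, so CPUs do not contend on
    one shared lock, and readers sum all slots. This is single-threaded and
    simplified; the CPU count and names are hypothetical:

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count */

struct flow_stats {
	uint64_t packets;
	uint64_t bytes;
};

struct flow {
	struct flow_stats stats[NR_CPUS_SKETCH];	/* one slot per CPU */
};

/* Writer side: each CPU updates only its own slot. */
static void stats_update(struct flow *f, int cpu, uint64_t len)
{
	f->stats[cpu].packets++;
	f->stats[cpu].bytes += len;
}

/* Reader side: aggregate across all slots. */
static struct flow_stats stats_get(const struct flow *f)
{
	struct flow_stats sum = { 0, 0 };

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		sum.packets += f->stats[cpu].packets;
		sum.bytes += f->stats[cpu].bytes;
	}
	return sum;
}
```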
     
  • On a system with only node 1 as possible, all statistics are going to be
    accounted on node 0, as it will have a single writer.

    However, when getting and clearing the statistics, node 0 is not going
    to be considered, as it's not a possible node.

    Tested that statistics are not zero on a system with only node 1
    possible. Also compile-tested with CONFIG_NUMA off.

    Signed-off-by: Thadeu Lima de Souza Cascardo
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Thadeu Lima de Souza Cascardo
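    The fix amounts to always visiting node 0 when summing, then walking the
    possible nodes. A sketch with a hypothetical bitmask in place of the
    kernel's possible-node map and its iteration helpers:

```c
#include <stdint.h>

#define MAX_NODES 8	/* hypothetical */

/* Next node after `node` whose bit is set in `mask`, or MAX_NODES. */
static int next_possible(int node, uint32_t mask)
{
	for (int n = node + 1; n < MAX_NODES; n++)
		if (mask & (1u << n))
			return n;
	return MAX_NODES;
}

/* Start at node 0 unconditionally: it holds the first writer's stats
 * even when node 0 itself is not a possible node. */
static uint64_t sum_stats(const uint64_t stats[MAX_NODES], uint32_t possible)
{
	uint64_t total = 0;

	for (int node = 0; node < MAX_NODES; node = next_possible(node, possible))
		total += stats[node];
	return total;
}
```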
     

16 Sep, 2016

1 commit

  • The ovs kernel data path currently defers the execution of all
    recirc actions until stack utilization is at a minimum.
    This is too limiting for some packet forwarding scenarios due to
    the small size of the deferred action FIFO (10 entries). For
    example, broadcast traffic sent out more than 10 ports with
    recirculation results in packet drops when the deferred action
    FIFO becomes full, as reported here:

    http://openvswitch.org/pipermail/dev/2016-March/067672.html

    Since the current recursion depth is available (it is already tracked
    by the exec_actions_level pcpu variable), we can use it to determine
    whether to execute recirculation actions immediately (safe when
    recursion depth is low) or defer execution until more stack space is
    available.

    With this change, the deferred action fifo size becomes a non-issue
    for currently failing scenarios because it is no longer used when
    there are three or fewer recursions through ovs_execute_actions().

    Suggested-by: Pravin Shelar
    Signed-off-by: Lance Richardson
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Lance Richardson
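    The decision can be sketched as follows, using the figures from the text
    (a 10-entry FIFO, immediate execution up to three recursion levels); the
    structure and function names are hypothetical:

```c
#include <stdbool.h>

/* Figures from the text; the names are hypothetical. */
#define DEFERRED_FIFO_SIZE       10
#define EXEC_IMMEDIATE_MAX_LEVEL 3

struct deferred_fifo {
	int len;	/* queued recirculations */
};

/* Returns true if the recirculation runs immediately (recursing on the
 * current stack), false if it was deferred or dropped. *dropped is set
 * only when the deferred FIFO is already full. */
static bool recirc(int exec_level, struct deferred_fifo *fifo, bool *dropped)
{
	*dropped = false;
	if (exec_level <= EXEC_IMMEDIATE_MAX_LEVEL)
		return true;
	if (fifo->len >= DEFERRED_FIFO_SIZE) {
		*dropped = true;
		return false;
	}
	fifo->len++;
	return false;
}
```

    With this shape, the FIFO only matters for deep recursion, which is why
    broadcasting to more than 10 ports at shallow depth no longer drops
    packets.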
     

11 Sep, 2016

1 commit


09 Sep, 2016

1 commit

  • Add support for 802.1ad including the ability to push and pop double
    tagged vlans. Add support for 802.1ad to netlink parsing and flow
    conversion. Uses double nested encap attributes to represent double
    tagged vlan. Inner TPID encoded along with ctci in nested attributes.

    This is based on Thomas F Herbert's original v20 patch. I made some
    small cleanups and bug fixes.

    Signed-off-by: Thomas F Herbert
    Signed-off-by: Eric Garver
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Eric Garver
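    On the wire, a double-tagged frame carries an outer tag with TPID 0x88A8
    (802.1ad) followed by an inner 0x8100 (802.1Q) tag. A hypothetical parser
    that extracts both TPID/TCI pairs and the inner EtherType, roughly the
    information the nested encap attributes carry:

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_P_8021Q  0x8100
#define ETH_P_8021AD 0x88A8

struct vlan_info {
	uint16_t outer_tpid, outer_tci;	/* 0 when absent */
	uint16_t inner_tpid, inner_tci;
	uint16_t eth_type;		/* EtherType after the tags */
};

static uint16_t be16(const uint8_t *p) { return (uint16_t)(p[0] << 8 | p[1]); }

static int is_vlan_tpid(uint16_t t) { return t == ETH_P_8021Q || t == ETH_P_8021AD; }

/* Parse up to two VLAN tags after the MAC addresses (offset 12).
 * Returns 0 on success, -1 if the frame is truncated. */
static int parse_vlans(const uint8_t *f, size_t len, struct vlan_info *v)
{
	size_t off = 12;

	*v = (struct vlan_info){ 0 };
	if (len < off + 2)
		return -1;
	uint16_t t = be16(f + off);
	if (is_vlan_tpid(t)) {
		if (len < off + 6)	/* tag (4) + next type (2) */
			return -1;
		v->outer_tpid = t;
		v->outer_tci = be16(f + off + 2);
		off += 4;
		t = be16(f + off);
		if (is_vlan_tpid(t)) {
			if (len < off + 6)
				return -1;
			v->inner_tpid = t;
			v->inner_tci = be16(f + off + 2);
			off += 4;
			t = be16(f + off);
		}
	}
	v->eth_type = t;
	return 0;
}
```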
     

05 Sep, 2016

1 commit

  • When an error occurs during conntrack template creation as part of
    actions validation, we need to free the template. Previously we've been
    using nf_ct_put() to do this, but nf_ct_tmpl_free() is more appropriate.

    Signed-off-by: Joe Stringer
    Signed-off-by: David S. Miller

    Joe Stringer
     

31 Aug, 2016

1 commit

  • As reported by Lennert, the MPLS GSO code is failing to properly segment
    large packets. There are a couple of problems:

    1. the inner protocol is not set, so the gso segment functions for inner
    protocol layers are not getting run, and

    2. MPLS labels for packets that use the "native" (non-OVS) MPLS code
    are not properly accounted for in mpls_gso_segment.

    The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
    to call the gso segment functions for the higher layer protocols. That
    means skb_mac_gso_segment is called twice -- once with the network
    protocol set to MPLS and again with the network protocol set to the
    inner protocol.

    This patch sets the inner skb protocol, addressing item 1 above, and sets
    the network_header and inner_network_header to mark where the MPLS labels
    start and end. The MPLS code in OVS is also updated to set the two
    network markers.

    From there the MPLS GSO code uses the difference between the network
    header and the inner network header to know the size of the MPLS header
    that was pushed. It then pulls the MPLS header, resets the mac_len and
    protocol for the inner protocol and then calls skb_mac_gso_segment
    to segment the skb.

    After the inner protocol segmentation is done, the skb protocol
    is set to mpls for each segment and the network and mac headers
    are restored.

    Reported-by: Lennert Buytenhek
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
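    The header-size computation described above is just the distance between
    the two markers. A sketch with hypothetical offset fields in place of the
    skb header pointers (each MPLS label stack entry is 4 bytes):

```c
#include <stddef.h>

/* Hypothetical packet offsets mirroring the two markers. */
struct hdr_offsets {
	size_t mac_header;		/* start of the L2 header */
	size_t network_header;		/* start of the MPLS label stack */
	size_t inner_network_header;	/* first byte after the labels */
};

#define MPLS_HLEN 4	/* one label stack entry */

/* Size of the pushed MPLS header, from the difference of the markers. */
static size_t mpls_hlen(const struct hdr_offsets *o)
{
	return o->inner_network_header - o->network_header;
}

static size_t mpls_label_count(const struct hdr_offsets *o)
{
	return mpls_hlen(o) / MPLS_HLEN;
}
```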
     

11 Aug, 2016

1 commit

  • The creation of a tunnel vport (geneve, gre, vxlan) brings up a
    corresponding netdev, a multi-step operation which can fail.

    For example, changing a vxlan vport's netdev state to 'up' binds the
    vport's socket to a UDP port; if the binding fails (e.g. due to the
    port being in use), the error is currently ignored, giving the
    appearance that the tunnel vport creation completed successfully.

    Signed-off-by: Martynas Pumputis
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Martynas Pumputis
     

06 Aug, 2016

1 commit

  • net_device->ndo_set_rx_headroom (introduced in
    871b642adebe300be2e50aa5f65a418510f636ec) says

    "Setting a negtaive value reset the rx headroom
    to the default value".

    It seems that the OVS implementation in
    3a927bc7cf9d0fbe8f4a8189dd5f8440228f64e7 overlooked this and sets
    dev->needed_headroom unconditionally.

    This doesn't have an immediate effect, but can mess up later
    LL_RESERVED_SPACE calculations, such as done in
    net/ipv6/mcast.c:mld_newpack. For reference, this issue was found
    from a skb_panic raised there after the length calculations had given
    the wrong result.

    Note that the other current users of this interface
    (drivers/net/tun.c:tun_set_headroom and
    drivers/net/veth.c:veth_set_rx_headroom) both check this
    correctly and thus need no modification.

    Thanks to Ben for some pointers from the crash dumps!

    Cc: Benjamin Poirier
    Cc: Paolo Abeni
    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414
    Signed-off-by: Ian Wienand
    Signed-off-by: David S. Miller

    Ian Wienand
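    A sketch of the corrected setter, assuming the default headroom is 0.
    Without the sign check, storing -1 in an unsigned short field such as
    needed_headroom would yield 65535, the kind of value that skews later
    LL_RESERVED_SPACE-style calculations. The struct is a hypothetical
    stand-in for struct net_device:

```c
/* Hypothetical stand-in: only the field the fix touches. */
struct netdev_sketch {
	unsigned short needed_headroom;
};

/* A negative request means "reset to the default" (assumed 0 here)
 * instead of being stored, and wrapped, as a huge unsigned value. */
static void set_rx_headroom(struct netdev_sketch *dev, int new_hr)
{
	dev->needed_headroom = new_hr < 0 ? 0 : (unsigned short)new_hr;
}
```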
     

04 Aug, 2016

1 commit

  • ovs_ct_find_existing() issues a warning if an existing conntrack entry
    classified as IP_CT_NEW is found, with the premise that this should
    not happen. However, a newly confirmed, non-expected conntrack entry
    remains IP_CT_NEW as long as no reply direction traffic is seen. This
    has resulted in somewhat confusing kernel log messages. This patch
    removes this check and warning.

    Fixes: 289f2253 ("openvswitch: Find existing conntrack entry after upcall.")
    Suggested-by: Joe Stringer
    Signed-off-by: Jarno Rajahalme
    Acked-by: Joe Stringer
    Signed-off-by: David S. Miller

    Jarno Rajahalme