20 Mar, 2016

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Support more Realtek wireless chips, from Jes Sorenson.

    2) New BPF types for per-cpu hash and array maps, from Alexei
    Starovoitov.

    3) Make several TCP sysctls per-namespace, from Nikolay Borisov.

    4) Allow the use of SO_REUSEPORT in order to do per-thread processing
    of incoming TCP/UDP connections. The muxing can be done using a
    BPF program which hashes the incoming packet. From Craig Gallek.

    5) Add a multiplexer for TCP streams, to provide a message-based
    interface. BPF programs can be used to determine the message
    boundaries. From Tom Herbert.

    6) Add 802.1AE MACSEC support, from Sabrina Dubroca.

    7) Avoid factorial complexity when taking down an inetdev interface
    with lots of configured addresses. We were doing things like
    traversing the entire address list for each address removed, and
    flushing the entire netfilter conntrack table for every address as
    well.

    8) Add and use SKB bulk free infrastructure, from Jesper Brouer.

    9) Allow offloading u32 classifiers to hardware, and implement for
    ixgbe, from John Fastabend.

    10) Allow configuring IRQ coalescing parameters on a per-queue basis,
    from Kan Liang.

    11) Extend ethtool so that larger link mode masks can be supported.
    From David Decotigny.

    12) Introduce devlink, which can be used to configure port link types
    (ethernet vs Infiniband, etc.), port splitting, and switch device
    level attributes as a whole. From Jiri Pirko.

    13) Hardware offload support for flower classifiers, from Amir Vadai.

    14) Add "Local Checksum Offload". Basically, for a tunneled packet
    the checksum of the outer header is 'constant' (because with the
    checksum field filled into the inner protocol header, the payload
    of the outer frame checksums to 'zero'), and we can take advantage
    of that in various ways. From Edward Cree."

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
    bonding: fix bond_get_stats()
    net: bcmgenet: fix dma api length mismatch
    net/mlx4_core: Fix backward compatibility on VFs
    phy: mdio-thunder: Fix some Kconfig typos
    lan78xx: add ndo_get_stats64
    lan78xx: handle statistics counter rollover
    RDS: TCP: Remove unused constant
    RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
    net: smc911x: convert pxa dma to dmaengine
    team: remove duplicate set of flag IFF_MULTICAST
    bonding: remove duplicate set of flag IFF_MULTICAST
    net: fix a comment typo
    ethernet: micrel: fix some error codes
    ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
    bpf, dst: add and use dst_tclassid helper
    bpf: make skb->tc_classid also readable
    net: mvneta: bm: clarify dependencies
    cls_bpf: reset class and reuse major in da
    ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
    ldmvsw: Add ldmvsw.c driver code
    ...

    Linus Torvalds
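Highlight 14 of the merge message above (Local Checksum Offload) leans on a property of the ones'-complement Internet checksum described in RFC 1071: once a header's checksum field holds the correct value, summing that header together with the data it covers folds to 0xFFFF, so the complemented result is zero. The minimal sketch below demonstrates only that arithmetic property in userspace C; `csum_add_bytes()`, `csum_fold16()` and `inet_csum()` are illustrative helpers, not the kernel's csum API.

```c
#include <stddef.h>
#include <stdint.h>

/* Add big-endian 16-bit words of buf into a 32-bit accumulator (RFC 1071).
 * The 32-bit accumulator is ample for short buffers like the ones here. */
static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (len & 1)                    /* pad an odd trailing byte with zero */
        sum += (uint32_t)(buf[len - 1] << 8);
    return sum;
}

/* Fold carries back into the low 16 bits (end-around carry). */
static uint16_t csum_fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* Internet checksum: ones' complement of the folded ones'-complement sum. */
static uint16_t inet_csum(const uint8_t *buf, size_t len)
{
    return (uint16_t)~csum_fold16(csum_add_bytes(0, buf, len));
}
```

Usage: compute `inet_csum()` with the checksum field zeroed, store the result in that field, and re-running `inet_csum()` over the same bytes yields 0. That constant "zero" contribution is what lets the outer checksum be derived without re-summing the inner payload.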
     

19 Mar, 2016

3 commits

  • eBPF defines this as BPF_TUNLEN_MAX and OVS just uses the hard-coded
    value inside struct sw_flow_key. Thus, add and use IP_TUNNEL_OPTS_MAX
    for this, which makes the code a bit more generic and allows to remove
    BPF_TUNLEN_MAX from eBPF code.

    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Currently output of MPLS packets on tunnel vports is not allowed by Open
    vSwitch. This is because historically encapsulation was done in such a way
    that the inner_protocol field of the skb needed to hold the inner protocol
    for both MPLS and tunnel encapsulation in order for GSO segmentation to be
    performed correctly.

    Since b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of
    vport") Open vSwitch makes use of lwt to output to tunnel netdevs which
    perform encapsulation. As no drivers expose support for MPLS offloads this
    means that GSO packets are segmented in software by validate_xmit_skb(),
    which is called from __dev_queue_xmit(), before tunnel encapsulation occurs.
    This means that the inner protocol of MPLS is no longer needed by the time
    encapsulation occurs, so there is no longer any contention on the
    inner_protocol field of the skb.

    Thus it is now safe to output MPLS to tunnel vports.

    Signed-off-by: Simon Horman
    Reviewed-by: Jesse Gross
    Signed-off-by: David S. Miller

    Simon Horman
     
  • Signed-off-by: Fengguang Wu
    Signed-off-by: David S. Miller

    Wu Fengguang
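The first 19 Mar entry replaces a hard-coded options size with a shared IP_TUNNEL_OPTS_MAX definition. One way to obtain such a limit, rather than hard-coding 255, is to derive it from the width of the field that records the options length. The sketch assumes a u8 `options_len` field; `struct tun_info_sketch`, `struct flow_key_sketch` and `TUN_OPTS_MAX_SKETCH` are simplified, hypothetical stand-ins, and the kernel's own definition may be written differently.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical tunnel-info struct whose options length is a u8. */
struct tun_info_sketch {
    uint8_t options_len;
};

/* Largest value the options_len field can hold: 2^bits - 1. */
#define TUN_OPTS_MAX_SKETCH \
    ((1UL << (sizeof(((struct tun_info_sketch *)0)->options_len) * CHAR_BIT)) - 1)

/* Consumers size their buffers from the single shared definition. */
struct flow_key_sketch {
    uint8_t tun_opts[TUN_OPTS_MAX_SKETCH];
    uint8_t tun_opts_len;
};
```

The design point is that widening the length field would automatically widen every buffer sized from the macro, instead of leaving stale copies of the number in eBPF and OVS code.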
     

18 Mar, 2016

1 commit


15 Mar, 2016

8 commits

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS/OVS updates for net-next

    The following patchset contains Netfilter/IPVS fixes and OVS NAT
    support, more specifically this batch is composed of:

    1) Fix a crash in ipset when performing a parallel flush/dump with
    set:list type, from Jozsef Kadlecsik.

    2) Make sure NFACCT_FILTER_* netlink attributes are in place before
    accessing them, from Phil Turnbull.

    3) Check return error code from ip_vs_fill_iph_skb_off() in IPVS SIP
    helper, from Arnd Bergmann.

    4) Add a workaround to IPVS to reschedule existing connections to a new
    destination server by dropping the packet and waiting for retransmission
    of the TCP SYN packet, from Julian Anastasov.

    5) Allow connection rescheduling in IPVS when in CLOSE state, also
    from Julian.

    6) Fix wrong offset of SIP Call-ID in IPVS helper, from Marco Angaroni.

    7) Validate IPSET_ATTR_ETHER netlink attribute length, from Jozsef.

    8) Check match/targetinfo netlink attribute size in nft_compat,
    patch from Florian Westphal.

    9) Check for integer overflow on 32-bit systems in x_tables, from
    Florian Westphal.

    Several patches from Jarno Rajahalme to prepare the introduction of
    NAT support to OVS based on the Netfilter infrastructure:

    10) Schedule IP_CT_NEW_REPLY definition for removal in
    nf_conntrack_common.h.

    11) Simplify checksum recalculation in nf_nat.

    12) Add comments to the openvswitch conntrack code, from Jarno.

    13) Update the CT state key only after successful nf_conntrack_in()
    invocation.

    14) Find existing conntrack entry after upcall.

    15) Handle NF_REPEAT case due to templates in nf_conntrack_in().

    16) Call the conntrack helper functions once the conntrack has been
    confirmed.

    17) And finally, add the NAT interface to OVS.

    The batch closes with:

    18) Cleanup to use spin_unlock_wait() instead of
    spin_lock()/spin_unlock(), from Nicholas Mc Guire.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Extend OVS conntrack interface to cover NAT. New nested
    OVS_CT_ATTR_NAT attribute may be used to include NAT with a CT action.
    A bare OVS_CT_ATTR_NAT only mangles existing and expected connections.
    If OVS_NAT_ATTR_SRC or OVS_NAT_ATTR_DST is included within the nested
    attributes, new (non-committed/non-confirmed) connections are mangled
    according to the rest of the nested attributes.

    The corresponding OVS userspace patch series includes test cases (in
    tests/system-traffic.at) that also serve as example uses.

    This work builds on a branch by Thomas Graf at
    https://github.com/tgraf/ovs/tree/nat.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Thomas Graf
    Acked-by: Joe Stringer
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
     
  • There is no need to help connections that are not confirmed, so we can
    delay helping new connections to the time when they are confirmed.
    This change is needed for NAT support, and having this as a separate
    patch will make the following NAT patch a bit easier to review.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Joe Stringer
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
     
  • Repeat the nf_conntrack_in() call when it returns NF_REPEAT. This
    avoids dropping a SYN packet re-opening an existing TCP connection.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Joe Stringer
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
     
    Add a new function ovs_ct_find_existing() to find an existing
    conntrack entry to which this packet was already applied. This is
    only to be called when there is evidence that the packet was already
    tracked and committed, but we lost the ct reference due to a
    userspace upcall.

    ovs_ct_find_existing() is called from skb_nfct_cached(), which can now
    hide the fact that the ct reference may have been lost due to an
    upcall. This allows ovs_ct_commit() to be simplified.

    This patch is needed by the later "openvswitch: Interface with NAT" patch,
    as we need to be able to pass the packet through NAT using the
    original ct reference even after the reference is lost in an
    upcall.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Joe Stringer
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
     
  • Only a successful nf_conntrack_in() call can effect a connection state
    change, so it suffices to update the key only after
    nf_conntrack_in() returns successfully.

    This change is needed for the later NAT patches.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Joe Stringer
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
     
  • This makes the code easier to understand and the following patches
    more focused.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Joe Stringer
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
     
  • Remove the definition of IP_CT_NEW_REPLY from the kernel as it does
    not make sense. This allows the definition of IP_CT_NUMBER to be
    simplified as well.

    Signed-off-by: Jarno Rajahalme
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
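Two of the 15 Mar entries change how the result of nf_conntrack_in() is handled: repeat the call while it returns NF_REPEAT, and update the conntrack state in the flow key only after a successful (NF_ACCEPT) return. The minimal sketch below models both rules in userspace. The conntrack call is mocked by a hypothetical `fake_conntrack_in()`, the verdict constants are local stand-ins mirroring NF_DROP, NF_ACCEPT and NF_REPEAT, and the -ENOENT return is an assumption of the sketch.

```c
#include <errno.h>

/* Local stand-ins mirroring the netfilter verdict values. */
enum verdict { V_DROP = 0, V_ACCEPT = 1, V_REPEAT = 4 };

/* Conntrack state bits that the flow key would carry. */
struct ct_key_sketch {
    unsigned int state;
};

/* Hypothetical conntrack mock: V_REPEAT `repeats` times, then `final`. */
struct fake_ct {
    int repeats;
    enum verdict final;
    unsigned int new_state;
};

static enum verdict fake_conntrack_in(struct fake_ct *ct)
{
    if (ct->repeats > 0) {
        ct->repeats--;
        return V_REPEAT;
    }
    return ct->final;
}

/* Retry on V_REPEAT; write the key only when the lookup succeeded. */
static int ct_lookup_sketch(struct fake_ct *ct, struct ct_key_sketch *key)
{
    enum verdict v;

    do {
        v = fake_conntrack_in(ct);
    } while (v == V_REPEAT);

    if (v != V_ACCEPT)
        return -ENOENT;         /* key left untouched on failure */

    key->state = ct->new_state;
    return 0;
}
```

Retrying instead of treating NF_REPEAT as failure is what keeps a SYN that re-opens an existing TCP connection from being dropped.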
     

14 Mar, 2016

1 commit

  • When we want to change a flow using netlink, we have to identify it to
    be able to perform a lookup. Both the flow key and unique flow ID
    (ufid) are valid identifiers, but we always have to specify the flow
    key in the netlink message. When both attributes are there, the ufid
    is used. The flow key is used to validate the actions provided by
    the userland.

    This commit allows the ufid to be used without providing the flow
    key, as is already done in the netlink 'flow get' and 'flow del'
    paths. The flow key remains mandatory when an action is provided.

    Signed-off-by: Samuel Gauthier
    Reviewed-by: Simon Horman
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Samuel Gauthier
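The 14 Mar entry relaxes the identification rule for modifying a flow: the ufid alone can identify it, but a flow key is still required whenever actions are supplied, because the key is used to validate them. A minimal sketch of that check, assuming a hypothetical `struct flow_set_req` that summarizes which netlink attributes were present; it is not the kernel's parsing code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of the attributes carried by a 'flow set' request. */
struct flow_set_req {
    bool has_key;
    bool has_ufid;
    bool has_actions;
};

/* 0 if the request identifies a flow and can validate its actions. */
static int flow_set_validate(const struct flow_set_req *req)
{
    if (!req->has_key && !req->has_ufid)
        return -EINVAL;         /* nothing to look the flow up by */
    if (req->has_actions && !req->has_key)
        return -EINVAL;         /* actions are validated against the key */
    return 0;
}

/* When both identifiers are present, the ufid is used for the lookup. */
static bool flow_set_lookup_by_ufid(const struct flow_set_req *req)
{
    return req->has_ufid;
}
```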
     

02 Mar, 2016

1 commit

  • This patch implements bookkeeping support to compute the maximum
    headroom for all the devices in each datapath. When said value
    changes, the underlying devs are notified via the
    ndo_set_rx_headroom method.

    This also improves the xmit performance of internal vports.

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
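The 02 Mar entry keeps track of the maximum headroom needed by any device in a datapath and notifies devices when that maximum changes. A minimal sketch of the bookkeeping, assuming a fixed-size array of per-device headroom values; `struct dp_sketch` and `notify_devices()` are hypothetical, and the latter only counts calls where the real code would invoke each device's ndo_set_rx_headroom method.

```c
#include <stddef.h>

/* Hypothetical per-datapath bookkeeping of device headroom needs. */
struct dp_sketch {
    unsigned int headroom[8];   /* headroom needed by each attached device */
    size_t n_dev;
    unsigned int max_headroom;  /* cached maximum over all devices */
    unsigned int notifications; /* how many times devices were notified */
};

/* Stand-in for pushing the new value to every device. */
static void notify_devices(struct dp_sketch *dp)
{
    dp->notifications++;
}

/* Recompute the maximum and notify only when it actually changed. */
static void dp_update_headroom(struct dp_sketch *dp)
{
    unsigned int max = 0;
    size_t i;

    for (i = 0; i < dp->n_dev; i++)
        if (dp->headroom[i] > max)
            max = dp->headroom[i];

    if (max != dp->max_headroom) {
        dp->max_headroom = max;
        notify_devices(dp);
    }
}
```

Notifying only on change avoids touching every device each time a port is added or removed without affecting the maximum.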
     

23 Feb, 2016

1 commit


20 Feb, 2016

2 commits

  • Replace individual implementations with the recently introduced
    skb_postpush_rcsum() helper.

    Signed-off-by: Daniel Borkmann
    Acked-by: Tom Herbert
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
    Commit 35e2d1152b22 ("tunnels: Allow IPv6 UDP checksums to be
    correctly controlled.") changed the default xmit checksum setting
    for lwt vxlan/geneve ipv6 tunnels, so that the checksum is no longer
    set in the outer UDP header.
    This commit changes the rx checksum setting for both lwt vxlan/geneve
    devices created by openvswitch accordingly, so that lwt over ipv6
    tunnel pairs are again able to communicate with default values.

    Signed-off-by: Paolo Abeni
    Acked-by: Jiri Benc
    Acked-by: Jesse Gross
    Signed-off-by: David S. Miller

    Paolo Abeni
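The first 20 Feb entry switches to the skb_postpush_rcsum() helper. The idea behind a post-push checksum update: when a packet carries a full ones'-complement sum of its data, pushing a header in front only requires adding the header's own contribution, not re-summing the whole packet. The minimal sketch below checks that arithmetic in userspace; `sum16()`, `fold()` and `postpush_update()` are illustrative helpers, not the kernel helper's signature, and the sketch assumes the pushed header has an even length so 16-bit word alignment is preserved.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement sum of big-endian 16-bit words (len assumed even). */
static uint32_t sum16(const uint8_t *buf, size_t len, uint32_t sum)
{
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    return sum;
}

/* Fold carries back into 16 bits. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* After pushing hdr in front of data whose folded sum is old_sum,
 * add only the header's contribution to get the sum of the whole. */
static uint16_t postpush_update(uint16_t old_sum, const uint8_t *hdr, size_t hlen)
{
    return fold(sum16(hdr, hlen, old_sum));
}
```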
     

19 Feb, 2016

2 commits


17 Feb, 2016

1 commit

    For UDP traffic with a datagram length below the MTU, this gives
    about a 2% performance increase when tunneling over ipv4 and about
    60% when tunneling over ipv6.

    Signed-off-by: Paolo Abeni
    Suggested-and-acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Paolo Abeni
     

15 Feb, 2016

1 commit


11 Feb, 2016

1 commit

    Operations with the GENL_ADMIN_PERM flag fail permission checks inside
    a user namespace, because this flag means we call netlink_capable,
    which checks against the init user ns.

    Instead, let's introduce a new flag, GENL_UNS_ADMIN_PERM, for operations
    which should be allowed inside a user namespace.

    The motivation for this is to be able to run openvswitch in unprivileged
    containers. I've tested this and it seems to work, but I really have no
    idea about the security consequences of this patch, so thoughts would be
    much appreciated.

    v2: use the GENL_UNS_ADMIN_PERM flag instead of a check in each function
    v3: use separate ifs for UNS_ADMIN_PERM and ADMIN_PERM, instead of one
    massive one

    Reported-by: James Page
    Signed-off-by: Tycho Andersen
    CC: Eric Biederman
    CC: Pravin Shelar
    CC: Justin Pettit
    CC: "David S. Miller"
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tycho Andersen
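The 11 Feb entry adds a second permission flag that is checked independently of GENL_ADMIN_PERM (the "separate ifs" of the v3 note). The minimal sketch below shows that two-test structure: one check against CAP_NET_ADMIN in the initial user namespace, one against the user namespace that owns the network namespace. The flag values and `struct caps_sketch` are local, hypothetical stand-ins chosen for the sketch, not copied from the genetlink headers.

```c
#include <errno.h>
#include <stdbool.h>

#define SK_ADMIN_PERM     0x01u  /* stand-in for GENL_ADMIN_PERM */
#define SK_UNS_ADMIN_PERM 0x10u  /* stand-in for GENL_UNS_ADMIN_PERM */

/* Hypothetical summary of the caller's CAP_NET_ADMIN status. */
struct caps_sketch {
    bool admin_in_init_ns;    /* roughly what netlink_capable() checks */
    bool admin_in_owning_ns;  /* capability in the netns owner's user ns */
};

/* Each flag gets its own test, so an op may require either or both. */
static int genl_perm_check_sketch(unsigned int op_flags,
                                  const struct caps_sketch *c)
{
    if ((op_flags & SK_ADMIN_PERM) && !c->admin_in_init_ns)
        return -EPERM;
    if ((op_flags & SK_UNS_ADMIN_PERM) && !c->admin_in_owning_ns)
        return -EPERM;
    return 0;
}
```

With this split, root inside an unprivileged container passes operations marked with the user-namespace flag but is still refused operations that keep the original flag.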
     

10 Feb, 2016

1 commit

    Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
    transmit packets of any size, constrained only by the ability to
    send out the resulting packets. 4.3 introduced netdevs corresponding
    to tunnel vports. These netdevs have an MTU, which limits the size of
    a packet that can be successfully encapsulated. The default MTU
    values are low (1500 or less), which is awkwardly small in the context
    of physical networks supporting jumbo frames, and leads to a
    conspicuous change in behaviour for userspace.

    Instead, set the MTU on openvswitch-created netdevs to be the relevant
    maximum (i.e. the maximum IP packet size minus any relevant overhead),
    effectively restoring the behaviour prior to 4.3.

    Signed-off-by: David Wragg
    Signed-off-by: David S. Miller

    David Wragg
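The 10 Feb entry sets tunnel netdev MTUs to "the maximum IP packet size minus any relevant overhead". For VXLAN over IPv4, a commonly cited overhead is 50 bytes: outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet (14). The minimal sketch below does that arithmetic, assuming no IPv4 options or VLAN tags and a 65535-byte IPv4 maximum; the constants and `vxlan4_inner_mtu()` are illustrative, and the kernel computes its own per-device limits, which may differ.

```c
/* Illustrative header sizes in bytes: no IPv4 options, no VLAN tags. */
enum {
    IP_MAX_PKT_SK = 65535, /* largest IPv4 total length (16-bit field) */
    OUTER_IPV4_SK = 20,
    OUTER_UDP_SK  = 8,
    VXLAN_HDR_SK  = 8,
    INNER_ETH_SK  = 14,
};

/* Largest inner MTU such that one encapsulated frame fits in outer_max. */
static unsigned int vxlan4_inner_mtu(unsigned int outer_max)
{
    unsigned int overhead =
        OUTER_IPV4_SK + OUTER_UDP_SK + VXLAN_HDR_SK + INNER_ETH_SK;

    return outer_max > overhead ? outer_max - overhead : 0;
}
```

With a 1500-byte outer limit this gives the familiar 1450; with the IPv4 maximum it gives 65485, which is why the low default looked like a regression next to the pre-4.3 behaviour.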
     

19 Jan, 2016

1 commit

  • It was seen that defective configurations of openvswitch could overwrite
    the STACK_END_MAGIC and cause a hard crash of the kernel because of too
    many recursions within ovs.

    This problem arises due to the high stack usage of openvswitch. The rest
    of the kernel is fine with the current limit of 10 (RECURSION_LIMIT).

    We use the already existing recursion counter in ovs_execute_actions to
    implement an upper bound of 5 recursions.

    Cc: Pravin Shelar
    Cc: Simon Horman
    Cc: Eric Dumazet
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
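The 19 Jan entry bounds nesting in ovs_execute_actions() by reusing an existing recursion counter with an upper bound of 5. A minimal single-threaded sketch of that pattern: increment a depth counter on entry, fail once it exceeds the limit, and always decrement on exit. The kernel's counter is per-CPU; `exec_level`, `RECURSION_LIMIT_SKETCH`, `execute_actions_sketch()` and the -ENETDOWN error are assumptions of the sketch.

```c
#include <errno.h>

#define RECURSION_LIMIT_SKETCH 5

static int exec_level;   /* per-CPU in the kernel; a plain global here */

/* Simulates an action list that re-enters execution `depth` more times. */
static int execute_actions_sketch(int depth)
{
    int err = 0;

    if (++exec_level > RECURSION_LIMIT_SKETCH) {
        err = -ENETDOWN;        /* give up instead of exhausting the stack */
        goto out;
    }
    if (depth > 0)
        err = execute_actions_sketch(depth - 1);
out:
    exec_level--;               /* undo our increment on every path */
    return err;
}
```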
     

16 Jan, 2016

1 commit

    skb_gso_segment() uses the skb control block during segmentation.
    This patch adds 32 bytes of room for the previous control block, which
    will be copied into all resulting segments.

    This patch fixes a kernel crash when fragmenting forwarded packets.
    Fragmentation requires a valid IP CB in the skb for clearing IP options.
    The patch also removes the custom save/restore in the ovs code, which is
    now redundant.

    Signed-off-by: Konstantin Khlebnikov
    Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
    Signed-off-by: David S. Miller

    Konstantin Khlebnikov
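The 16 Jan entry reserves the first 32 bytes of the skb control block for the caller's state and keeps segmentation state after them, so the caller's bytes reach every segment intact. The minimal userspace sketch below models that layout with a 48-byte cb (the size of skb->cb); `struct pkt_sketch`, `struct gso_cb_sketch` and its fields, `gso_cb_store()` and `segment_sketch()` are hypothetical, and GSO state is written with memcpy to avoid aliasing issues.

```c
#include <stdint.h>
#include <string.h>

#define CB_SIZE       48   /* size of skb->cb[] in the kernel */
#define GSO_CB_OFFSET 32   /* prefix reserved for the caller's control block */

struct pkt_sketch {
    uint8_t cb[CB_SIZE];
};

/* Hypothetical segmentation-private state, stored after the prefix. */
struct gso_cb_sketch {
    int mac_offset;
    int encap_level;
};

_Static_assert(GSO_CB_OFFSET + sizeof(struct gso_cb_sketch) <= CB_SIZE,
               "GSO state must fit behind the reserved prefix");

/* Write GSO state without touching bytes [0, GSO_CB_OFFSET). */
static void gso_cb_store(struct pkt_sketch *p, const struct gso_cb_sketch *g)
{
    memcpy(p->cb + GSO_CB_OFFSET, g, sizeof(*g));
}

/* Each segment inherits the whole cb, so the caller's prefix survives. */
static void segment_sketch(const struct pkt_sketch *orig,
                           struct pkt_sketch *segs, int n)
{
    for (int i = 0; i < n; i++)
        memcpy(segs[i].cb, orig->cb, CB_SIZE);
}
```

Because the caller's IP CB survives segmentation, a later fragmentation step still finds valid IP CB data, which is the crash the commit describes.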
     

11 Jan, 2016

3 commits

  • commit be4ace6e6b1b ("openvswitch: Move dev pointer into vport itself")

    The commit above added @dev and moved @rcu to the bottom of struct
    vport, but the change was not reflected in the kernel doc. So let's
    update the kernel doc as well.

    Signed-off-by: Jean Sacren
    Cc: Thomas Graf
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Jean Sacren
     
  • commit 6b001e682e90 ("openvswitch: Use Geneve device.")

    The commit above introduced 'port_no' as the name for the member of
    struct geneve_port. The correct name should be 'dst_port' as described
    in the kernel doc. Let's fix that member name and all the pertinent
    instances so that both doc and code would be consistent.

    Signed-off-by: Jean Sacren
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Jean Sacren
     
  • commit 6b001e682e90 ("openvswitch: Use Geneve device.")

    The commit above deleted the only call site of ovs_tunnel_route_lookup()
    and now that function is not used any more. So let's delete the function
    definition as well.

    Signed-off-by: Jean Sacren
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Jean Sacren
     

01 Jan, 2016

1 commit


30 Dec, 2015

1 commit

  • Commit 5b48bb8506c5 ("openvswitch: Fix helper reference leak") fixed a
    reference leak on helper objects, but inadvertently introduced a leak on
    the ct template.

    Previously, ct_info.ct->general.use was initialized to 0 by
    nf_ct_tmpl_alloc() and only incremented when ovs_ct_copy_action()
    returned successfully. If an error occurred while adding the helper or
    adding the action to the actions buffer, the __ovs_ct_free_action()
    cleanup would use nf_ct_put() to free the entry; however, this relies on
    atomic_dec_and_test(ct_info.ct->general.use). This reference must be
    incremented first, or nf_ct_put() will never free it.

    Fix the issue by acquiring a reference to the template immediately after
    allocation.

    Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action")
    Fixes: 5b48bb8506c5 ("openvswitch: Fix helper reference leak")
    Signed-off-by: Joe Stringer
    Signed-off-by: David S. Miller

    Joe Stringer
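The 30 Dec fix hinges on dec-and-test semantics: a put frees the object only when it drops the count from 1 to 0, so an object whose count starts at 0 is never freed by a put. The minimal sketch below models that with C11 atomics; `tmpl_alloc()`, `tmpl_get()` and `tmpl_put()` are hypothetical stand-ins for nf_ct_tmpl_alloc(), taking a reference, and nf_ct_put(), and `freed_count` exists only so the behaviour can be observed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct tmpl_sketch {
    atomic_int use;
};

static int freed_count;   /* counts frees so callers can observe them */

/* Allocate with a reference count of 0, as the commit text describes. */
static struct tmpl_sketch *tmpl_alloc(void)
{
    struct tmpl_sketch *t = malloc(sizeof(*t));

    if (t)
        atomic_init(&t->use, 0);
    return t;
}

static void tmpl_get(struct tmpl_sketch *t)
{
    atomic_fetch_add(&t->use, 1);
}

/* Dec-and-test: free only if this put moved the count from 1 to 0. */
static bool tmpl_put(struct tmpl_sketch *t)
{
    if (atomic_fetch_sub(&t->use, 1) == 1) {
        free(t);
        freed_count++;
        return true;
    }
    return false;
}
```

Usage: calling `tmpl_get()` immediately after `tmpl_alloc()`, as the fix does, makes a single `tmpl_put()` on an error path actually release the object.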
     

19 Dec, 2015

2 commits

    In a set action, tunnel attributes should be encoded in a
    nested action.

    I noticed this because ovs-dpctl was reporting an error
    when dumping flows due to the incorrect encoding of tunnel attributes
    in a set action.

    Fixes: fc4099f17240 ("openvswitch: Fix egress tunnel info.")
    Signed-off-by: Simon Horman
    Signed-off-by: David S. Miller

    Simon Horman
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains the first batch of Netfilter updates for
    the upcoming 4.5 kernel. This batch contains userspace netfilter header
    compilation fixes, support for packet mangling in nf_tables, the new
    tracing infrastructure for nf_tables and cgroup2 support for iptables.
    More specifically, they are:

    1) Two patches to include dependencies in our netfilter userspace
    headers to resolve compilation problems, from Mikko Rapeli.

    2) Four cosmetic cleanup patches for the ebtables codebase, from Ian Morris.

    3) Remove duplicate include in the netfilter reject infrastructure,
    from Stephen Hemminger.

    4) Two patches to simplify the netfilter defragmentation code for IPv6,
    patch from Florian Westphal.

    5) Fix root ownership of /proc/net netfilter for unprivileged net
    namespaces, from Philip Whineray.

    6) Get rid of unused fields in struct nft_pktinfo, from Florian Westphal.

    7) Add mangling support to our nf_tables payload expression, from
    Patrick McHardy.

    8) Introduce a new netlink-based tracing infrastructure for nf_tables,
    from Florian Westphal.

    9) Change setter functions in nfnetlink_log to be void, from
    Rami Rosen.

    10) Add netns support to the cttimeout infrastructure.

    11) Add cgroup2 support to iptables, from Tejun Heo.

    12) Introduce nfnl_dereference_protected() in nfnetlink, from Florian.

    13) Add support for mangling pkttype in the nf_tables meta expression,
    also from Florian.

    BTW, I need you to pull net into net-next; I have another batch that
    requires changes that I don't yet see in net.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
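The first 19 Dec entry is about attribute encoding: inside a set action, tunnel attributes belong in their own nested attribute rather than directly in the set action's payload. Netlink attributes are (length, type, payload) records padded to 4 bytes, and a nested attribute's length covers its own header plus everything inside it. The minimal sketch below builds such a layout by hand in host byte order; `struct msg_sketch`, the `T_*` type numbers and all helpers are hypothetical stand-ins rather than the OVS or libnl definitions, and bounds checks on the 128-byte buffer are omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Netlink-style layout: 4-byte header (len, type), payload, pad to 4. */
#define NLA_ALIGN_SK(len) (((len) + 3u) & ~3u)
#define NLA_HDRLEN_SK     4u

/* Hypothetical attribute type numbers, for this sketch only. */
enum { T_SET = 1, T_TUNNEL = 2, T_TUN_ID = 3, T_TUN_TTL = 4 };

struct msg_sketch {
    uint8_t buf[128];
    size_t len;
};

/* Append a header; return its offset so a nest can be closed later. */
static size_t put_hdr(struct msg_sketch *m, uint16_t type, uint16_t len)
{
    size_t off = m->len;

    memcpy(m->buf + off, &len, sizeof(len));
    memcpy(m->buf + off + 2, &type, sizeof(type));
    m->len += NLA_HDRLEN_SK;
    return off;
}

/* Leaf attribute: header + payload, zero-padded to a 4-byte boundary. */
static void put_attr(struct msg_sketch *m, uint16_t type,
                     const void *data, uint16_t dlen)
{
    size_t off = put_hdr(m, type, (uint16_t)(NLA_HDRLEN_SK + dlen));
    size_t total = NLA_ALIGN_SK(NLA_HDRLEN_SK + dlen);

    memcpy(m->buf + off + NLA_HDRLEN_SK, data, dlen);
    memset(m->buf + off + NLA_HDRLEN_SK + dlen, 0,
           total - (NLA_HDRLEN_SK + dlen));
    m->len = off + total;
}

/* Open a nested attribute; its length is patched by nest_end(). */
static size_t nest_start(struct msg_sketch *m, uint16_t type)
{
    return put_hdr(m, type, 0);
}

/* A nest's length covers its header plus everything written inside it. */
static void nest_end(struct msg_sketch *m, size_t start)
{
    uint16_t len = (uint16_t)(m->len - start);

    memcpy(m->buf + start, &len, sizeof(len));
}

/* set { tunnel { tun_id, ttl } }: tunnel fields live in their own nest. */
static void encode_set_tunnel(struct msg_sketch *m, uint64_t tun_id, uint8_t ttl)
{
    size_t set = nest_start(m, T_SET);
    size_t tun = nest_start(m, T_TUNNEL);

    put_attr(m, T_TUN_ID, &tun_id, sizeof(tun_id));
    put_attr(m, T_TUN_TTL, &ttl, sizeof(ttl));
    nest_end(m, tun);
    nest_end(m, set);
}
```

A dumper that expects the nested form walks set, then tunnel, then the leaf attributes; writing the leaves directly under set is the kind of mismatch that makes a flow dump report an error.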
     

18 Dec, 2015

1 commit


15 Dec, 2015

1 commit


12 Dec, 2015

2 commits

  • If userspace executes ct(zone=1), and the connection tracker determines
    that the packet is invalid, then the ct_zone flow key field is populated
    with the default zone rather than the zone that was specified. Even
    though connection tracking failed, this field should be updated with the
    value that the action specified. Fix the issue.

    Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     
  • If the actions (re)allocation fails, or the actions list is larger than the
    maximum size, and the conntrack action is the last action when these
    problems are hit, then references to helper modules may be leaked. Fix
    the issue.

    Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action")
    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     

04 Dec, 2015

3 commits

  • Conflicts:
    drivers/net/ethernet/renesas/ravb_main.c
    kernel/bpf/syscall.c
    net/ipv4/ipmr.c

    All three conflicts were cases of overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     
    Each openvswitch tunnel vport (vxlan, gre, geneve) holds a reference
    to the underlying tunnel device, but never releases it when that
    device is deleted.
    Deleting the underlying device via the ip tool causes the kernel to
    hang in the netdev_wait_allrefs() loop.
    This commit ensures that on device unregistration dp_detach_port_notify()
    is called for all vports that hold the device reference, properly
    releasing it.

    Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
    Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport")
    Fixes: 6b001e682e90 ("openvswitch: Use Geneve device.")
    Signed-off-by: Paolo Abeni
    Acked-by: Flavio Leitner
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Paolo Abeni
     
    Sometimes drivers and other code would find it handy to know some
    internal information about the upper device being changed. So allow upper
    code to pass information down to notifier listeners during linking.

    Signed-off-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Jiri Pirko
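The second 04 Dec entry fixes a reference that was never dropped: each tunnel vport holds a reference on its underlying device, and unregistration waits in netdev_wait_allrefs() until all references are gone. The commit does the release via dp_detach_port_notify(); the minimal userspace model below only illustrates the invariant, with an integer standing in for the device refcount. `struct dev_sketch`, `struct vport_sketch`, `vport_attach()`, `on_unregister()` and `unregister_can_finish()` are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct dev_sketch {
    int refcnt;               /* stands in for the netdev reference count */
};

struct vport_sketch {
    struct dev_sketch *dev;   /* reference held while attached */
};

static void vport_attach(struct vport_sketch *vp, struct dev_sketch *dev)
{
    vp->dev = dev;
    dev->refcnt++;
}

/* On an unregister event, detach every vport still holding `dev`. */
static void on_unregister(struct vport_sketch *vports, size_t n,
                          struct dev_sketch *dev)
{
    for (size_t i = 0; i < n; i++) {
        if (vports[i].dev == dev) {
            vports[i].dev = NULL;
            dev->refcnt--;    /* the release whose absence caused the hang */
        }
    }
}

/* Unregistration can only complete once no references remain. */
static bool unregister_can_finish(const struct dev_sketch *dev)
{
    return dev->refcnt == 0;
}
```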