21 May, 2019

1 commit


27 Feb, 2019

1 commit

  • before:
       text    data     bss     dec     hex filename
      16566    1576    4136   22278    5706 nf_nat.ko
       3598     844       0    4442    115a nf_nat_ipv6.ko
       3187     844       0    4031     fbf nf_nat_ipv4.ko

    after:
       text    data     bss     dec     hex filename
      22948    1612    4136   28696    7018 nf_nat.ko

    ... with ipv4/v6 nat now provided directly via nf_nat.ko.

    Also changes:
        ret = nf_nat_ipv4_fn(priv, skb, state);
        if (ret != NF_DROP && ret != NF_STOLEN &&
    into
        if (ret != NF_ACCEPT)
            return ret;

    everywhere.

    The nat hooks should never return anything other than
    ACCEPT or DROP (and the latter only in rare error cases).

    The original code uses multi-line ANDing including assignment-in-if:
        if (ret != NF_DROP && ret != NF_STOLEN &&
            !(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) &&
            (ct = nf_ct_get(skb, &ctinfo)) != NULL) {

    I removed this while moving the code, breaking those into separate
    conditionals and moving the assignments onto their own lines.
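
    The restructuring described above can be sketched in userspace C.
    This is only an illustration, not the kernel code: the mock_* types,
    mock_nat_fn() and MOCK_XFRM_TRANSFORMED are hypothetical stand-ins
    for struct sk_buff, nf_nat_ipv4_fn() and IPSKB_XFRM_TRANSFORMED, and
    the "needs_fixup" output merely marks where the post-NAT work would go.

```c
#include <assert.h>
#include <stddef.h>

/* Netfilter verdicts; same numeric values as in <linux/netfilter.h>. */
enum { NF_DROP = 0, NF_ACCEPT = 1, NF_STOLEN = 2 };

/* Hypothetical stand-in for IPSKB_XFRM_TRANSFORMED. */
#define MOCK_XFRM_TRANSFORMED 0x1u

struct mock_conn { int id; };

struct mock_skb {
	unsigned int flags;
	struct mock_conn *ct;	/* NULL when the packet is untracked */
	int nat_verdict;	/* verdict the mocked NAT core returns */
};

/* Hypothetical stand-in for nf_nat_ipv4_fn(). */
static int mock_nat_fn(const struct mock_skb *skb)
{
	return skb->nat_verdict;
}

/*
 * One conditional per line of reasoning, no assignment inside an if:
 * anything other than NF_ACCEPT is returned right away.
 */
int mock_nat_out_hook(const struct mock_skb *skb, int *needs_fixup)
{
	struct mock_conn *ct;
	int ret;

	*needs_fixup = 0;

	ret = mock_nat_fn(skb);
	if (ret != NF_ACCEPT)
		return ret;

	if (skb->flags & MOCK_XFRM_TRANSFORMED)
		return ret;

	ct = skb->ct;
	if (ct)
		*needs_fixup = 1;

	return ret;
}
```

    Because the hooks are only expected to return ACCEPT or DROP, the
    single "ret != NF_ACCEPT" test covers the cases the old
    "!= NF_DROP && != NF_STOLEN" pair was guarding against.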

    checkpatch still generates some warnings:
    1. Overly long lines (of moved code).
       Breaking them would be even uglier, so I kept them as-is.
    2. Use of extern function declarations in a .c file.
       This is a necessary evil: we must call
       nf_nat_l3proto_register() from the nat core now.
       All l3proto-related functions are removed later in this series,
       and those prototypes are then removed as well.

    v2: keep empty nf_nat_ipv6_csum_update stub for CONFIG_IPV6=n case.
    v3: remove IS_ENABLED(NF_NAT_IPV4/6) tests, NF_NAT_IPVx toggles
    are removed here.
    v4: also get rid of the assignments in conditionals.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

26 May, 2018

1 commit

  • Currently, nf_conntrack_max limits the maximum number of
    conntrack entries in the conntrack table of every network namespace.
    VMs and containers that reside in the same namespace share the same
    conntrack table, and the total number of conntrack entries for all of
    them is limited by nf_conntrack_max. In this case, if one VM or
    container abuses its use of conntrack entries, it blocks the others
    from committing valid conntrack entries into the conntrack table.
    Even if we put each VM in a different network namespace, the current
    nf_conntrack_max configuration is too rigid to give different VMs or
    containers different numbers of conntrack entries.

    To address this issue, this patch proposes a fine-grained mechanism
    that can further limit the number of conntrack entries per zone. For
    example, we can assign a different zone to each VM and set a conntrack
    limit on each zone. With this isolation, a misbehaving VM only consumes
    the conntrack entries in its own zone and does not influence other
    well-behaved VMs. Moreover, users can set different conntrack limits
    for different zones based on their preference.

    The proposed implementation uses Netfilter's nf_conncount backend
    to count the number of connections in a particular zone. If the number
    of connections is above the configured limit, OVS returns ENOMEM to
    userspace. If userspace does not configure a zone limit, the limit
    defaults to zero, meaning no limit, which is backward compatible with
    the behavior without this patch.

    The following high-level APIs are provided to userspace:
    - OVS_CT_LIMIT_CMD_SET:
      * set the default connection limit for all zones
      * set the connection limit for a particular zone
    - OVS_CT_LIMIT_CMD_DEL:
      * remove the connection limit for a particular zone
    - OVS_CT_LIMIT_CMD_GET:
      * get the default connection limit for all zones
      * get the connection limit for a particular zone
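
    The set/del/get semantics and the "zero means no limit" default can
    be sketched as a small userspace model. Everything named ex_* below
    is hypothetical and only mirrors the description above; it does not
    use nf_conncount or the real OVS netlink interface, and the fixed
    16-entry table is an assumption made to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define EX_MAX_ZONES 16	/* hypothetical table size for the sketch */

struct ex_ct_limits {
	uint32_t default_limit;			/* 0 means "no limit" */
	uint32_t zone_limit[EX_MAX_ZONES];
	uint8_t  zone_has_limit[EX_MAX_ZONES];
	uint32_t zone_count[EX_MAX_ZONES];	/* committed connections */
};

void ex_limits_init(struct ex_ct_limits *l)
{
	memset(l, 0, sizeof(*l));
}

/* SET: default limit for all zones. */
void ex_set_default(struct ex_ct_limits *l, uint32_t limit)
{
	l->default_limit = limit;
}

/* SET: limit for one particular zone. */
int ex_set_zone(struct ex_ct_limits *l, uint16_t zone, uint32_t limit)
{
	if (zone >= EX_MAX_ZONES)
		return -EINVAL;
	l->zone_limit[zone] = limit;
	l->zone_has_limit[zone] = 1;
	return 0;
}

/* DEL: drop the per-zone limit so the zone falls back to the default. */
int ex_del_zone(struct ex_ct_limits *l, uint16_t zone)
{
	if (zone >= EX_MAX_ZONES)
		return -EINVAL;
	l->zone_has_limit[zone] = 0;
	return 0;
}

/* GET: effective limit for a zone (per-zone value, else the default). */
uint32_t ex_get_zone(const struct ex_ct_limits *l, uint16_t zone)
{
	if (zone < EX_MAX_ZONES && l->zone_has_limit[zone])
		return l->zone_limit[zone];
	return l->default_limit;
}

/*
 * Commit one connection into a zone. Refuses with -ENOMEM when adding
 * it would push the zone's count above its limit.
 */
int ex_commit(struct ex_ct_limits *l, uint16_t zone)
{
	uint32_t limit;

	if (zone >= EX_MAX_ZONES)
		return -EINVAL;

	limit = ex_get_zone(l, zone);
	if (limit != 0 && l->zone_count[zone] >= limit)
		return -ENOMEM;

	l->zone_count[zone]++;
	return 0;
}
```

    In this model a zone that hits its limit only fails its own commits;
    other zones keep committing against their own counters, which is the
    isolation property the patch is after.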

    Signed-off-by: Yi-Hung Wei
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Yi-Hung Wei
     

08 Nov, 2017

1 commit

  • v16->17
    - Fixed disputed check code: keep them in nsh_push and nsh_pop
    but also add them in __ovs_nla_copy_actions

    v15->v16
    - Add csum recalculation for nsh_push, nsh_pop and set_nsh
    pointed out by Pravin
    - Move nsh key into the union with ipv4 and ipv6 and add
    check for nsh key in match_validate pointed out by Pravin
    - Add nsh check in validate_set and __ovs_nla_copy_actions

    v14->v15
    - Check size in nsh_hdr_from_nlattr
    - Fixed four small issues pointed out by Jiri and Eric

    v13->v14
    - Rename skb_push_nsh to nsh_push per Dave's comment
    - Rename skb_pop_nsh to nsh_pop per Dave's comment

    v12->v13
    - Fix NSH header length check in set_nsh

    v11->v12
    - Fix changes requested in older comments that had been missed
    - Fix new comments for v11

    v10->v11
    - Fix the three remaining disputed comments for v9
    that were not fixed in v10.

    v9->v10
    - Change struct ovs_key_nsh to
        struct ovs_nsh_key_base base;
        __be32 context[NSH_MD1_CONTEXT_SIZE];
    - Fix new comments for v9

    v8->v9
    - Fix build error reported by daily intel build
    because nsh module isn't selected by openvswitch

    v7->v8
    - Rework nested value and mask for OVS_KEY_ATTR_NSH
    - Change pop_nsh to adapt to nsh kernel module
    - Fix many issues per comments from Jiri Benc

    v6->v7
    - Remove NSH GSO patches in v6 because Jiri Benc
    reworked it as another patch series and they have
    been merged.
    - Change it to adapt to nsh kernel module added by NSH
    GSO patch series

    v5->v6
    - Fix the remaining comments for v4.
    - Add NSH GSO support for VxLAN-gpe + NSH and
    Eth + NSH.

    v4->v5
    - Fix many comments by Jiri Benc and Eric Garver
    for v4.

    v3->v4
    - Add new NSH match field ttl
    - Update NSH header to the latest format
    which will be final format and won't change
    per its author's confirmation.
    - Fix comments for v3.

    v2->v3
    - Change OVS_KEY_ATTR_NSH to a nested key to handle
    fixed-length attributes and variable-length
    attributes more flexibly.
    - Remove struct ovs_action_push_nsh completely
    - Add code to handle nested attribute for SET_MASKED
    - Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
    to transfer NSH header data.
    - Fix comments and coding style issues by Jiri and Eric

    v1->v2
    - Change encap_nsh and decap_nsh to push_nsh and pop_nsh
    - Dynamically allocate struct ovs_action_push_nsh for
    length-variable metadata.

    OVS master and the 2.8 branch have merged the NSH userspace
    patch series; this patch enables NSH support
    in the kernel datapath so that OVS can support
    NSH in compat mode by porting this.

    Signed-off-by: Yi Yang
    Acked-by: Jiri Benc
    Acked-by: Eric Garver
    Acked-by: Pravin Shelar
    Signed-off-by: David S. Miller

    Yi Yang
     

28 Mar, 2016

1 commit

  • The openvswitch code has gained support for calling into the
    nf-nat-ipv4/ipv6 modules, however those can be loadable modules
    in a configuration in which openvswitch is built-in, leading
    to link errors:

    net/built-in.o: In function `__ovs_ct_lookup':
    :(.text+0x2cc2c8): undefined reference to `nf_nat_icmp_reply_translation'
    :(.text+0x2cc66c): undefined reference to `nf_nat_icmpv6_reply_translation'

    The dependency on (!NF_NAT || NF_NAT) prevents similar issues,
    but NF_NAT is set to 'y' if any of the symbols selecting
    it are built-in, while the link error happens when any of them
    are modular.

    A second issue is that even if CONFIG_NF_NAT_IPV6 is built-in,
    CONFIG_NF_NAT_IPV4 might be completely disabled. This is unlikely
    to be useful in practice, but the driver currently only handles
    IPv6 being optional.

    This patch improves the Kconfig dependency so that openvswitch
    cannot be built-in if either of the two other symbols is set
    to 'm', and it replaces the incorrect #ifdef in ovs_ct_nat_execute()
    with two "if (IS_ENABLED())" checks that should catch all corner
    cases and also make the code more readable.

    The same #ifdef exists in ovs_ct_nat_to_attr(), where it does not
    cause a link error, but for consistency I'm changing it the same
    way.

    Signed-off-by: Arnd Bergmann
    Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
    Acked-by: Joe Stringer
    Signed-off-by: Pablo Neira Ayuso

    Arnd Bergmann
     

15 Mar, 2016

1 commit

  • Extend OVS conntrack interface to cover NAT. New nested
    OVS_CT_ATTR_NAT attribute may be used to include NAT with a CT action.
    A bare OVS_CT_ATTR_NAT only mangles existing and expected connections.
    If OVS_NAT_ATTR_SRC or OVS_NAT_ATTR_DST is included within the nested
    attributes, new (non-committed/non-confirmed) connections are mangled
    according to the rest of the nested attributes.

    The corresponding OVS userspace patch series includes test cases (in
    tests/system-traffic.at) that also serve as example uses.

    This work extends on a branch by Thomas Graf at
    https://github.com/tgraf/ovs/tree/nat.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Thomas Graf
    Acked-by: Joe Stringer
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
     

17 Feb, 2016

1 commit

  • In case of UDP traffic with datagram length
    below MTU this gives about a 2% performance increase
    when tunneling over ipv4 and about 60% when tunneling
    over ipv6.

    Signed-off-by: Paolo Abeni
    Suggested-and-acked-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Paolo Abeni
     

12 Sep, 2015

1 commit

  • When NF_CONNTRACK is built-in, NF_DEFRAG_IPV6 is a module, and
    OPENVSWITCH is built-in, the following build error would occur:

    net/built-in.o: In function `ovs_ct_execute':
    (.text+0x10f587): undefined reference to `nf_ct_frag6_gather'

    Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
    Reported-by: Jim Davis
    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     

07 Sep, 2015

1 commit

  • There's no particular desire to have conntrack action support in Open
    vSwitch as an independently configurable bit, rather just to ensure
    there is not a hard dependency. This exposed option doesn't accurately
    reflect the conntrack dependency when enabled, so simplify this by
    removing the option. Compile the support if NF_CONNTRACK is enabled.

    Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     

28 Aug, 2015

2 commits

  • With the help of tunnel metadata mode, OVS can directly use
    Geneve devices to implement Geneve tunnels.
    This patch removes all of the OVS-specific Geneve code
    and makes OVS use a Geneve net_device. A basic Geneve vport
    is still there to handle compatibility with current
    userspace applications.

    Signed-off-by: Pravin B Shelar
    Reviewed-by: Jesse Gross
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • Expose the kernel connection tracker via OVS. Userspace components can
    make use of the CT action to populate the connection state (ct_state)
    field for a flow. This state can be subsequently matched.

    Exposed connection states are OVS_CS_F_*:
    - NEW (0x01) - Beginning of a new connection.
    - ESTABLISHED (0x02) - Part of an existing connection.
    - RELATED (0x04) - Related to an established connection.
    - INVALID (0x20) - Could not track the connection for this packet.
    - REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
    - TRACKED (0x80) - This packet has been sent through conntrack.

    When the CT action is executed by itself, it will send the packet
    through the connection tracker and populate the ct_state field with one
    or more of the connection state flags above. The CT action will always
    set the TRACKED bit.
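
    As an illustration of how these flags combine, here is a small
    userspace sketch. The EX_CS_F_* names are hypothetical copies of the
    values listed above (the numbers in the actual uapi header may
    differ between kernel versions), and the key/mask comparison is an
    assumption about how a flow would match on ct_state, not code taken
    from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values copied from the list above; names are illustrative only. */
enum {
	EX_CS_F_NEW         = 0x01,
	EX_CS_F_ESTABLISHED = 0x02,
	EX_CS_F_RELATED     = 0x04,
	EX_CS_F_INVALID     = 0x20,
	EX_CS_F_REPLY_DIR   = 0x40,
	EX_CS_F_TRACKED     = 0x80,
};

/*
 * State as seen after the CT action: whatever the tracker reported for
 * the connection, plus TRACKED, which the action always sets.
 */
uint8_t ex_ct_state_after_action(uint8_t conn_flags)
{
	return (uint8_t)(conn_flags | EX_CS_F_TRACKED);
}

/* Masked match: only the bits selected by mask are compared. */
int ex_ct_state_matches(uint8_t state, uint8_t key, uint8_t mask)
{
	return (state & mask) == (key & mask);
}
```

    A flow that wants "tracked, established, and not new" would use a
    key of TRACKED | ESTABLISHED with a mask that also covers NEW, so
    that a set NEW bit makes the match fail.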

    When the COMMIT flag is passed to the conntrack action, this specifies
    that information about the connection should be stored. This allows
    subsequent packets for the same (or related) connections to be
    correlated with this connection. Sending subsequent packets for the
    connection through conntrack allows the connection tracker to consider
    the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.

    The CT action may optionally take a zone to track the flow within. This
    allows connections with the same 5-tuple to be kept logically separate
    from connections in other zones. If the zone is specified, then the
    "ct_zone" match field will be subsequently populated with the zone id.

    IP fragments are handled by transparently assembling them as part of the
    CT action. The maximum received unit (MRU) size is tracked so that
    refragmentation can occur during output.

    IP frag handling contributed by Andy Zhou.

    Based on original design by Justin Pettit.

    Signed-off-by: Joe Stringer
    Signed-off-by: Justin Pettit
    Signed-off-by: Andy Zhou
    Acked-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     

11 Aug, 2015

1 commit

  • Using the GRE tunnel metadata collection feature, we can implement
    the OVS GRE vport. This patch removes all of the OVS-specific
    GRE code and makes OVS use an ip_gre net_device.
    A minimal GRE vport is kept to handle compatibility with
    current userspace applications.

    Signed-off-by: Pravin B Shelar
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

30 Jul, 2015

1 commit

  • This re-adds the config option CONFIG_OPENVSWITCH_VXLAN to avoid a
    hard dependency of OVS on VXLAN. It moves the VXLAN config compat
    code to vport-vxlan.c and allows compilation as a module.

    Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
    Fixes: 2661371ace96 ("openvswitch: fix compilation when vxlan is a module")
    Cc: Pravin B Shelar
    Cc: Nicolas Dichtel
    Signed-off-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Thomas Graf
     

27 Jul, 2015

1 commit

  • With CONFIG_VXLAN=m and CONFIG_OPENVSWITCH=y, there was the following
    compilation error:
    LD init/built-in.o
    net/built-in.o: In function `vxlan_tnl_create':
    .../net/openvswitch/vport-netdev.c:322: undefined reference to `vxlan_dev_create'
    make: *** [vmlinux] Error 1

    CC: Thomas Graf
    Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
    Signed-off-by: Nicolas Dichtel
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Nicolas Dichtel
     

22 Jul, 2015

1 commit

  • This gets rid of all OVS specific VXLAN code in the receive and
    transmit path by using a VXLAN net_device to represent the vport.
    Only a small shim layer remains which takes care of handling the
    VXLAN specific OVS Netlink configuration.

    Unexports vxlan_sock_add(), vxlan_sock_release(), vxlan_xmit_skb()
    since they are no longer needed.

    Signed-off-by: Thomas Graf
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Thomas Graf
     

14 May, 2015

1 commit


09 Mar, 2015

1 commit

  • Fix the OPENVSWITCH Kconfig option and old Kconfigs by having
    OPENVSWITCH select both NET_MPLS_GSO and MPLS.

    A Kbuild test robot reported that when NET_MPLS_GSO is selected by
    OPENVSWITCH the generated .config is broken because MPLS is not
    selected.

    Cc: Simon Horman
    Fixes: cec9166ca4e mpls: Refactor how the mpls module is built
    Reported-by: kbuild test robot
    Signed-off-by: "Eric W. Biederman"
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

14 Nov, 2014

1 commit

  • Add a dependency on INET to fix the following build error. I have
    also fixed the MPLS dependency.

    ERROR: "ip_route_output_flow" [net/openvswitch/openvswitch.ko]
    undefined!
    make[1]: *** [__modpost] Error 1

    Reported-by: Jim Davis
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

06 Nov, 2014

1 commit

  • Allow datapath to recognize and extract MPLS labels into flow keys
    and execute actions which push, pop, and set labels on packets.

    Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.

    Cc: Ravi K
    Cc: Leo Alterman
    Cc: Isaku Yamahata
    Cc: Joe Stringer
    Signed-off-by: Simon Horman
    Signed-off-by: Jesse Gross
    Signed-off-by: Pravin B Shelar

    Simon Horman
     

29 Oct, 2014

1 commit

  • The internal and netdev vport remain part of openvswitch.ko. Encap
    vports including vxlan, gre, and geneve can be built as separate
    modules and are loaded on demand. Modules can be unloaded after use.
    Datapath ports keep a reference to the vport module during their
    lifetime.

    This allows removing the error-prone maintenance of the global list
    vport_ops_list.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     

06 Oct, 2014

1 commit

  • The Openvswitch implementation is completely agnostic to the options
    that are in use and can handle newly defined options without
    further work. It does this by simply matching on a byte array
    of options and allowing userspace to setup flows on this array.

    Signed-off-by: Jesse Gross
    Signed-off-by: Ansis Atteka
    Signed-off-by: Andy Zhou
    Acked-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jesse Gross
     

27 Aug, 2013

1 commit

  • This patch adds support for rewriting SCTP src,dst ports similar to the
    functionality already available for TCP/UDP.

    Rewriting SCTP ports is expensive due to double-recalculation of the
    SCTP checksums; this is performed to ensure that packets traversing OVS
    with invalid checksums will continue to the destination with any
    checksum corruption intact.
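
    One way to picture the double recalculation is the sketch below. It
    is a toy model, not the datapath code: struct toy_pkt and the toy_*
    helpers are hypothetical, the packet layout is not real SCTP, and the
    bitwise CRC32c is only there so the example is self-contained. The
    idea shown is that the correct checksum is computed before and after
    the port rewrite, and the difference between the stored and the
    correct old value is carried over to the new one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Toy packet: ports, stored checksum, and a few payload bytes. */
struct toy_pkt {
	uint16_t sport, dport;
	uint32_t csum;
	uint8_t payload[8];
};

/* Correct checksum: CRC over the packet with the csum field zeroed. */
uint32_t toy_compute(const struct toy_pkt *p)
{
	struct toy_pkt tmp = *p;

	tmp.csum = 0;
	return crc32c((const uint8_t *)&tmp, sizeof(tmp));
}

/*
 * Rewrite the ports. The checksum is computed twice (before and after
 * the rewrite); XORing in both results keeps any pre-existing error in
 * the stored checksum intact instead of silently repairing it.
 */
void toy_set_ports(struct toy_pkt *p, uint16_t sport, uint16_t dport)
{
	uint32_t old_csum = p->csum;
	uint32_t old_correct = toy_compute(p);
	uint32_t new_correct;

	p->sport = sport;
	p->dport = dport;
	new_correct = toy_compute(p);

	p->csum = old_csum ^ old_correct ^ new_correct;
}
```

    With a valid packet, old_csum equals old_correct, so the result is
    simply the new correct checksum; with a corrupted one, the same error
    pattern reappears in the rewritten packet.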

    Reviewed-by: Simon Horman
    Signed-off-by: Joe Stringer
    Signed-off-by: Ben Pfaff
    Signed-off-by: Jesse Gross

    Joe Stringer
     

20 Aug, 2013

1 commit


02 Jul, 2013

1 commit

  • Openvswitch uses functions from the NET_IPGRE_DEMUX module.
    Add a Kconfig dependency to fix the following compilation errors:
    http://marc.info/?l=linux-netdev&m=137244035226634

    CC: Jesse Gross
    Reported-by: Randy Dunlap
    Signed-off-by: Pravin Shelar
    Acked-by: Randy Dunlap
    Acked-by: Jesse Gross
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

20 Jun, 2013

1 commit

  • Add a gre vport implementation. Most of the gre protocol processing
    is pushed to the gre module. It makes use of the gre demultiplexer,
    so it can co-exist with Linux device-based gre tunnels.

    Signed-off-by: Pravin B Shelar
    Acked-by: Jesse Gross
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

04 Dec, 2011

1 commit

  • Open vSwitch is a multilayer Ethernet switch targeted at virtualized
    environments. In addition to supporting a variety of features
    expected in a traditional hardware switch, it enables fine-grained
    programmatic extension and flow-based control of the network.
    This control is useful in a wide variety of applications but is
    particularly important in multi-server virtualization deployments,
    which are often characterized by highly dynamic endpoints and the need
    to maintain logical abstractions for multiple tenants.

    The Open vSwitch datapath provides an in-kernel fast path for packet
    forwarding. It is complemented by a userspace daemon, ovs-vswitchd,
    which is able to accept configuration from a variety of sources and
    translate it into packet processing rules.

    See http://openvswitch.org for more information and userspace
    utilities.

    Signed-off-by: Jesse Gross

    Jesse Gross