05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of version 2 of the gnu general public license as
    published by the free software foundation this program is
    distributed in the hope that it will be useful but without any
    warranty without even the implied warranty of merchantability or
    fitness for a particular purpose see the gnu general public license
    for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 64 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexios Zavras
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

26 May, 2018

1 commit

  • Currently, nf_conntrack_max is used to limit the maximum number of
    conntrack entries in the conntrack table for every network namespace.
    VMs and containers that reside in the same namespace share the same
    conntrack table, and the total number of conntrack entries for all of
    them is limited by nf_conntrack_max. In this case, if one VM or
    container abuses its use of conntrack entries, it blocks the others
    from committing valid conntrack entries into the conntrack table. Even
    if we put each VM in a different network namespace, the current
    nf_conntrack_max configuration is too rigid to give different VMs or
    containers different conntrack entry limits.

    To address this issue, this patch proposes a fine-grained mechanism
    that further limits the number of conntrack entries per zone. For
    example, we can assign a different zone to each VM and set a conntrack
    limit on each zone. With this isolation, a misbehaving VM only consumes
    the conntrack entries in its own zone and does not affect other
    well-behaved VMs. Moreover, users can set a different conntrack limit
    for each zone based on their preference.

    The proposed implementation uses Netfilter's nf_conncount backend to
    count the number of connections in a particular zone. If the number of
    connections is above the configured limit, OVS returns ENOMEM to
    userspace. If userspace does not configure a zone limit, the limit
    defaults to zero, meaning no limit, which is backward compatible with
    the behavior without this patch.

    The following high-level APIs are provided to userspace:
    - OVS_CT_LIMIT_CMD_SET:
      * set the default connection limit for all zones
      * set the connection limit for a particular zone
    - OVS_CT_LIMIT_CMD_DEL:
      * remove the connection limit for a particular zone
    - OVS_CT_LIMIT_CMD_GET:
      * get the default connection limit for all zones
      * get the connection limit for a particular zone
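
    The limit semantics described in this commit can be sketched in plain
    C as follows. This is only an illustration: the names (zone_limits,
    zone_limit_set, zone_commit) and the small fixed-size table are
    hypothetical, and a plain counter stands in for the nf_conncount
    backend that the patch actually uses.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ZONES 8  /* hypothetical table size; real zone ids are 16-bit */

struct zone_limits {
    uint32_t default_limit;         /* 0 means "no limit" */
    uint32_t limit[MAX_ZONES];      /* per-zone limit, valid if has_limit */
    uint8_t  has_limit[MAX_ZONES];  /* 1 once a per-zone limit is set */
    uint32_t count[MAX_ZONES];      /* stand-in for the nf_conncount count */
};

/* SET for one zone: install a per-zone limit. */
static inline void zone_limit_set(struct zone_limits *zl, uint16_t zone,
                                  uint32_t limit)
{
    if (zone < MAX_ZONES) {
        zl->limit[zone] = limit;
        zl->has_limit[zone] = 1;
    }
}

/* DEL for one zone: fall back to the default limit. */
static inline void zone_limit_del(struct zone_limits *zl, uint16_t zone)
{
    if (zone < MAX_ZONES)
        zl->has_limit[zone] = 0;
}

/* GET for one zone: the per-zone limit if set, else the default. */
static inline uint32_t zone_limit_get(const struct zone_limits *zl,
                                      uint16_t zone)
{
    if (zone < MAX_ZONES && zl->has_limit[zone])
        return zl->limit[zone];
    return zl->default_limit;
}

/* Count a new connection in the zone, or refuse it with -ENOMEM. */
static inline int zone_commit(struct zone_limits *zl, uint16_t zone)
{
    uint32_t limit;

    if (zone >= MAX_ZONES)
        return -EINVAL;
    limit = zone_limit_get(zl, zone);
    if (limit != 0 && zl->count[zone] >= limit)
        return -ENOMEM;
    zl->count[zone]++;
    return 0;
}
```

    With a zero default limit, zones without their own limit are never
    refused, which mirrors the backward-compatible default described
    above.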

    Signed-off-by: Yi-Hung Wei
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Yi-Hung Wei
     

11 Oct, 2017

1 commit

  • This adds a ct_clear action for clearing conntrack state. ct_clear is
    currently implemented in OVS userspace, but is not backed by an action
    in the kernel datapath. This is useful for flows that may modify a
    packet tuple after a ct lookup has already occurred.

    Signed-off-by: Eric Garver
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Eric Garver
     

10 Feb, 2017

2 commits

  • struct sw_flow_key has two 16-bit holes. Move the most frequently
    matched conntrack fields there. In some typical cases this halves the
    size of the key that needs to be hashed, so that it fits in one cache
    line.
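
    The effect of filling padding holes can be illustrated with two toy
    structs. The field names and layout below are hypothetical and do not
    reproduce struct sw_flow_key; the size comparison assumes a typical
    ABI where uint32_t is 4-byte aligned and uint16_t is 2-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key with two 16-bit holes left by alignment padding. */
struct key_before {
    uint32_t field_a;
    uint16_t field_b;   /* followed by a 2-byte padding hole */
    uint32_t field_c;
    uint16_t field_d;   /* followed by a 2-byte padding hole */
    uint32_t field_e;
    uint16_t ct_zone;   /* frequently matched fields at the tail */
    uint16_t ct_flags;
};

/* Same fields, with the two 16-bit fields moved into the holes. */
struct key_after {
    uint32_t field_a;
    uint16_t field_b;
    uint16_t ct_zone;   /* fills the first hole */
    uint32_t field_c;
    uint16_t field_d;
    uint16_t ct_flags;  /* fills the second hole */
    uint32_t field_e;
};

/* On the assumed ABI the reordered struct is strictly smaller. */
_Static_assert(sizeof(struct key_after) < sizeof(struct key_before),
               "moving fields into the holes should shrink the struct");
```

    A smaller key prefix means fewer bytes to hash and compare when only
    the leading fields are matched.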

    Signed-off-by: Jarno Rajahalme
    Acked-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jarno Rajahalme
     
  • Add the fields of the conntrack original direction 5-tuple to struct
    sw_flow_key. The new fields are initially marked as non-existent, and
    are populated whenever a conntrack action is executed and either finds
    or generates a conntrack entry. This means that these fields exist
    for all packets that were not rejected by conntrack as untrackable.

    The original tuple fields in the sw_flow_key are filled from the
    original direction tuple of the conntrack entry relating to the
    current packet, or from the original direction tuple of the master
    conntrack entry, if the current conntrack entry has a master.
    Generally, expected connections of connections having an assigned
    helper (e.g., FTP), have a master conntrack entry.
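
    The tuple selection rule in the paragraph above can be sketched as
    follows. The struct and function names are hypothetical stand-ins,
    not the kernel's conntrack types.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPv4 5-tuple. */
struct tuple {
    uint32_t src, dst;
    uint16_t sport, dport;
    uint8_t  proto;
};

/* Hypothetical conntrack entry: original tuple plus optional master. */
struct conn {
    struct tuple orig;
    const struct conn *master;  /* NULL unless this is an expected connection */
};

/* Pick the tuple that would be copied into the flow key: the master's
 * original direction tuple if there is a master, else the entry's own. */
static inline const struct tuple *orig_tuple_for_key(const struct conn *ct)
{
    if (ct->master)
        ct = ct->master;
    return &ct->orig;
}
```

    For an FTP data connection expected by a control connection, this
    returns the control connection's tuple, so one ACL rule on the
    control connection can cover both.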

    The main purpose of the new conntrack original tuple fields is to
    allow matching on them for policy decisions, on the premise that the
    admissibility of a tracked connection's reply packets (as well as its
    original direction packets), and of packets in both directions of any
    related connections, may be based on ACL rules applying to the master
    connection's original direction 5-tuple. This also makes it easier to
    make policy decisions when the actual packet headers might have been
    transformed by NAT, as the original direction 5-tuple represents the
    packet headers before any such transformation.

    When using the original direction 5-tuple, the admissibility of return
    and/or related packets need not be based on the mere existence of a
    conntrack entry, allowing separation of admission policy from the
    established conntrack state. While existence of a conntrack entry is
    required for admission of the return or related packets, policy
    changes can cause connections that were initially admitted to be
    rejected or dropped afterwards. If the admission of the return and
    related packets was based on mere conntrack state (e.g., connection
    being in an established state), a policy change that would make the
    connection rejected or dropped would need to find and delete all
    conntrack entries affected by such a change. When matching on the
    original direction 5-tuple, the affected conntrack entries can be
    allowed to time out instead, as the established state of the
    connection no longer needs to be the basis for packet admission.

    It should be noted that the directionality of related connections may
    be the same as or different from that of the master connection, and
    neither the original direction 5-tuple nor the conntrack state bits
    carry this information. If needed, the directionality of the master
    connection can be stored in the master's conntrack mark or labels,
    which are automatically inherited by the expected related connections.

    The fact that neither ARP nor ND packets are trackable by conntrack
    allows mutual exclusion between ARP/ND and the new conntrack original
    tuple fields. Hence, the IP addresses are overlaid in union with ARP
    and ND fields. This allows the sw_flow_key to not grow much due to
    this patch, but it also means that we must be careful to never use the
    new key fields with ARP or ND packets. ARP is easy to distinguish and
    keep mutually exclusive based on the ethernet type, but ND being an
    ICMPv6 protocol requires a bit more attention.
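
    The mutual exclusion described above can be sketched with a union
    and a guard predicate. The union layout and the helper names are
    hypothetical and simplified; only the EtherType, protocol number, and
    ICMPv6 type constants are standard values.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_TYPE_ARP        0x0806
#define ETH_TYPE_IPV6       0x86DD
#define IP_PROTO_ICMPV6     58
#define ND_NEIGHBOR_SOLICIT 135  /* ICMPv6 type */
#define ND_NEIGHBOR_ADVERT  136  /* ICMPv6 type */

/* Hypothetical overlay: ARP, ND and conntrack original tuple addresses
 * share storage because they never apply to the same packet. */
union key_addrs {
    struct { uint8_t sha[6], tha[6]; } arp;
    struct { uint8_t target[16]; uint8_t sll[6], tll[6]; } nd;
    struct { uint8_t src[16], dst[16]; } ct_orig;
};

/* ND needs a closer look than ARP: it is identified by ICMPv6 type. */
static inline bool is_nd(uint16_t eth_type, uint8_t ip_proto,
                         uint8_t icmp_type)
{
    return eth_type == ETH_TYPE_IPV6 && ip_proto == IP_PROTO_ICMPV6 &&
           (icmp_type == ND_NEIGHBOR_SOLICIT ||
            icmp_type == ND_NEIGHBOR_ADVERT);
}

/* The ct_orig member may only be used when the packet is neither ARP
 * nor ND; icmp_type is ignored unless the packet is ICMPv6. */
static inline bool may_use_ct_orig(uint16_t eth_type, uint8_t ip_proto,
                                   uint8_t icmp_type)
{
    return eth_type != ETH_TYPE_ARP && !is_nd(eth_type, ip_proto, icmp_type);
}
```

    Any code that reads or writes the ct_orig member would first check a
    predicate of this kind, since the same bytes hold ARP or ND data
    otherwise.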

    Signed-off-by: Jarno Rajahalme
    Acked-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jarno Rajahalme
     

15 Mar, 2016

1 commit

  • Extend the OVS conntrack interface to cover NAT. A new nested
    OVS_CT_ATTR_NAT attribute may be used to include NAT with a CT action.
    A bare OVS_CT_ATTR_NAT only mangles existing and expected connections.
    If OVS_NAT_ATTR_SRC or OVS_NAT_ATTR_DST is included within the nested
    attributes, new (non-committed/non-confirmed) connections are mangled
    according to the rest of the nested attributes.

    The corresponding OVS userspace patch series includes test cases (in
    tests/system-traffic.at) that also serve as example uses.

    This work builds on a branch by Thomas Graf at
    https://github.com/tgraf/ovs/tree/nat.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Thomas Graf
    Acked-by: Joe Stringer
    Signed-off-by: Pablo Neira Ayuso

    Jarno Rajahalme
     

28 Oct, 2015

1 commit

  • If ip_defrag() returns an error other than -EINPROGRESS, then the skb is
    freed. When handle_fragments() passes this back up to
    do_execute_actions(), it will be freed again. Prevent this double free
    by never freeing the skb in do_execute_actions() for errors returned by
    ovs_ct_execute(). Always free it in ovs_ct_execute() error paths instead.
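
    The ownership rule this fix establishes (the callee frees the buffer
    on error, so the caller must not) can be sketched with a toy buffer
    type. Everything below is a hypothetical stand-in: struct buf is not
    sk_buff, and the two functions only mimic the roles of
    ovs_ct_execute() and do_execute_actions().

```c
#include <errno.h>
#include <stdlib.h>

/* Toy buffer that records how often it has been freed. */
struct buf {
    int *free_count;
};

static inline void buf_free(struct buf *b)
{
    (*b->free_count)++;
    free(b);
}

/* Callee: on any error it consumes (frees) the buffer itself. */
static inline int ct_execute(struct buf *b, int simulated_err)
{
    if (simulated_err) {
        buf_free(b);
        return simulated_err;
    }
    return 0;
}

/* Caller: must not free the buffer when the callee reported an error. */
static inline int execute_actions(struct buf *b, int simulated_err)
{
    int err = ct_execute(b, simulated_err);

    if (err)
        return err;  /* already freed by the callee */
    buf_free(b);     /* end of processing in this sketch */
    return 0;
}
```

    On both the success and the error path the buffer is freed exactly
    once, which is the invariant the double free violated.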

    Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
    Reported-by: Florian Westphal
    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     

22 Oct, 2015

1 commit

  • Currently, 0-bits are generated in ct_state where the bit position is
    undefined, and matches are accepted on these bit-positions. If userspace
    requests to match the 0-value for this bit then it may expect only a
    subset of traffic to match this value, whereas currently all packets
    will have this bit set to 0. Fix this by rejecting such masks.

    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Joe Stringer
     

07 Oct, 2015

2 commits

  • The ct_state field was initially added as an 8-bit field; however, six
    of the bits are already in use, and use cases are starting to appear
    that may push the limits of this field. This patch extends the field
    to 32 bits while retaining the internal representation of 8 bits.
    This should cover forward compatibility of the ABI for the foreseeable
    future.

    This patch also reorders the OVS_CS_F_* bits to be sequential.

    Suggested-by: Jarno Rajahalme
    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     
  • Previously, if userspace specified ct_state bits in the flow key which
    are currently undefined (and therefore unsupported), then they would be
    ignored. This could cause unexpected behaviour in future if userspace is
    extended to support additional bits but attempts to communicate with the
    current version of the kernel. This patch rectifies the situation by
    rejecting such ct_state bits.
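
    A minimal sketch of such a check follows. The flag values are the
    ones listed in the original "Add conntrack action" commit message
    further down this log; the macro and function names are hypothetical,
    and the 32-bit argument width is an assumption for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as listed in the original conntrack action commit. */
#define CS_F_NEW         0x01u
#define CS_F_ESTABLISHED 0x02u
#define CS_F_RELATED     0x04u
#define CS_F_INVALID     0x20u
#define CS_F_REPLY_DIR   0x40u
#define CS_F_TRACKED     0x80u

#define CS_F_SUPPORTED (CS_F_NEW | CS_F_ESTABLISHED | CS_F_RELATED | \
                        CS_F_INVALID | CS_F_REPLY_DIR | CS_F_TRACKED)

/* True only if no undefined bit is set. */
static inline bool ct_state_supported(uint32_t state)
{
    return (state & ~(uint32_t)CS_F_SUPPORTED) == 0;
}
```

    A flow key whose ct_state fails this check would be rejected rather
    than silently accepted with the unknown bits ignored.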

    Fixes: 7f8a436eaa2c "openvswitch: Add conntrack action"
    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     

05 Oct, 2015

1 commit

  • Conntrack LABELS (plural) are exposed by conntrack; rename the OVS name
    for these to be consistent with conntrack.

    Fixes: c2ac667 "openvswitch: Allow matching on conntrack label"
    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     

07 Sep, 2015

1 commit

  • There's no particular desire to have conntrack action support in Open
    vSwitch as an independently configurable bit, rather just to ensure
    there is not a hard dependency. This exposed option doesn't accurately
    reflect the conntrack dependency when enabled, so simplify this by
    removing the option. Compile the support if NF_CONNTRACK is enabled.

    Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     

28 Aug, 2015

3 commits

  • Allow matching and setting the ct_label field. As with ct_mark, this is
    populated by executing the CT action. The label field may be modified by
    specifying a label and mask nested under the CT action. It is stored as
    metadata attached to the connection. Label modification occurs after
    lookup, and will only persist when the conntrack entry is committed by
    providing the COMMIT flag to the CT action. Labels are currently fixed
    to 128 bits in size.
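
    Setting a label with a mask can be sketched as a word-by-word masked
    update over 128 bits. The struct and function names are hypothetical,
    and splitting the label into four 32-bit words is a choice made for
    this example only.

```c
#include <stdint.h>

#define CT_LABELS_WORDS 4  /* 4 x 32 bits = 128 bits */

struct ct_labels {
    uint32_t w[CT_LABELS_WORDS];
};

/* Overwrite only the bits selected by mask; leave the rest untouched. */
static inline void ct_labels_apply(struct ct_labels *cur,
                                   const struct ct_labels *value,
                                   const struct ct_labels *mask)
{
    int i;

    for (i = 0; i < CT_LABELS_WORDS; i++)
        cur->w[i] = (cur->w[i] & ~mask->w[i]) | (value->w[i] & mask->w[i]);
}
```

    Bits outside the mask keep whatever an earlier CT action stored, so
    independent users can own disjoint bit ranges of the same label.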

    Signed-off-by: Joe Stringer
    Acked-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     
  • Allow matching and setting the ct_mark field. As with ct_state and
    ct_zone, this field is populated when the CT action is executed. To
    write to this field, a value and mask can be specified as a nested
    attribute under the CT action. This data is stored with the conntrack
    entry, and the write is applied after the lookup occurs for the CT
    action. The conntrack entry itself must be committed using the COMMIT
    flag in the CT action flags for this change to persist.
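
    The value-and-mask write described here amounts to a masked update
    of a 32-bit word. The helper name below is hypothetical; it shows
    the arithmetic only and does not model the lookup or the COMMIT
    step.

```c
#include <stdint.h>

/* Replace the bits of cur selected by mask with the matching bits of value. */
static inline uint32_t ct_mark_apply(uint32_t cur, uint32_t value,
                                     uint32_t mask)
{
    return (cur & ~mask) | (value & mask);
}
```

    A mask of all ones overwrites the whole mark, and a mask of zero
    leaves it unchanged.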

    Signed-off-by: Justin Pettit
    Signed-off-by: Joe Stringer
    Acked-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     
  • Expose the kernel connection tracker via OVS. Userspace components can
    make use of the CT action to populate the connection state (ct_state)
    field for a flow. This state can be subsequently matched.

    Exposed connection states are OVS_CS_F_*:
    - NEW (0x01) - Beginning of a new connection.
    - ESTABLISHED (0x02) - Part of an existing connection.
    - RELATED (0x04) - Related to an established connection.
    - INVALID (0x20) - Could not track the connection for this packet.
    - REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
    - TRACKED (0x80) - This packet has been sent through conntrack.

    When the CT action is executed by itself, it will send the packet
    through the connection tracker and populate the ct_state field with one
    or more of the connection state flags above. The CT action will always
    set the TRACKED bit.
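
    Matching on the populated ct_state can be sketched as a value/mask
    comparison over the flags listed above. The enum and function names
    are hypothetical; the bit values are the ones given in this commit
    message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as listed in this commit message. */
enum {
    CS_NEW         = 0x01,
    CS_ESTABLISHED = 0x02,
    CS_RELATED     = 0x04,
    CS_INVALID     = 0x20,
    CS_REPLY_DIR   = 0x40,
    CS_TRACKED     = 0x80,
};

/* A flow matches when the packet's state agrees with value on every
 * bit selected by mask; bits outside the mask are "don't care". */
static inline bool ct_state_matches(uint8_t state, uint8_t value,
                                    uint8_t mask)
{
    return (state & mask) == (value & mask);
}
```

    For example, a flow for tracked, established traffic in either
    direction would use value and mask both set to
    CS_TRACKED | CS_ESTABLISHED, leaving CS_REPLY_DIR out of the mask.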

    When the COMMIT flag is passed to the conntrack action, this specifies
    that information about the connection should be stored. This allows
    subsequent packets for the same (or related) connections to be
    correlated with this connection. Sending subsequent packets for the
    connection through conntrack allows the connection tracker to consider
    the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.

    The CT action may optionally take a zone to track the flow within. This
    allows connections with the same 5-tuple to be kept logically separate
    from connections in other zones. If the zone is specified, then the
    "ct_zone" match field will be subsequently populated with the zone id.

    IP fragments are handled by transparently assembling them as part of the
    CT action. The maximum received unit (MRU) size is tracked so that
    refragmentation can occur during output.

    IP frag handling contributed by Andy Zhou.

    Based on original design by Justin Pettit.

    Signed-off-by: Joe Stringer
    Signed-off-by: Justin Pettit
    Signed-off-by: Andy Zhou
    Acked-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer