27 Oct, 2015

1 commit

  • [ Upstream commit 598c12d0ba6de9060f04999746eb1e015774044b ]

    When openvswitch tries to allocate memory from offline numa node 0:
    stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO, 0)
    It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
    [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h
    This patch disables numa affinity in this case.

    Signed-off-by: Konstantin Khlebnikov
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Konstantin Khlebnikov
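
The fallback described in this commit can be pictured with a small userspace model. This is only a sketch under assumptions: `node_is_online`, `pick_stats_node`, the `online_mask` bitmap, `MAX_NODES` and `NO_NODE` are hypothetical stand-ins for the kernel's `node_online()`, `MAX_NUMNODES` and `NUMA_NO_NODE`, and the code is not the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 64   /* hypothetical stand-in for MAX_NUMNODES */
#define NO_NODE   (-1) /* stand-in for NUMA_NO_NODE: no affinity requested */

/* Returns true if bit `nid` is set in the (hypothetical) online bitmap. */
static bool node_is_online(uint64_t online_mask, int nid)
{
    if (nid < 0 || nid >= MAX_NODES)
        return false;
    return ((online_mask >> nid) & 1u) != 0;
}

/* Use the preferred node only if it is online; otherwise drop NUMA affinity. */
static int pick_stats_node(uint64_t online_mask, int preferred)
{
    return node_is_online(online_mask, preferred) ? preferred : NO_NODE;
}
```

With node 0 offline, `pick_stats_node(0x2, 0)` yields `NO_NODE`, so the allocation would be made without pinning it to a node.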
     

03 Oct, 2015

1 commit

  • [ Upstream commit ae5f2fb1d51fa128a460bcfbe3c56d7ab8bf6a43 ]

    When support for megaflows was introduced, OVS needed to start
    installing flows with a mask applied to them. Since masking is an
    expensive operation, OVS also had an optimization that would only
    take the parts of the flow keys that were covered by a non-zero
    mask. The values stored in the remaining pieces should not matter
    because they are masked out.

    While this works fine for the purposes of matching (which must always
    look at the mask), serialization to netlink can be problematic. Since
    the flow and the mask are serialized separately, the uninitialized
    portions of the flow can be encoded with whatever values happen to be
    present.

    In terms of functionality, this has little effect since these fields
    will be masked out by definition. However, it leaks kernel memory to
    userspace, which is a potential security vulnerability. It is also
    possible that other code paths could look at the masked key and get
    uninitialized data, although this does not currently appear to be an
    issue in practice.

    This removes the mask optimization for flows that are being installed.
    This was always intended to be the case as the mask optimizations were
    really targeting per-packet flow operations.

    Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
    Signed-off-by: Jesse Gross
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Jesse Gross
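
The difference between range-limited and full masking can be illustrated in userspace. This is a minimal sketch, assuming a flow key is just a byte array; `mask_key_range` and `mask_key_full` are hypothetical names and do not correspond to the kernel functions touched by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Range-limited masking: only bytes in [start, end) are written to dst.
 * Bytes outside the range keep whatever dst already contained. */
static void mask_key_range(uint8_t *dst, const uint8_t *src,
                           const uint8_t *mask, size_t start, size_t end)
{
    for (size_t i = start; i < end; i++)
        dst[i] = src[i] & mask[i];
}

/* Full masking: every byte of dst is defined, so no stale data remains. */
static void mask_key_full(uint8_t *dst, const uint8_t *src,
                          const uint8_t *mask, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i] & mask[i];
}
```

If `dst` starts out holding leftover bytes, the range-limited variant leaves them in place outside the masked range, which is the kind of data that could reach userspace when the key is serialized separately from the mask.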
     

04 Jun, 2015

1 commit

  • Currently, openvswitch tries to disable LRO from the user space. This does
    not work correctly when the device added is a vlan interface, though.
    Instead of dealing with possibly complex stacked cross name space relations
    in the user space, do the same as bridging does and call dev_disable_lro in
    the kernel.

    Signed-off-by: Jiri Benc
    Acked-by: Flavio Leitner
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
     

16 Apr, 2015

1 commit

  • Pull networking updates from David Miller:

    1) Add BQL support to via-rhine, from Tino Reichardt.

    2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers
    can support hw switch offloading. From Florian Fainelli.

    3) Allow 'ip address' commands to initiate multicast group join/leave,
    from Madhu Challa.

    4) Many ipv4 FIB lookup optimizations from Alexander Duyck.

    5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel
    Borkmann.

    6) Remove the ugly compat support in ARP for ugly layers like ax25,
    rose, etc. And use this to clean up the neigh layer, then use it to
    implement MPLS support. All from Eric Biederman.

    7) Support L3 forwarding offloading in switches, from Scott Feldman.

    8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed
    up route lookups even further. From Alexander Duyck.

    9) Many improvements and bug fixes to the rhashtable implementation,
    from Herbert Xu and Thomas Graf. In particular, in the case where
    an rhashtable user bulk adds a large number of items into an empty
    table, we expand the table much more sanely.

    10) Don't make the tcp_metrics hash table per-namespace, from Eric
    Biederman.

    11) Extend EBPF to access SKB fields, from Alexei Starovoitov.

    12) Split out new connection request sockets so that they can be
    established in the main hash table. Much less false sharing since
    hash lookups go direct to the request sockets instead of having to
    go first to the listener then to the request socks hashed
    underneath. From Eric Dumazet.

    13) Add async I/O support for crypto AF_ALG sockets, from Tadeusz Struk.

    14) Support stable privacy address generation for RFC7217 in IPV6. From
    Hannes Frederic Sowa.

    15) Hash network namespace into IP frag IDs, also from Hannes Frederic
    Sowa.

    16) Convert PTP get/set methods to use 64-bit time, from Richard
    Cochran.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits)
    fm10k: Bump driver version to 0.15.2
    fm10k: corrected VF multicast update
    fm10k: mbx_update_max_size does not drop all oversized messages
    fm10k: reset head instead of calling update_max_size
    fm10k: renamed mbx_tx_dropped to mbx_tx_oversized
    fm10k: update xcast mode before synchronizing multicast addresses
    fm10k: start service timer on probe
    fm10k: fix function header comment
    fm10k: comment next_vf_mbx flow
    fm10k: don't handle mailbox events in iov_event path and always process mailbox
    fm10k: use separate workqueue for fm10k driver
    fm10k: Set PF queues to unlimited bandwidth during virtualization
    fm10k: expose tx_timeout_count as an ethtool stat
    fm10k: only increment tx_timeout_count in Tx hang path
    fm10k: remove extraneous "Reset interface" message
    fm10k: separate PF only stats so that VF does not display them
    fm10k: use hw->mac.max_queues for stats
    fm10k: only show actual queues, not the maximum in hardware
    fm10k: allow creation of VLAN on default vid
    fm10k: fix unused warnings
    ...

    Linus Torvalds
     

15 Apr, 2015

1 commit

  • NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE.

    GFP_THISNODE is a secret combination of gfp bits that have different
    behavior than expected. It is a combination of __GFP_THISNODE,
    __GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page
    allocator slowpath to fail without trying reclaim even though it may be
    used in combination with __GFP_WAIT.

    An example of the problem this creates: commit e97ca8e5b864 ("mm: fix
    GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE
    that really just wanted __GFP_THISNODE. The problem doesn't end there,
    however, because even that was a no-op for alloc_misplaced_dst_page(),
    which also sets __GFP_NORETRY and __GFP_NOWARN, and
    migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWARN
    are set in GFP_TRANSHUGE. Converting GFP_THISNODE to __GFP_THISNODE is a
    no-op in these cases since the page allocator special-cases
    __GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN.

    It's time to just remove GFP_THISNODE entirely. We leave __GFP_THISNODE
    to restrict an allocation to a local node, but remove GFP_THISNODE and
    its obscurity. Instead, we require that a caller clear __GFP_WAIT if it
    wants to avoid reclaim.

    This allows the aforementioned functions to actually reclaim as they
    should. It also enables any future callers that want to do
    __GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim. The
    rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT.

    Aside: ovs_flow_stats_update() really wants to avoid reclaim as well, so
    it is unchanged.

    Signed-off-by: David Rientjes
    Acked-by: Vlastimil Babka
    Cc: Christoph Lameter
    Acked-by: Pekka Enberg
    Cc: Joonsoo Kim
    Acked-by: Johannes Weiner
    Cc: Mel Gorman
    Cc: Pravin Shelar
    Cc: Jarno Rajahalme
    Cc: Li Zefan
    Cc: Greg Thelen
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

08 Apr, 2015

1 commit


03 Apr, 2015

1 commit

  • Conflicts:
    drivers/net/usb/asix_common.c
    drivers/net/usb/sr9800.c
    drivers/net/usb/usbnet.c
    include/linux/usb/usbnet.h
    net/ipv4/tcp_ipv4.c
    net/ipv6/tcp_ipv6.c

    The TCP conflicts were overlapping changes. In 'net' we added a
    READ_ONCE() to the socket cached RX route read, whilst in 'net-next'
    Eric Dumazet touched the surrounding code dealing with how mini
    sockets are handled.

    With USB, it's a case of the same bug fix first going into net-next
    and then I cherry picked it back into net.

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Apr, 2015

3 commits

  • Return module reference before invoking the respective vport
    ->destroy() function. This is needed as ovs_vport_del() is not
    invoked inside an RCU read side critical section so the kfree
    can occur immediately before returning to ovs_vport_del().

    Returning the module reference before ->destroy() is safe because
    the module unregistration is blocked on ovs_lock which we hold
    while destroying the datapath.

    Fixes: 62b9c8d0372d ("ovs: Turn vports with dependencies into separate modules")
    Reported-by: Pravin Shelar
    Signed-off-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Those are counterparts to nla_put_in_addr and nla_put_in6_addr.

    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
     
  • IP addresses are often stored in netlink attributes. Add generic functions
    to do that.

    For nla_put_in_addr, it would be nicer to pass struct in_addr but this is
    not used universally throughout the kernel, in way too many places __be32 is
    used to store IPv4 address.

    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
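
Storing an IPv4 address in an attribute, as the two commits above describe, amounts to a small type-length-value record. The following is a userspace sketch only: `struct attr_hdr`, `put_in_addr` and `get_in_addr` are hypothetical simplifications modelled loosely on `struct nlattr`, not the kernel's `nla_put_in_addr()` or its counterpart, and alignment padding is ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified attribute header (host byte order), loosely like struct nlattr. */
struct attr_hdr {
    uint16_t len;   /* header plus payload, in bytes */
    uint16_t type;
};

/* Append an IPv4 address (already in network byte order) as one attribute.
 * Returns the number of bytes written, or -1 if buf is too small. */
static int put_in_addr(uint8_t *buf, size_t buflen, uint16_t type,
                       uint32_t addr_be)
{
    struct attr_hdr hdr;

    hdr.len = (uint16_t)(sizeof(hdr) + sizeof(addr_be));
    hdr.type = type;
    if (buflen < hdr.len)
        return -1;
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), &addr_be, sizeof(addr_be));
    return hdr.len;
}

/* Counterpart: read the address back out of an attribute. */
static uint32_t get_in_addr(const uint8_t *attr)
{
    uint32_t addr_be;

    memcpy(&addr_be, attr + sizeof(struct attr_hdr), sizeof(addr_be));
    return addr_be;
}
```

Passing the address as a plain 32-bit big-endian value mirrors the commit's remark that `__be32` is what most callers already hold.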
     

13 Mar, 2015

2 commits

  • Having to say
    > #ifdef CONFIG_NET_NS
    > struct net *net;
    > #endif

    in structures is a little bit wordy and a little bit error prone.

    Instead it is possible to say:
    > typedef struct {
    > #ifdef CONFIG_NET_NS
    > struct net *net;
    > #endif
    > } possible_net_t;

    And then in a header say:

    > possible_net_t net;

    Which is cleaner and easier to use and easier to test, as the
    possible_net_t is always there no matter what the compile options.

    Further this allows read_pnet and write_pnet to be functions in all
    cases which is better at catching typos.

    This change adds possible_net_t, updates the definitions of read_pnet
    and write_pnet, updates optional struct net * variables that
    write_pnet uses on to have the type possible_net_t, and finally fixes
    up the b0rked users of read_pnet and write_pnet.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • hold_net and release_net were an idea that turned out to be useless.
    The code has been disabled since 2008. Kill the code; it is long past due.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric W. Biederman
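
The possible_net_t commit above can be tried out in userspace. This is a sketch, assuming a toy `struct net` and a hypothetical `struct toy_device`; the accessor bodies follow the description in the commit message rather than the exact kernel source. Note that an empty struct (the case without `CONFIG_NET_NS`) relies on a GNU C extension.

```c
#define CONFIG_NET_NS 1   /* remove to model a build without namespaces */

struct net { int id; };   /* toy stand-in for the kernel's struct net */
struct net init_net = { 0 };

typedef struct {
#ifdef CONFIG_NET_NS
    struct net *net;
#endif
} possible_net_t;

/* Always a real function, so typos are caught in every configuration. */
static inline void write_pnet(possible_net_t *pnet, struct net *net)
{
#ifdef CONFIG_NET_NS
    pnet->net = net;
#else
    (void)pnet;
    (void)net;
#endif
}

static inline struct net *read_pnet(const possible_net_t *pnet)
{
#ifdef CONFIG_NET_NS
    return pnet->net;
#else
    (void)pnet;
    return &init_net;
#endif
}

/* Hypothetical user: the member exists regardless of compile options. */
struct toy_device {
    possible_net_t nd_net;
};
```

Because the member is always declared, code such as `write_pnet(&dev.nd_net, ns)` compiles the same way in both configurations.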
     

09 Mar, 2015

1 commit

  • Fix the OPENVSWITCH Kconfig option and old Kconfigs by having
    OPENVSWITCH select both NET_MPLS_GSO and MPLS.

    A Kbuild test robot reported that when NET_MPLS_GSO is selected by
    OPENVSWITCH the generated .config is broken because MPLS is not
    selected.

    Cc: Simon Horman
    Fixes: cec9166ca4e mpls: Refactor how the mpls module is built
    Reported-by: kbuild test robot
    Signed-off-by: "Eric W. Biederman"
    Reviewed-by: Simon Horman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

04 Mar, 2015

1 commit

  • Set actions consist of a regular OVS_KEY_ATTR_* attribute nested inside
    of a OVS_ACTION_ATTR_SET action attribute. When converting masked actions
    back to regular set actions, the inner attribute length was not changed,
    i.e., double the length was being serialized. This patch fixes the bug.

    Fixes: 83d2b9b ("net: openvswitch: Support masked set actions.")
    Signed-off-by: Joe Stringer
    Acked-by: Jarno Rajahalme
    Signed-off-by: David S. Miller

    Joe Stringer
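
The length bug described above can be sketched in a few lines. The assumption here is that a masked set payload is the key followed by an equally sized mask; `struct toy_attr` and `masked_to_plain` are hypothetical and much simpler than the real netlink attribute handling.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy attribute: a length-prefixed payload (not the real struct nlattr). */
struct toy_attr {
    uint16_t len;        /* payload length in bytes */
    uint8_t  data[32];
};

/* Convert a masked set payload (key followed by mask) back to a plain
 * set payload that carries the key only. Returns 0 on success. */
static int masked_to_plain(const struct toy_attr *masked,
                           struct toy_attr *plain)
{
    size_t key_len;

    if (masked->len % 2 != 0 || masked->len > sizeof(masked->data))
        return -1;

    key_len = masked->len / 2;   /* halve the length: key without mask */
    memcpy(plain->data, masked->data, key_len);
    plain->len = (uint16_t)key_len;
    return 0;
}
```

Leaving `plain->len` at the masked length would serialize twice as many bytes as the key occupies, which matches the symptom the commit describes.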
     

21 Feb, 2015

1 commit

  • Open vSwitch allows moving an internal vport to a different namespace
    while it is still connected to the bridge. But when the namespace is
    deleted, OVS does not detach these vports, which results in a dangling
    pointer to the netdevice and causes a kernel panic as follows.
    This issue is fixed by detaching all ovs ports from the deleted
    namespace at net-exit.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
    IP: [] ovs_vport_locate+0x35/0x80 [openvswitch]
    Oops: 0000 [#1] SMP
    Call Trace:
    [] lookup_vport+0x21/0xd0 [openvswitch]
    [] ovs_vport_cmd_get+0x59/0xf0 [openvswitch]
    [] genl_family_rcv_msg+0x1bc/0x3e0
    [] genl_rcv_msg+0x79/0xc0
    [] netlink_rcv_skb+0xb9/0xe0
    [] genl_rcv+0x2c/0x40
    [] netlink_unicast+0x12d/0x1c0
    [] netlink_sendmsg+0x34a/0x6b0
    [] sock_sendmsg+0xa0/0xe0
    [] ___sys_sendmsg+0x408/0x420
    [] __sys_sendmsg+0x51/0x90
    [] SyS_sendmsg+0x12/0x20
    [] system_call_fastpath+0x12/0x17

    Reported-by: Assaf Muller
    Fixes: 46df7b81454("openvswitch: Add support for network namespaces.")
    Signed-off-by: Pravin B Shelar
    Reviewed-by: Thomas Graf
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

15 Feb, 2015

1 commit


12 Feb, 2015

2 commits

  • net/openvswitch/flow_netlink.c: In function ‘validate_and_copy_set_tun’:
    net/openvswitch/flow_netlink.c:1749: warning: ‘err’ may be used uninitialized in this function

    If ipv4_tun_from_nlattr() returns a different positive value than
    OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS, err will be uninitialized, and
    validate_and_copy_set_tun() may return an undefined value instead of a
    zero success indicator. Initialize err to zero to fix this.

    Fixes: 1dd144cf5b4b47e1 ("openvswitch: Support VXLAN Group Policy extension")
    Signed-off-by: Geert Uytterhoeven
    Acked-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Geert Uytterhoeven
     
  • The userspace packet execute command passes down the flow key for the
    given packet, but userspace can skip parameters whose value is zero.
    Therefore the kernel needs to initialize the key metadata to zero.

    Fixes: 0714812134 ("openvswitch: Eliminate memset() from flow_extract.")
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

10 Feb, 2015

1 commit


08 Feb, 2015

2 commits

  • Flow alloc needs to initialize the unmasked key pointer. Otherwise
    it can crash the kernel by trying to free a random unmasked-key pointer.

    general protection fault: 0000 [#1] SMP
    3.19.0-rc6-net-next+ #457
    Hardware name: Supermicro X7DWU/X7DWU, BIOS 1.1 04/30/2008
    RIP: 0010:[] [] kfree+0xac/0x196
    Call Trace:
    [] flow_free+0x21/0x59 [openvswitch]
    [] ovs_flow_free+0x21/0x23 [openvswitch]
    [] ovs_packet_cmd_execute+0x2f3/0x35f [openvswitch]
    [] ? ovs_packet_cmd_execute+0x13e/0x35f [openvswitch]
    [] ? nla_parse+0x4f/0xec
    [] genl_family_rcv_msg+0x26d/0x2c9
    [] ? __lock_acquire+0x90e/0x9aa
    [] genl_rcv_msg+0x66/0x89
    [] ? genl_family_rcv_msg+0x2c9/0x2c9
    [] netlink_rcv_skb+0x3e/0x95
    [] ? genl_rcv+0x18/0x37
    [] genl_rcv+0x27/0x37
    [] netlink_unicast+0x103/0x191
    [] netlink_sendmsg+0x2c1/0x310
    [] ? might_fault+0x50/0xa0
    [] do_sock_sendmsg+0x5f/0x7a
    [] sock_sendmsg+0xb/0xd
    [] ___sys_sendmsg+0x1a3/0x218
    [] ? get_close_on_exec+0x86/0x86
    [] ? fsnotify+0x32c/0x348
    [] ? fsnotify+0x7c/0x348
    [] ? __fget+0xaa/0xbf
    [] ? get_close_on_exec+0x86/0x86
    [] __sys_sendmsg+0x3d/0x5e
    [] SyS_sendmsg+0x14/0x16
    [] system_call_fastpath+0x12/0x17

    Fixes: 74ed7ab9264("openvswitch: Add support for unique flow IDs.")
    CC: Joe Stringer
    Reported-by: Or Gerlitz
    Signed-off-by: Pravin B Shelar
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Pravin B Shelar
     
  • OVS userspace already probes the openvswitch kernel module for
    OVS_ACTION_ATTR_SET_MASKED support. This patch adds the kernel module
    implementation of masked set actions.

    The existing set action sets many fields at once. When only a subset
    of the IP header fields, for example, should be modified, all the IP
    fields need to be exact matched so that the other field values can be
    copied to the set action. A masked set action allows modification of
    an arbitrary subset of the supported header bits without requiring the
    rest to be matched.

    Masked set action is now supported for all writeable key types, except
    for the tunnel key. The set tunnel action is an exception as any
    input tunnel info is cleared before action processing starts, so there
    is no tunnel info to mask.

    The kernel module converts all (non-tunnel) set actions to masked set
    actions. This makes action processing more uniform, and results in
    less branching and duplicating the action processing code. When
    returning actions to userspace, the fully masked set actions are
    converted back to normal set actions. We use a kernel internal action
    code to be able to tell the userspace provided and converted masked
    set actions apart.

    Signed-off-by: Jarno Rajahalme
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jarno Rajahalme
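
The core of a masked set action is a bitwise merge. The sketch below shows that operation on a single 32-bit field; `masked_set` is a hypothetical helper for illustration, not the macro or function used in the kernel module.

```c
#include <stdint.h>

/* Apply a masked set: bits selected by mask are taken from value,
 * all other bits keep their old contents. */
static uint32_t masked_set(uint32_t old, uint32_t value, uint32_t mask)
{
    return (old & ~mask) | (value & mask);
}
```

For example, rewriting only the top octet of 192.168.0.1 (0xC0A80001) to 10 uses the mask 0xFF000000 and gives 0x0AA80001, without the rest of the field having to be matched exactly. A mask of all ones reproduces the behaviour of a regular set action.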
     

29 Jan, 2015

1 commit

  • Currently, it isn't possible to request checksums on the outer UDP
    header of tunnels - the TUNNEL_CSUM flag is ignored. This adds
    support for requesting that UDP checksums be computed on transmit
    and properly reported if they are present on receive.

    Signed-off-by: Jesse Gross
    Signed-off-by: David S. Miller

    Jesse Gross
     

27 Jan, 2015

4 commits

  • Previously, flows were manipulated by userspace specifying a full,
    unmasked flow key. This adds significant burden onto flow
    serialization/deserialization, particularly when dumping flows.

    This patch adds an alternative way to refer to flows using a
    variable-length "unique flow identifier" (UFID). At flow setup time,
    userspace may specify a UFID for a flow, which is stored with the flow
    and inserted into a separate table for lookup, in addition to the
    standard flow table. Flows created using a UFID must be fetched or
    deleted using the UFID.

    All flow dump operations may now be made more terse with OVS_UFID_F_*
    flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
    omit the flow key from a datapath operation if the flow has a
    corresponding UFID. This significantly reduces the time spent assembling
    and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
    enabled, the datapath only returns the UFID and statistics for each flow
    during flow dump, increasing ovs-vswitchd revalidator performance by 40%
    or more.

    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     
  • These minor tidyups make a future patch a little tidier.

    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     
  • Rework so that ovs_flow_tbl_insert() calls flow_{key,mask}_insert().
    This tidies up a future patch.

    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     
  • Refactor the ovs_nla_fill_match() function into separate netlink
    serialization functions ovs_nla_put_{unmasked_key,mask}(). Modify
    ovs_nla_put_flow() to handle attribute nesting and expose the 'is_mask'
    parameter - all callers need to nest the flow, and callers have better
    knowledge about whether it is serializing a mask or not.

    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     

25 Jan, 2015

1 commit

  • In the vxlan transmit path there is no need to reference the socket
    for a tunnel which is needed for the receive side. We do, however,
    need the vxlan_dev flags. This patch eliminates references
    to the socket in the transmit path, and changes VXLAN_F_UNSHAREABLE
    to be VXLAN_F_RCV_FLAGS. This mask is used to store the flags
    applicable to receive (GBP, CSUM6_RX, and REMCSUM_RX) in the
    vxlan_sock flags.

    Signed-off-by: Tom Herbert
    Signed-off-by: David S. Miller

    Tom Herbert
     

18 Jan, 2015

1 commit

  • Contrary to common expectations for an "int" return, these functions
    return only a positive value -- if used correctly they cannot even
    return 0 because the message header will necessarily be in the skb.

    This makes the very common pattern of

    if (genlmsg_end(...) < 0) { ... }

    be a whole bunch of dead code. Many places also simply do

    return nlmsg_end(...);

    and the caller is expected to deal with it.

    This also commonly (at least for me) causes errors, because it is very
    common to write

    if (my_function(...))
    /* error condition */

    and if my_function() does "return nlmsg_end()" this is of course wrong.

    Additionally, there's not a single place in the kernel that actually
    needs the message length returned, and if anyone needs it later then
    it'll be very easy to just use skb->len there.

    Remove this, and make the functions void. This removes a bunch of dead
    code as described above. The patch adds lines because I did

    - return nlmsg_end(...);
    + nlmsg_end(...);
    + return 0;

    I could have preserved all the function's return values by returning
    skb->len, but instead I've audited all the places calling the affected
    functions and found that none cared. A few places actually compared
    the return value with < 0 with no change in behaviour, so I opted for the more
    efficient version.

    One instance of the error I've made numerous times now is also present
    in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
    check for
    Signed-off-by: David S. Miller

    Johannes Berg
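
The pitfall this commit removes can be reproduced with a toy model. All names here (`struct toy_msg`, `toy_end_old`, `toy_end_new`, `fill_old`, `fill_new`) are hypothetical; the sketch only mirrors the before and after pattern quoted in the message.

```c
#include <stddef.h>

struct toy_msg {
    size_t len;
};

/* Old style: finishing a message returned its (always positive) length. */
static int toy_end_old(const struct toy_msg *msg)
{
    return (int)msg->len;
}

/* New style: finishing returns nothing. */
static void toy_end_new(struct toy_msg *msg)
{
    (void)msg;
}

/* Old caller: "success" is a positive number, so `if (fill_old(...))`
 * would wrongly take the error branch. */
static int fill_old(struct toy_msg *msg)
{
    msg->len = 16;
    return toy_end_old(msg);
}

/* New caller: success is reported explicitly as 0. */
static int fill_new(struct toy_msg *msg)
{
    msg->len = 16;
    toy_end_new(msg);
    return 0;
}
```

A caller that still needs the length can read it from the message itself, which corresponds to the commit's suggestion of using skb->len.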
     

15 Jan, 2015

7 commits

  • Introduces support for the group policy extension to the VXLAN virtual
    port. The extension is disabled by default and only enabled if the user
    has provided the respective configuration.

    ovs-vsctl add-port br0 vxlan0 -- \
    set Interface vxlan0 type=vxlan options:exts=gbp

    The configuration interface to enable the extension is based on a new
    attribute OVS_VXLAN_EXT_GBP nested inside OVS_TUNNEL_ATTR_EXTENSION
    which can carry additional extensions as needed in the future.

    The group policy metadata is stored as binary blob (struct ovs_vxlan_opts)
    internally just like Geneve options but transported as nested Netlink
    attributes to user space.

    Renames the existing TUNNEL_OPTIONS_PRESENT to TUNNEL_GENEVE_OPT with the
    binary value kept intact, a new flag TUNNEL_VXLAN_OPT is introduced.

    The attributes OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS and existing
    OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS are implemented mutually exclusive.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • nlattr_set() is currently hardcoded to two levels of nesting. This change
    introduces struct ovs_len_tbl to define minimal length requirements plus
    next level nesting tables to traverse the key attributes to arbitrary depth.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Also factors out Geneve validation code into a new separate function
    validate_and_copy_geneve_opts().

    A subsequent patch will introduce VXLAN options. Rename the existing
    GENEVE_TUN_OPTS() to reflect its extended purpose of carrying generic
    tunnel metadata options.

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Implements support for the Group Policy VXLAN extension [0] to provide
    a lightweight and simple security label mechanism across network peers
    based on VXLAN. The security context and associated metadata is mapped
    to/from skb->mark. This allows further mapping to a SELinux context
    using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
    tc, etc.

    The group membership is defined by the lower 16 bits of skb->mark, the
    upper 16 bits are used for flags.

    SELinux allows managing labels to secure local resources. However,
    distributed applications require ACLs to be implemented across hosts. This
    is typically achieved by matching on L2-L4 fields to identify the
    original sending host and process on the receiver. On top of that,
    netlabel and specifically CIPSO [1] allow to map security contexts to
    universal labels. However, netlabel and CIPSO are relatively complex.
    This patch provides a lightweight alternative for overlay network
    environments with a trusted underlay. No additional control protocol
    is required.

    Host 1:                      Host 2:

    Group A      Group B          Group B    Group A
    +-----+   +-------------+    +-------+   +-----+
    | lxc |   | SELinux CTX |    | httpd |   | VM  |
    +--+--+   +--+----------+    +---+---+   +--+--+
        \---+---/                     \----+---/
            |                              |
        +---+---+                      +---+---+
        | vxlan |                      | vxlan |
        +---+---+                      +---+---+
            +------------------------------+

    Backwards compatibility:
    A VXLAN-GBP socket can receive standard VXLAN frames and will assign
    the default group 0x0000 to such frames. A Linux VXLAN socket will
    drop VXLAN-GBP frames. The extension is therefore disabled by default
    and needs to be specifically enabled:

    ip link add [...] type vxlan [...] gbp

    In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
    must run on a separate port number.

    Examples:
    iptables:
    host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200
    host2# iptables -I INPUT -m mark --mark 0x200 -j DROP

    OVS:
    # ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL'
    # ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop'

    [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
    [1] http://lwn.net/Articles/204905/

    Signed-off-by: Thomas Graf
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Conflicts:
    drivers/net/xen-netfront.c

    Minor overlapping changes in xen-netfront.c, mostly to do
    with some buffer management changes alongside the split
    of stats into TX and RX.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow
    and packet messages. This leads to an out-of-bounds access in
    ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
    OVS_PACKET_ATTR_MAX.

    Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
    as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
    while remaining binary compatible with existing OVS binaries.

    Fixes: 05da589 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.")
    Reported-by: Sander Eikelenboom
    Tracked-down-by: Florian Westphal
    Signed-off-by: Thomas Graf
    Reviewed-by: Jesse Gross
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • Introduce ovs_tunnel_route_lookup to consolidate route lookup
    shared by vxlan, gre, and geneve ports.

    Signed-off-by: Fan Du
    Signed-off-by: David S. Miller

    Fan Du
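
The Group Policy commit in this list states that the lower 16 bits of skb->mark carry the group and the upper 16 bits carry flags. A minimal sketch of that split follows; the helper names are hypothetical, and the flag value used in the usage note is an arbitrary placeholder rather than a flag defined by the extension.

```c
#include <stdint.h>

/* Combine 16 bits of flags and a 16-bit group ID into a 32-bit mark. */
static uint32_t gbp_mark_pack(uint16_t flags, uint16_t group_id)
{
    return ((uint32_t)flags << 16) | group_id;
}

/* Lower 16 bits: group membership. */
static uint16_t gbp_mark_group(uint32_t mark)
{
    return (uint16_t)(mark & 0xFFFFu);
}

/* Upper 16 bits: flags. */
static uint16_t gbp_mark_flags(uint32_t mark)
{
    return (uint16_t)(mark >> 16);
}
```

With the group 0x200 from the iptables example and a placeholder flag value of 0x0008, `gbp_mark_pack(0x0008, 0x0200)` gives 0x00080200, and the two accessors recover 0x200 and 0x0008 again.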
     

14 Jan, 2015

2 commits


03 Jan, 2015

1 commit

  • Until now, when VLAN acceleration was in use, the bytes of the VLAN header
    were not included in port or flow byte counters. They were however
    included when VLAN acceleration was not used. This commit corrects the
    inconsistency, by always including the VLAN header in byte counters.

    Previous discussion at
    http://openvswitch.org/pipermail/dev/2014-December/049521.html

    Reported-by: Motonori Shindo
    Signed-off-by: Ben Pfaff
    Reviewed-by: Flavio Leitner
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Ben Pfaff
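
The accounting rule from this commit can be expressed as a one-line helper. This is a sketch under assumptions: `struct toy_pkt` and `counted_len` are hypothetical, and `vlan_accel` stands for the case where the tag is held out of band instead of in the packet data.

```c
#include <stdbool.h>
#include <stddef.h>

#define VLAN_HLEN 4   /* size of an 802.1Q tag in bytes */

struct toy_pkt {
    size_t len;        /* bytes in the packet buffer */
    bool   vlan_accel; /* true if the VLAN tag is stored out of band */
};

/* Count the VLAN header whether or not acceleration stripped it. */
static size_t counted_len(const struct toy_pkt *pkt)
{
    return pkt->len + (pkt->vlan_accel ? VLAN_HLEN : 0);
}
```

A 64-byte tagged frame then counts as 64 bytes in both cases: 60 buffer bytes plus 4 when the tag is accelerated, or 64 buffer bytes when it is inline.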
     

27 Dec, 2014

1 commit

  • There's no point in forcing the caller to know about the internal
    genl_sock to use inside struct net, just have them pass the network
    namespace. This doesn't really change code generation since it's
    an inline, but makes the caller less magic - there's never any
    reason to pass another socket.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

25 Dec, 2014

1 commit

  • net/openvswitch/vport-gre.c:188:5-11: inconsistent IS_ERR and PTR_ERR, PTR_ERR on line 189

    PTR_ERR should access the value just tested by IS_ERR

    Semantic patch information:
    There can be false positives in the patch case, where it is the call
    IS_ERR that is wrong.

    Generated by: scripts/coccinelle/tests/odd_ptr_err.cocci

    CC: Pravin B Shelar
    Signed-off-by: Fengguang Wu
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Wu Fengguang
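
The rule that PTR_ERR should read the pointer just tested by IS_ERR can be shown with a userspace imitation of the error-pointer helpers. The three helpers below are simplified re-implementations written for this sketch, and `toy_lookup` and `toy_use` are hypothetical; none of this is the code from vport-gre.c.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(intptr_t error)
{
    return (void *)error;
}

/* Decode the errno value from an error pointer. */
static inline intptr_t PTR_ERR(const void *ptr)
{
    return (intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO addresses. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int toy_table[4];

/* Returns a valid slot or an encoded error. */
static int *toy_lookup(int idx)
{
    if (idx < 0 || idx >= 4)
        return ERR_PTR(-ENOENT);
    return &toy_table[idx];
}

/* Correct pattern: PTR_ERR() is applied to the pointer IS_ERR() tested. */
static int toy_use(int idx)
{
    int *slot = toy_lookup(idx);

    if (IS_ERR(slot))
        return (int)PTR_ERR(slot);
    return 0;
}
```

The inconsistency flagged by the semantic patch is the case where the argument to PTR_ERR differs from the one given to IS_ERR, so the returned value is unrelated to the failure that was detected.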