21 Sep, 2016

2 commits


11 Sep, 2016

1 commit


23 Jun, 2016

1 commit

  • The commit f2a4d086ed4c ("openvswitch: Add packet truncation support.")
    introduces packet truncation before sending to userspace upcall receiver.
    This patch passes up the skb->len before truncation so that the upcall
    receiver knows the original packet size. Potentially this will be used
    by sFlow, where OVS translates sFlow config header=N to a sample action,
    truncating packet to N byte in kernel datapath. Thus, only N bytes instead
    of full-packet size is copied from kernel to userspace, saving the
    kernel-to-userspace bandwidth.

    Signed-off-by: William Tu
    Cc: Pravin Shelar
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    William Tu
     

11 Jun, 2016

1 commit

  • The patch adds a new OVS action, OVS_ACTION_ATTR_TRUNC, in order to
    truncate packets. A 'max_len' is added for setting up the maximum
    packet size, and a 'cutlen' field is to record the number of bytes
    to trim the packet when the packet is outputting to a port, or when
    the packet is sent to userspace.

    Signed-off-by: William Tu
    Cc: Pravin Shelar
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    William Tu
     

27 Apr, 2016

1 commit


26 Apr, 2016

1 commit


14 Mar, 2016

1 commit

  • When we want to change a flow using netlink, we have to identify it to
    be able to perform a lookup. Both the flow key and unique flow ID
    (ufid) are valid identifiers, but we always have to specify the flow
    key in the netlink message. When both attributes are there, the ufid
    is used. The flow key is used to validate the actions provided by
    the userland.

    This commit allows to use the ufid without having to provide the flow
    key, as it is already done in the netlink 'flow get' and 'flow del'
    path. The flow key remains mandatory when an action is provided.

    Signed-off-by: Samuel Gauthier
    Reviewed-by: Simon Horman
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Samuel Gauthier
     

02 Mar, 2016

1 commit

  • This patch implements bookkeeping support to compute the maximum
    headroom for all the devices in each datapath. When said value
    changes, the underlying devs are notified via the
    ndo_set_rx_headroom method.

    This also increases the internal vports xmit performance.

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

19 Feb, 2016

2 commits


11 Feb, 2016

1 commit

  • Operations with the GENL_ADMIN_PERM flag fail permissions checks because
    this flag means we call netlink_capable, which uses the init user ns.

    Instead, let's introduce a new flag, GENL_UNS_ADMIN_PERM for operations
    which should be allowed inside a user namespace.

    The motivation for this is to be able to run openvswitch in unprivileged
    containers. I've tested this and it seems to work, but I really have no
    idea about the security consequences of this patch, so thoughts would be
    much appreciated.

    v2: use the GENL_UNS_ADMIN_PERM flag instead of a check in each function
    v3: use separate ifs for UNS_ADMIN_PERM and ADMIN_PERM, instead of one
    massive one

    Reported-by: James Page
    Signed-off-by: Tycho Andersen
    CC: Eric Biederman
    CC: Pravin Shelar
    CC: Justin Pettit
    CC: "David S. Miller"
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tycho Andersen
     

16 Jan, 2016

1 commit

  • Skb_gso_segment() uses skb control block during segmentation.
    This patch adds 32-bytes room for previous control block which
    will be copied into all resulting segments.

    This patch fixes kernel crash during fragmenting forwarded packets.
    Fragmentation requires valid IP CB in skb for clearing ip options.
    Also patch removes custom save/restore in ovs code, now it's redundant.

    Signed-off-by: Konstantin Khlebnikov
    Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
    Signed-off-by: David S. Miller

    Konstantin Khlebnikov
     

08 Nov, 2015

1 commit

  • Pull trivial updates from Jiri Kosina:
    "Trivial stuff from trivial tree that can be trivially summed up as:

    - treewide drop of spurious unlikely() before IS_ERR() from Viresh
    Kumar

    - cosmetic fixes (that don't really affect basic functionality of the
    driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek

    - various comment / printk fixes and updates all over the place"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    bcache: Really show state of work pending bit
    hwmon: applesmc: fix comment typos
    Kconfig: remove comment about scsi_wait_scan module
    class_find_device: fix reference to argument "match"
    debugfs: document that debugfs_remove*() accepts NULL and error values
    net: Drop unlikely before IS_ERR(_OR_NULL)
    mm: Drop unlikely before IS_ERR(_OR_NULL)
    fs: Drop unlikely before IS_ERR(_OR_NULL)
    drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
    drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
    UBI: Update comments to reflect UBI_METAONLY flag
    pktcdvd: drop null test before destroy functions

    Linus Torvalds
     

24 Oct, 2015

1 commit

  • Conflicts:
    net/ipv6/xfrm6_output.c
    net/openvswitch/flow_netlink.c
    net/openvswitch/vport-gre.c
    net/openvswitch/vport-vxlan.c
    net/openvswitch/vport.c
    net/openvswitch/vport.h

    The openvswitch conflicts were overlapping changes. One was
    the egress tunnel info fix in 'net' and the other was the
    vport ->send() op simplification in 'net-next'.

    The xfrm6_output.c conflicts was also a simplification
    overlapping a bug fix.

    Signed-off-by: David S. Miller

    David S. Miller
     

23 Oct, 2015

1 commit

  • While transitioning to netdev based vport we broke OVS
    feature which allows user to retrieve tunnel packet egress
    information for lwtunnel devices. Following patch fixes it
    by introducing ndo operation to get the tunnel egress info.
    Same ndo operation can be used for lwtunnel devices and compat
    ovs-tnl-vport devices. So after adding such device operation
    we can remove similar operation from ovs-vport.

    Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device").
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

29 Sep, 2015

1 commit


27 Sep, 2015

1 commit


25 Sep, 2015

1 commit

  • The genl_notify function has too many arguments for no real reason - all
    callers use genl_info to get them anyway. Just pass the genl_info down to
    genl_notify.

    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
     

23 Sep, 2015

1 commit

  • When support for megaflows was introduced, OVS needed to start
    installing flows with a mask applied to them. Since masking is an
    expensive operation, OVS also had an optimization that would only
    take the parts of the flow keys that were covered by a non-zero
    mask. The values stored in the remaining pieces should not matter
    because they are masked out.

    While this works fine for the purposes of matching (which must always
    look at the mask), serialization to netlink can be problematic. Since
    the flow and the mask are serialized separately, the uninitialized
    portions of the flow can be encoded with whatever values happen to be
    present.

    In terms of functionality, this has little effect since these fields
    will be masked out by definition. However, it leaks kernel memory to
    userspace, which is a potential security vulnerability. It is also
    possible that other code paths could look at the masked key and get
    uninitialized data, although this does not currently appear to be an
    issue in practice.

    This removes the mask optimization for flows that are being installed.
    This was always intended to be the case as the mask optimizations were
    really targetting per-packet flow operations.

    Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
    Signed-off-by: Jesse Gross
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jesse Gross
     

01 Sep, 2015

1 commit

  • Currently tun-info options pointer is used in few cases to
    pass options around. But tunnel options can be accessed using
    ip_tunnel_info_opts() API without using the pointer. Following
    patch removes the redundant pointer and consistently make use
    of API.

    Signed-off-by: Pravin B Shelar
    Acked-by: Thomas Graf
    Reviewed-by: Jesse Gross
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

30 Aug, 2015

1 commit


28 Aug, 2015

3 commits

  • Allow matching and setting the ct_label field. As with ct_mark, this is
    populated by executing the CT action. The label field may be modified by
    specifying a label and mask nested under the CT action. It is stored as
    metadata attached to the connection. Label modification occurs after
    lookup, and will only persist when the conntrack entry is committed by
    providing the COMMIT flag to the CT action. Labels are currently fixed
    to 128 bits in size.

    Signed-off-by: Joe Stringer
    Acked-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     
  • Expose the kernel connection tracker via OVS. Userspace components can
    make use of the CT action to populate the connection state (ct_state)
    field for a flow. This state can be subsequently matched.

    Exposed connection states are OVS_CS_F_*:
    - NEW (0x01) - Beginning of a new connection.
    - ESTABLISHED (0x02) - Part of an existing connection.
    - RELATED (0x04) - Related to an established connection.
    - INVALID (0x20) - Could not track the connection for this packet.
    - REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
    - TRACKED (0x80) - This packet has been sent through conntrack.

    When the CT action is executed by itself, it will send the packet
    through the connection tracker and populate the ct_state field with one
    or more of the connection state flags above. The CT action will always
    set the TRACKED bit.

    When the COMMIT flag is passed to the conntrack action, this specifies
    that information about the connection should be stored. This allows
    subsequent packets for the same (or related) connections to be
    correlated with this connection. Sending subsequent packets for the
    connection through conntrack allows the connection tracker to consider
    the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.

    The CT action may optionally take a zone to track the flow within. This
    allows connections with the same 5-tuple to be kept logically separate
    from connections in other zones. If the zone is specified, then the
    "ct_zone" match field will be subsequently populated with the zone id.

    IP fragments are handled by transparently assembling them as part of the
    CT action. The maximum received unit (MRU) size is tracked so that
    refragmentation can occur during output.

    IP frag handling contributed by Andy Zhou.

    Based on original design by Justin Pettit.

    Signed-off-by: Joe Stringer
    Signed-off-by: Justin Pettit
    Signed-off-by: Andy Zhou
    Acked-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     
  • Previously, we used the kernel-internal netlink actions length to
    calculate the size of messages to serialize back to userspace.
    However,the sw_flow_actions may not be formatted exactly the same as the
    actions on the wire, so store the original actions length when
    de-serializing and re-use the original length when serializing.

    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Joe Stringer
     

22 Jul, 2015

3 commits


02 Jun, 2015

1 commit

  • If new optional attribute OVS_USERSPACE_ATTR_ACTIONS is added to an
    OVS_ACTION_ATTR_USERSPACE action, then include the datapath actions
    in the upcall.

    This Directly associates the sampled packet with the path it takes
    through the virtual switch. Path information currently includes mangling,
    encapsulation and decapsulation actions for tunneling protocols GRE,
    VXLAN, Geneve, MPLS and QinQ, but this extension requires no further
    changes to accommodate datapath actions that may be added in the
    future.

    Adding path information enhances visibility into complex virtual
    networks.

    Signed-off-by: Neil McKee
    Signed-off-by: David S. Miller

    Neil McKee
     

06 May, 2015

1 commit


13 Mar, 2015

1 commit

  • hold_net and release_net were an idea that turned out to be useless.
    The code has been disabled since 2008. Kill the code it is long past due.

    Signed-off-by: "Eric W. Biederman"
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

21 Feb, 2015

1 commit

  • Open vSwitch allows moving internal vport to different namespace
    while still connected to the bridge. But when namespace deleted
    OVS does not detach these vports, that results in dangling
    pointer to netdevice which causes kernel panic as follows.
    This issue is fixed by detaching all ovs ports from the deleted
    namespace at net-exit.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
    IP: [] ovs_vport_locate+0x35/0x80 [openvswitch]
    Oops: 0000 [#1] SMP
    Call Trace:
    [] lookup_vport+0x21/0xd0 [openvswitch]
    [] ovs_vport_cmd_get+0x59/0xf0 [openvswitch]
    [] genl_family_rcv_msg+0x1bc/0x3e0
    [] genl_rcv_msg+0x79/0xc0
    [] netlink_rcv_skb+0xb9/0xe0
    [] genl_rcv+0x2c/0x40
    [] netlink_unicast+0x12d/0x1c0
    [] netlink_sendmsg+0x34a/0x6b0
    [] sock_sendmsg+0xa0/0xe0
    [] ___sys_sendmsg+0x408/0x420
    [] __sys_sendmsg+0x51/0x90
    [] SyS_sendmsg+0x12/0x20
    [] system_call_fastpath+0x12/0x17

    Reported-by: Assaf Muller
    Fixes: 46df7b81454("openvswitch: Add support for network namespaces.")
    Signed-off-by: Pravin B Shelar
    Reviewed-by: Thomas Graf
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

27 Jan, 2015

2 commits

  • Previously, flows were manipulated by userspace specifying a full,
    unmasked flow key. This adds significant burden onto flow
    serialization/deserialization, particularly when dumping flows.

    This patch adds an alternative way to refer to flows using a
    variable-length "unique flow identifier" (UFID). At flow setup time,
    userspace may specify a UFID for a flow, which is stored with the flow
    and inserted into a separate table for lookup, in addition to the
    standard flow table. Flows created using a UFID must be fetched or
    deleted using the UFID.

    All flow dump operations may now be made more terse with OVS_UFID_F_*
    flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
    omit the flow key from a datapath operation if the flow has a
    corresponding UFID. This significantly reduces the time spent assembling
    and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
    enabled, the datapath only returns the UFID and statistics for each flow
    during flow dump, increasing ovs-vswitchd revalidator performance by 40%
    or more.

    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     
  • Refactor the ovs_nla_fill_match() function into separate netlink
    serialization functions ovs_nla_put_{unmasked_key,mask}(). Modify
    ovs_nla_put_flow() to handle attribute nesting and expose the 'is_mask'
    parameter - all callers need to nest the flow, and callers have better
    knowledge about whether it is serializing a mask or not.

    Signed-off-by: Joe Stringer
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer
     

18 Jan, 2015

1 commit

  • Contrary to common expectations for an "int" return, these functions
    return only a positive value -- if used correctly they cannot even
    return 0 because the message header will necessarily be in the skb.

    This makes the very common pattern of

    if (genlmsg_end(...) < 0) { ... }

    be a whole bunch of dead code. Many places also simply do

    return nlmsg_end(...);

    and the caller is expected to deal with it.

    This also commonly (at least for me) causes errors, because it is very
    common to write

    if (my_function(...))
    /* error condition */

    and if my_function() does "return nlmsg_end()" this is of course wrong.

    Additionally, there's not a single place in the kernel that actually
    needs the message length returned, and if anyone needs it later then
    it'll be very easy to just use skb->len there.

    Remove this, and make the functions void. This removes a bunch of dead
    code as described above. The patch adds lines because I did

    - return nlmsg_end(...);
    + nlmsg_end(...);
    + return 0;

    I could have preserved all the function's return values by returning
    skb->len, but instead I've audited all the places calling the affected
    functions and found that none cared. A few places actually compared
    the return value with < 0 with no change in behaviour, so I opted for the more
    efficient version.

    One instance of the error I've made numerous times now is also present
    in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
    check for
    Signed-off-by: David S. Miller

    Johannes Berg
     

15 Jan, 2015

2 commits

  • Conflicts:
    drivers/net/xen-netfront.c

    Minor overlapping changes in xen-netfront.c, mostly to do
    with some buffer management changes alongside the split
    of stats into TX and RX.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow
    and packet messages. This leads to an out-of-bounds access in
    ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
    OVS_PACKET_ATTR_MAX.

    Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
    as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
    while maintaining to be binary compatible with existing OVS binaries.

    Fixes: 05da589 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.")
    Reported-by: Sander Eikelenboom
    Tracked-down-by: Florian Westphal
    Signed-off-by: Thomas Graf
    Reviewed-by: Jesse Gross
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Thomas Graf
     

14 Jan, 2015

1 commit


27 Dec, 2014

1 commit

  • There's no point to force the caller to know about the internal
    genl_sock to use inside struct net, just have them pass the network
    namespace. This doesn't really change code generation since it's
    an inline, but makes the caller less magic - there's never any
    reason to pass another socket.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

22 Nov, 2014

1 commit

  • Conflicts:
    drivers/net/ieee802154/fakehard.c

    A bug fix went into 'net' for ieee802154/fakehard.c, which is removed
    in 'net-next'.

    Add build fix into the merge from Stephen Rothwell in openvswitch, the
    logging macros take a new initial 'log' argument, a new call was added
    in 'net' so when we merge that in here we have to explicitly add the
    new 'log' arg to it else the build fails.

    Signed-off-by: David S. Miller

    David S. Miller