17 Dec, 2017

2 commits

  • [ Upstream commit 2734166e89639c973c6e125ac8bcfc2d9db72b70 ]

    gso_type is being used in binary AND operations together with SKB_GSO_UDP.
    The issue is that variable gso_type is of type unsigned short and
    SKB_GSO_UDP expands to more than 16 bits:

    SKB_GSO_UDP = 1 << 16

    this makes any binary AND operation between gso_type and SKB_GSO_UDP to
    be always zero, hence making some code unreachable and likely causing
    undesired behavior.

    Fix this by changing the data type of variable gso_type to unsigned int.

    Addresses-Coverity-ID: 1462223
    Fixes: 0c19f846d582 ("net: accept UFO datagrams from tuntap and packet")
    Signed-off-by: Gustavo A. R. Silva
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Gustavo A. R. Silva
     
  • [ Upstream commit 0c19f846d582af919db66a5914a0189f9f92c936 ]

    Tuntap and similar devices can inject GSO packets. Accept type
    VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.

    Processes are expected to use feature negotiation such as TUNSETOFFLOAD
    to detect supported offload types and refrain from injecting other
    packets. This process breaks down with live migration: guest kernels
    do not renegotiate flags, so destination hosts need to expose all
    features that the source host does.

    Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677.
    This patch introduces nearly(*) no new code to simplify verification.
    It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
    insertion and software UFO segmentation.

    It does not reinstate protocol stack support, hardware offload
    (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
    of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.

    To support SKB_GSO_UDP reappearing in the stack, also reinstate
    logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
    by squashing in commit 939912216fa8 ("net: skb_needs_check() removes
    CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1
    ("net: avoid skb_warn_bad_offload false positives on UFO").

    (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
    ipv6_proxy_select_ident is changed to return a __be32 and this is
    assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
    at the end of the enum to minimize code churn.

    Tested
    Booted a v4.13 guest kernel with QEMU. On a host kernel before this
    patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
    enabled, same as on a v4.13 host kernel.

    A UFO packet sent from the guest appears on the tap device:
    host:
    nc -l -p -u 8000 &
    tcpdump -n -i tap0

    guest:
    dd if=/dev/zero of=payload.txt bs=1 count=2000
    nc -u 192.16.1.1 8000 < payload.txt

    Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
    packets arriving fragmented:

    ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
    (from https://github.com/wdebruij/kerneltools/tree/master/tests)

    Changes
    v1 -> v2
    - simplified set_offload change (review comment)
    - documented test procedure

    Link: http://lkml.kernel.org/r/
    Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
    Reported-by: Michal Kubecek
    Signed-off-by: Willem de Bruijn
    Acked-by: Jason Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     

13 Sep, 2017

1 commit


22 Aug, 2017

1 commit


17 Aug, 2017

1 commit

  • For sw_flow_actions, the actions_len only represents the kernel part's
    size, and when we dump the actions to the userspace, we will do the
    convertions, so it's true size may become bigger than the actions_len.

    But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len
    to alloc the skbuff, so the user_skb's size may become insufficient and
    oops will happen like this:
    skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head:
    ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:
    ------------[ cut here ]------------
    kernel BUG at net/core/skbuff.c:129!
    [...]
    Call Trace:

    [] skb_put+0x43/0x44
    [] skb_zerocopy+0x6c/0x1f4
    [] queue_userspace_packet+0x3a3/0x448 [openvswitch]
    [] ovs_dp_upcall+0x30/0x5c [openvswitch]
    [] output_userspace+0x132/0x158 [openvswitch]
    [] ? ip6_rcv_finish+0x74/0x77 [ipv6]
    [] do_execute_actions+0xcc1/0xdc8 [openvswitch]
    [] ovs_execute_actions+0x74/0x106 [openvswitch]
    [] ovs_dp_process_packet+0xe1/0xfd [openvswitch]
    [] ? key_extract+0x63c/0x8d5 [openvswitch]
    [] ovs_vport_receive+0xa1/0xc3 [openvswitch]
    [...]

    Also we can find that the actions_len is much little than the orig_len:
    crash> struct sw_flow_actions 0xffff8812f539d000
    struct sw_flow_actions {
    rcu = {
    next = 0xffff8812f5398800,
    func = 0xffffe3b00035db32
    },
    orig_len = 1384,
    actions_len = 592,
    actions = 0xffff8812f539d01c
    }

    So as a quick fix, use the orig_len instead of the actions_len to alloc
    the user_skb.

    Last, this oops happened on our system running a relative old kernel, but
    the same risk still exists on the mainline, since we use the wrong
    actions_len from the beginning.

    Fixes: ccea74457bbd ("openvswitch: include datapath actions with sampled-packet upcall to userspace")
    Cc: Neil McKee
    Signed-off-by: Liping Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Liping Zhang
     

18 Jul, 2017

1 commit


02 Jul, 2017

1 commit

  • When compiling OvS-master on 4.4.0-81 kernel,
    there is a warning:

    CC [M] /root/ovs/datapath/linux/datapath.o
    /root/ovs/datapath/linux/datapath.c: In function
    'ovs_flow_cmd_set':
    /root/ovs/datapath/linux/datapath.c:1221:1: warning:
    the frame size of 1040 bytes is larger than 1024 bytes
    [-Wframe-larger-than=]

    This patch factors out match-init and action-copy to avoid
    "Wframe-larger-than=1024" warning. Because mask is only
    used to get actions, we new a function to save some
    stack space.

    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

16 Jun, 2017

1 commit

  • There were many places that my previous spatch didn't find,
    as pointed out by yuan linyu in various patches.

    The following spatch found many more and also removes the
    now unnecessary casts:

    @@
    identifier p, p2;
    expression len;
    expression skb;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, len);
    |
    -memset(p, 0, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, sizeof(*p));
    |
    -memset(p, 0, sizeof(*p));
    )

    @@
    expression skb, len;
    @@
    -memset(skb_put(skb, len), 0, len);
    +skb_put_zero(skb, len);

    Apply it to the tree (with one manual fixup to keep the
    comment in vxlan.c, which spatch removed.)

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

20 May, 2017

1 commit


14 Apr, 2017

1 commit


28 Dec, 2016

1 commit

  • Networking stack accelerate vlan tag handling by
    keeping topmost vlan header in skb. This works as
    long as packet remains in OVS datapath. But during
    OVS upcall vlan header is pushed on to the packet.
    When such packet is sent back to OVS datapath, core
    networking stack might not handle it correctly. Following
    patch avoids this issue by accelerating the vlan tag
    during flow key extract. This simplifies datapath by
    bringing uniform packet processing for packets from
    all code paths.

    Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets").
    CC: Jarno Rajahalme
    CC: Jiri Benc
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    pravin shelar
     

18 Nov, 2016

1 commit

  • Make struct pernet_operations::id unsigned.

    There are 2 reasons to do so:

    1)
    This field is really an index into an zero based array and
    thus is unsigned entity. Using negative value is out-of-bound
    access by definition.

    2)
    On x86_64 unsigned 32-bit data which are mixed with pointers
    via array indexing or offsets added or subtracted to pointers
    are preffered to signed 32-bit data.

    "int" being used as an array index needs to be sign-extended
    to 64-bit before being used.

    void f(long *p, int i)
    {
    g(p[i]);
    }

    roughly translates to

    movsx rsi, esi
    mov rdi, [rsi+...]
    call g

    MOVSX is 3 byte instruction which isn't necessary if the variable is
    unsigned because x86_64 is zero extending by default.

    Now, there is net_generic() function which, you guessed it right, uses
    "int" as an array index:

    static inline void *net_generic(const struct net *net, int id)
    {
    ...
    ptr = ng->ptr[id - 1];
    ...
    }

    And this function is used a lot, so those sign extensions add up.

    Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
    messing with code generation):

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

    Unfortunately some functions actually grow bigger.
    This is a semmingly random artefact of code generation with register
    allocator being used differently. gcc decides that some variable
    needs to live in new r8+ registers and every access now requires REX
    prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
    used which is longer than [r8]

    However, overall balance is in negative direction:

    add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
    function old new delta
    nfsd4_lock 3886 3959 +73
    tipc_link_build_proto_msg 1096 1140 +44
    mac80211_hwsim_new_radio 2776 2808 +32
    tipc_mon_rcv 1032 1058 +26
    svcauth_gss_legacy_init 1413 1429 +16
    tipc_bcbase_select_primary 379 392 +13
    nfsd4_exchange_id 1247 1260 +13
    nfsd4_setclientid_confirm 782 793 +11
    ...
    put_client_renew_locked 494 480 -14
    ip_set_sockfn_get 730 716 -14
    geneve_sock_add 829 813 -16
    nfsd4_sequence_done 721 703 -18
    nlmclnt_lookup_host 708 686 -22
    nfsd4_lockt 1085 1063 -22
    nfs_get_client 1077 1050 -27
    tcf_bpf_init 1106 1076 -30
    nfsd4_encode_fattr 5997 5930 -67
    Total: Before=154856051, After=154854321, chg -0.00%

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
     

13 Nov, 2016

1 commit

  • Support receiving, extracting flow key and sending of L3 packets (packets
    without an Ethernet header).

    Note that even after this patch, non-Ethernet interfaces are still not
    allowed to be added to bridges. Similarly, netlink interface for sending and
    receiving L3 packets to/from user space is not in place yet.

    Based on previous versions by Lorand Jakab and Simon Horman.

    Signed-off-by: Lorand Jakab
    Signed-off-by: Simon Horman
    Signed-off-by: Jiri Benc
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jiri Benc
     

28 Oct, 2016

3 commits

  • Now genl_register_family() is the only thing (other than the
    users themselves, perhaps, but I didn't find any doing that)
    writing to the family struct.

    In all families that I found, genl_register_family() is only
    called from __init functions (some indirectly, in which case
    I've add __init annotations to clarifly things), so all can
    actually be marked __ro_after_init.

    This protects the data structure from accidental corruption.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Instead of providing macros/inline functions to initialize
    the families, make all users initialize them statically and
    get rid of the macros.

    This reduces the kernel code size by about 1.6k on x86-64
    (with allyesconfig).

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Static family IDs have never really been used, the only
    use case was the workaround I introduced for those users
    that assumed their family ID was also their multicast
    group ID.

    Additionally, because static family IDs would never be
    reserved by the generic netlink code, using a relatively
    low ID would only work for built-in families that can be
    registered immediately after generic netlink is started,
    which is basically only the control family (apart from
    the workaround code, which I also had to add code for so
    it would reserve those IDs)

    Thus, anything other than GENL_ID_GENERATE is flawed and
    luckily not used except in the cases I mentioned. Move
    those workarounds into a few lines of code, and then get
    rid of GENL_ID_GENERATE entirely, making it more robust.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     

20 Oct, 2016

1 commit


21 Sep, 2016

2 commits


11 Sep, 2016

1 commit


23 Jun, 2016

1 commit

  • The commit f2a4d086ed4c ("openvswitch: Add packet truncation support.")
    introduces packet truncation before sending to userspace upcall receiver.
    This patch passes up the skb->len before truncation so that the upcall
    receiver knows the original packet size. Potentially this will be used
    by sFlow, where OVS translates sFlow config header=N to a sample action,
    truncating packet to N byte in kernel datapath. Thus, only N bytes instead
    of full-packet size is copied from kernel to userspace, saving the
    kernel-to-userspace bandwidth.

    Signed-off-by: William Tu
    Cc: Pravin Shelar
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    William Tu
     

11 Jun, 2016

1 commit

  • The patch adds a new OVS action, OVS_ACTION_ATTR_TRUNC, in order to
    truncate packets. A 'max_len' is added for setting up the maximum
    packet size, and a 'cutlen' field is to record the number of bytes
    to trim the packet when the packet is outputting to a port, or when
    the packet is sent to userspace.

    Signed-off-by: William Tu
    Cc: Pravin Shelar
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    William Tu
     

27 Apr, 2016

1 commit


26 Apr, 2016

1 commit


14 Mar, 2016

1 commit

  • When we want to change a flow using netlink, we have to identify it to
    be able to perform a lookup. Both the flow key and unique flow ID
    (ufid) are valid identifiers, but we always have to specify the flow
    key in the netlink message. When both attributes are there, the ufid
    is used. The flow key is used to validate the actions provided by
    the userland.

    This commit allows to use the ufid without having to provide the flow
    key, as it is already done in the netlink 'flow get' and 'flow del'
    path. The flow key remains mandatory when an action is provided.

    Signed-off-by: Samuel Gauthier
    Reviewed-by: Simon Horman
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Samuel Gauthier
     

02 Mar, 2016

1 commit

  • This patch implements bookkeeping support to compute the maximum
    headroom for all the devices in each datapath. When said value
    changes, the underlying devs are notified via the
    ndo_set_rx_headroom method.

    This also increases the internal vports xmit performance.

    Signed-off-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Paolo Abeni
     

19 Feb, 2016

2 commits


11 Feb, 2016

1 commit

  • Operations with the GENL_ADMIN_PERM flag fail permissions checks because
    this flag means we call netlink_capable, which uses the init user ns.

    Instead, let's introduce a new flag, GENL_UNS_ADMIN_PERM for operations
    which should be allowed inside a user namespace.

    The motivation for this is to be able to run openvswitch in unprivileged
    containers. I've tested this and it seems to work, but I really have no
    idea about the security consequences of this patch, so thoughts would be
    much appreciated.

    v2: use the GENL_UNS_ADMIN_PERM flag instead of a check in each function
    v3: use separate ifs for UNS_ADMIN_PERM and ADMIN_PERM, instead of one
    massive one

    Reported-by: James Page
    Signed-off-by: Tycho Andersen
    CC: Eric Biederman
    CC: Pravin Shelar
    CC: Justin Pettit
    CC: "David S. Miller"
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tycho Andersen
     

16 Jan, 2016

1 commit

  • Skb_gso_segment() uses skb control block during segmentation.
    This patch adds 32-bytes room for previous control block which
    will be copied into all resulting segments.

    This patch fixes kernel crash during fragmenting forwarded packets.
    Fragmentation requires valid IP CB in skb for clearing ip options.
    Also patch removes custom save/restore in ovs code, now it's redundant.

    Signed-off-by: Konstantin Khlebnikov
    Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
    Signed-off-by: David S. Miller

    Konstantin Khlebnikov
     

08 Nov, 2015

1 commit

  • Pull trivial updates from Jiri Kosina:
    "Trivial stuff from trivial tree that can be trivially summed up as:

    - treewide drop of spurious unlikely() before IS_ERR() from Viresh
    Kumar

    - cosmetic fixes (that don't really affect basic functionality of the
    driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek

    - various comment / printk fixes and updates all over the place"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    bcache: Really show state of work pending bit
    hwmon: applesmc: fix comment typos
    Kconfig: remove comment about scsi_wait_scan module
    class_find_device: fix reference to argument "match"
    debugfs: document that debugfs_remove*() accepts NULL and error values
    net: Drop unlikely before IS_ERR(_OR_NULL)
    mm: Drop unlikely before IS_ERR(_OR_NULL)
    fs: Drop unlikely before IS_ERR(_OR_NULL)
    drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
    drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
    UBI: Update comments to reflect UBI_METAONLY flag
    pktcdvd: drop null test before destroy functions

    Linus Torvalds
     

24 Oct, 2015

1 commit

  • Conflicts:
    net/ipv6/xfrm6_output.c
    net/openvswitch/flow_netlink.c
    net/openvswitch/vport-gre.c
    net/openvswitch/vport-vxlan.c
    net/openvswitch/vport.c
    net/openvswitch/vport.h

    The openvswitch conflicts were overlapping changes. One was
    the egress tunnel info fix in 'net' and the other was the
    vport ->send() op simplification in 'net-next'.

    The xfrm6_output.c conflicts was also a simplification
    overlapping a bug fix.

    Signed-off-by: David S. Miller

    David S. Miller
     

23 Oct, 2015

1 commit

  • While transitioning to netdev based vport we broke OVS
    feature which allows user to retrieve tunnel packet egress
    information for lwtunnel devices. Following patch fixes it
    by introducing ndo operation to get the tunnel egress info.
    Same ndo operation can be used for lwtunnel devices and compat
    ovs-tnl-vport devices. So after adding such device operation
    we can remove similar operation from ovs-vport.

    Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device").
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

29 Sep, 2015

1 commit


27 Sep, 2015

1 commit


25 Sep, 2015

1 commit

  • The genl_notify function has too many arguments for no real reason - all
    callers use genl_info to get them anyway. Just pass the genl_info down to
    genl_notify.

    Signed-off-by: Jiri Benc
    Signed-off-by: David S. Miller

    Jiri Benc
     

23 Sep, 2015

1 commit

  • When support for megaflows was introduced, OVS needed to start
    installing flows with a mask applied to them. Since masking is an
    expensive operation, OVS also had an optimization that would only
    take the parts of the flow keys that were covered by a non-zero
    mask. The values stored in the remaining pieces should not matter
    because they are masked out.

    While this works fine for the purposes of matching (which must always
    look at the mask), serialization to netlink can be problematic. Since
    the flow and the mask are serialized separately, the uninitialized
    portions of the flow can be encoded with whatever values happen to be
    present.

    In terms of functionality, this has little effect since these fields
    will be masked out by definition. However, it leaks kernel memory to
    userspace, which is a potential security vulnerability. It is also
    possible that other code paths could look at the masked key and get
    uninitialized data, although this does not currently appear to be an
    issue in practice.

    This removes the mask optimization for flows that are being installed.
    This was always intended to be the case as the mask optimizations were
    really targetting per-packet flow operations.

    Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
    Signed-off-by: Jesse Gross
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Jesse Gross
     

01 Sep, 2015

1 commit

  • Currently tun-info options pointer is used in few cases to
    pass options around. But tunnel options can be accessed using
    ip_tunnel_info_opts() API without using the pointer. Following
    patch removes the redundant pointer and consistently make use
    of API.

    Signed-off-by: Pravin B Shelar
    Acked-by: Thomas Graf
    Reviewed-by: Jesse Gross
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

30 Aug, 2015

1 commit


28 Aug, 2015

1 commit

  • Allow matching and setting the ct_label field. As with ct_mark, this is
    populated by executing the CT action. The label field may be modified by
    specifying a label and mask nested under the CT action. It is stored as
    metadata attached to the connection. Label modification occurs after
    lookup, and will only persist when the conntrack entry is committed by
    providing the COMMIT flag to the CT action. Labels are currently fixed
    to 128 bits in size.

    Signed-off-by: Joe Stringer
    Acked-by: Thomas Graf
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Joe Stringer