05 Dec, 2020

1 commit

  • Fix to return a negative error code from the error handling
    case instead of 0, as done elsewhere in this function.

    Changing 'return start' to 'return action_start' can fix this bug.

    Fixes: 69929d4c49e1 ("net: openvswitch: fix TTL decrement action netlink message format")
    Reported-by: Hulk Robot
    Signed-off-by: Wang Hai
    Reviewed-by: Eelco Chaudron
    Link: https://lore.kernel.org/r/20201204114314.1596-1-wanghai38@huawei.com
    Signed-off-by: Jakub Kicinski

    Wang Hai
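
The error-path pattern described above can be sketched in userspace C. This is only an illustration: `reserve_nested_sketch()` and `copy_dec_ttl_sketch()` are hypothetical stand-ins, not the kernel functions, and the 4-byte sizes are assumptions chosen for the example.

```c
#include <errno.h>

/* Hypothetical helper: returns a non-negative offset on success or a
 * negative errno value when there is not enough room. */
static int reserve_nested_sketch(int space_left, int needed, int offset)
{
	if (needed > space_left)
		return -EMSGSIZE;
	return offset;
}

/* Returns 0 on success or a negative errno value on failure. */
int copy_dec_ttl_sketch(int space_left)
{
	int start, action_start;

	start = reserve_nested_sketch(space_left, 4, 0);
	if (start < 0)
		return start;

	action_start = reserve_nested_sketch(space_left - 4, 4, 4);
	if (action_start < 0)
		return action_start;	/* 'return start' would report success here */

	return 0;
}
```

With 4 bytes of space the first reservation succeeds and the second fails, so the caller sees a negative error code rather than a non-negative value from the earlier, successful step.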
     

04 Dec, 2020

1 commit

  • When openvswitch is configured to mangle the LSE, the current value is
    read from the packet by dereferencing 4 bytes at mpls_hdr(). Ensure that
    the label is contained in the skb "linear" area.

    Found by code inspection.

    Fixes: d27cf5c59a12 ("net: core: add MPLS update core helper and use in OvS")
    Signed-off-by: Davide Caratti
    Link: https://lore.kernel.org/r/aa099f245d93218b84b5c056b67b6058ccf81a66.1606987185.git.dcaratti@redhat.com
    Signed-off-by: Jakub Kicinski

    Davide Caratti
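
A minimal userspace sketch of the idea, assuming a simplified packet where only the first `linear_len` bytes are directly addressable. `struct pkt_sketch` and `read_lse_sketch()` are hypothetical names, and the -ENOMEM return value is an assumption of this example; the kernel works on struct sk_buff instead.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LSE_LEN 4	/* one MPLS label stack entry is 4 bytes */

/* Hypothetical, simplified packet: only the first linear_len bytes are
 * directly addressable through data. */
struct pkt_sketch {
	const uint8_t *data;
	size_t linear_len;
};

/* Copy the LSE found at mpls_off into *lse, keeping its byte order.
 * Returns 0 on success or -ENOMEM if the entry is not fully contained
 * in the linear area. */
int read_lse_sketch(const struct pkt_sketch *p, size_t mpls_off, uint32_t *lse)
{
	if (mpls_off > p->linear_len || p->linear_len - mpls_off < LSE_LEN)
		return -ENOMEM;

	memcpy(lse, p->data + mpls_off, LSE_LEN);
	return 0;
}
```

The bounds check runs before any byte of the label is touched, which is the property the fix is after.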
     

28 Nov, 2020

1 commit

  • Currently, the openvswitch module is not accepting the correctly formatted
    netlink message for the TTL decrement action. For both setting and getting
    the dec_ttl action, the actions should be nested in the
    OVS_DEC_TTL_ATTR_ACTION attribute as mentioned in the openvswitch.h uapi.

    When the original patch was sent, it was tested with a private OVS userspace
    implementation. This implementation was unfortunately not upstreamed and
    reviewed, hence an erroneous version of this patch was sent out.

    Leaving the patch as-is would cause problems as the kernel module could
    interpret additional attributes as actions and vice-versa, due to the
    actions not being encapsulated/nested within the actual attribute, but
    being concatenated after it.

    Fixes: 744676e77720 ("openvswitch: add TTL decrement action")
    Signed-off-by: Eelco Chaudron
    Link: https://lore.kernel.org/r/160622121495.27296.888010441924340582.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com
    Signed-off-by: Jakub Kicinski

    Eelco Chaudron
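
The difference between nesting and concatenating can be sketched with a small length/type/payload encoder. This is not the kernel netlink API: the helper names and the attribute type numbers below are hypothetical, and only the general layout (a 4-byte header with 16-bit length and type, padded to 4 bytes) is modelled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATTR_HDRLEN 4
#define ATTR_ALIGN(len) (((len) + 3u) & ~3u)

/* Hypothetical attribute type numbers, for illustration only. */
enum { SK_DEC_TTL = 1, SK_DEC_TTL_ACTION = 2, SK_OUTPUT = 3 };

/* Append one attribute (header plus payload) at buf + off and return
 * the offset just past it, including alignment padding. */
static size_t put_attr(uint8_t *buf, size_t off, uint16_t type,
		       const void *payload, uint16_t payload_len)
{
	uint16_t len = (uint16_t)(ATTR_HDRLEN + payload_len);

	memcpy(buf + off, &len, 2);
	memcpy(buf + off + 2, &type, 2);
	if (payload_len)
		memcpy(buf + off + ATTR_HDRLEN, payload, payload_len);
	return off + ATTR_ALIGN(len);
}

/* Open a nested attribute; its length is patched later by end_nest(). */
static size_t begin_nest(uint8_t *buf, size_t off, uint16_t type)
{
	return put_attr(buf, off, type, NULL, 0);
}

/* Close a nested attribute so its length covers everything inside it. */
static void end_nest(uint8_t *buf, size_t nest_start, size_t end)
{
	uint16_t len = (uint16_t)(end - nest_start);

	memcpy(buf + nest_start, &len, 2);
}

/* Build dec_ttl { action { output(port) } } and return the total length. */
size_t build_dec_ttl_sketch(uint8_t *buf, uint32_t port)
{
	size_t outer = 0, inner, off;

	off = begin_nest(buf, outer, SK_DEC_TTL);
	inner = off;
	off = begin_nest(buf, inner, SK_DEC_TTL_ACTION);
	off = put_attr(buf, off, SK_OUTPUT, &port, sizeof(port));
	end_nest(buf, inner, off);
	end_nest(buf, outer, off);
	return off;
}

/* Read back the length field of the attribute starting at off. */
uint16_t attr_len_at(const uint8_t *buf, size_t off)
{
	uint16_t len;

	memcpy(&len, buf + off, 2);
	return len;
}
```

In the nested layout, the length of each enclosing attribute covers the attributes inside it, so a parser can tell where the action list ends. If the actions were merely appended after an empty attribute, nothing would distinguish them from unrelated sibling attributes.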
     

04 Nov, 2020

1 commit

  • Silence suspicious RCU usage warning in ovs_flow_tbl_masks_cache_resize()
    by replacing rcu_dereference() with rcu_dereference_ovsl().

    In addition, when creating a new datapath, make sure it's configured under
    the ovs_lock.

    Fixes: 9bf24f594c6a ("net: openvswitch: make masks cache size configurable")
    Reported-by: syzbot+9a8f8bfcc56e8578016c@syzkaller.appspotmail.com
    Signed-off-by: Eelco Chaudron
    Link: https://lore.kernel.org/r/160439190002.56943.1418882726496275961.stgit@ebuild
    Signed-off-by: Jakub Kicinski

    Eelco Chaudron
     

19 Oct, 2020

1 commit

  • The flow_lookup() function uses per-CPU variables, so it must be called
    with BH disabled. This is fine in the general NAPI use case, where the
    local BH is already disabled, but the function is also called from the
    netlink context. This patch makes sure that BH is disabled in the
    netlink path as well.

    In addition, u64_stats_update_begin() requires a lock to ensure one writer,
    which is not ensured here. Making it per-CPU and disabling NAPI (softirq)
    ensures that there is always only one writer.

    Fixes: eac87c413bf9 ("net: openvswitch: reorder masks array based on usage")
    Reported-by: Juri Lelli
    Signed-off-by: Eelco Chaudron
    Link: https://lore.kernel.org/r/160295903253.7789.826736662555102345.stgit@ebuild
    Signed-off-by: Jakub Kicinski

    Eelco Chaudron
     

14 Oct, 2020

1 commit


09 Oct, 2020

2 commits

  • Small conflict around locking in rxrpc_process_event() -
    channel_lock moved to bundle in next, while state lock
    needs _bh() from net.

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     
  • With multiple DNAT rules it's possible that after destination
    translation the resulting tuples collide.

    For example, two openvswitch flows:
    nw_dst=10.0.0.10,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
    nw_dst=10.0.0.20,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))

    Assuming two TCP clients initiating the following connections:
    10.0.0.10:5000->10.0.0.10:10
    10.0.0.10:5000->10.0.0.20:10

    Both tuples would translate to 10.0.0.10:5000->20.0.0.1:20 causing
    nf_conntrack_confirm() to fail because of tuple collision.

    Netfilter handles this case by allocating a null binding for SNAT at
    egress by default. Perform the same operation in openvswitch for DNAT
    if no explicit SNAT is requested by the user and allocate a null binding
    for SNAT for packets in the "original" direction.

    Reported-at: https://bugzilla.redhat.com/1877128
    Suggested-by: Florian Westphal
    Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
    Signed-off-by: Dumitru Ceara
    Signed-off-by: Jakub Kicinski

    Dumitru Ceara
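
The collision in the example above, and the effect of a null SNAT binding, can be sketched as follows. All names here are hypothetical, and picking the next free source port is a simplification assumed for this example; the real conntrack NAT code selects a unique tuple in its own way.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

struct tuple_sketch {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

bool tuple_equal_sketch(const struct tuple_sketch *a,
			const struct tuple_sketch *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/* Destination NAT: rewrite only the destination address and port. */
struct tuple_sketch dnat_sketch(struct tuple_sketch t, uint32_t ip,
				uint16_t port)
{
	t.dst_ip = ip;
	t.dst_port = port;
	return t;
}

/* "Null" SNAT binding: keep the source as it is unless the tuple clashes
 * with one already in use, in which case move to another source port.
 * Gives up after trying every port and returns the last candidate. */
struct tuple_sketch null_snat_sketch(struct tuple_sketch t,
				     const struct tuple_sketch *in_use,
				     size_t n)
{
	for (uint32_t tries = 0; tries < 65536; tries++) {
		bool clash = false;

		for (size_t i = 0; i < n; i++) {
			if (tuple_equal_sketch(&t, &in_use[i])) {
				clash = true;
				break;
			}
		}
		if (!clash)
			return t;
		t.src_port++;	/* wraps around at 65535 */
	}
	return t;
}
```

Applying `dnat_sketch()` to both client connections yields identical tuples. Passing the second one through `null_snat_sketch()` with the first as the in-use set leaves the destination untouched and changes only the source port, so both connections can be confirmed.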
     

06 Oct, 2020

1 commit


05 Oct, 2020

1 commit


04 Oct, 2020

1 commit

  • Implement TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH, to
    respectively pop and push a base Ethernet header at the beginning of a
    frame.

    POP_ETH is just a matter of pulling ETH_HLEN bytes. VLAN tags, if any,
    must be stripped before calling POP_ETH.

    PUSH_ETH is restricted to skbs with no mac_header, and only the MAC
    addresses can be configured. The Ethertype is automatically set from
    skb->protocol. These restrictions ensure that all skb's fields remain
    consistent, so that this action can't confuse other part of the
    networking stack (like GSO).

    Since openvswitch already had these actions, consolidate the code in
    skbuff.c (like for vlan and mpls push/pop).

    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
     

03 Oct, 2020

1 commit


19 Sep, 2020

1 commit


05 Sep, 2020

1 commit

  • We got slightly different patches removing a double word
    in a comment in net/ipv4/raw.c - picked the version from net.

    Simple conflict in drivers/net/ethernet/ibm/ibmvnic.c. Use cached
    values instead of VNIC login response buffer (following what
    commit 507ebe6444a4 ("ibmvnic: Fix use-after-free of VNIC login
    response buffer") did).

    Signed-off-by: Jakub Kicinski

    Jakub Kicinski
     

02 Sep, 2020

4 commits

  • Currently, if nf_conncount_init fails, the dispatched work is not canceled,
    causing problems when the timer fires. This change fixes this by not
    scheduling the work until all initialization is successful.

    Fixes: a65878d6f00b ("net: openvswitch: fixes potential deadlock in dp cleanup code")
    Reported-by: kernel test robot
    Signed-off-by: Eelco Chaudron
    Reviewed-by: Tonghao Zhang
    Signed-off-by: David S. Miller

    Eelco Chaudron
     
  • keep_flows was introduced by [1], where it was used as a flag to decide
    whether to delete flows. When rehashing or expanding the table instance,
    the flows are not flushed. The flag is no longer used, so remove it.

    [1] - https://github.com/openvswitch/ovs/commit/acd051f1761569205827dc9b037e15568a8d59f8
    Cc: Pravin B Shelar
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Decrease table->count and ufid_count unconditionally,
    because the only case where count and ufid_count were not
    used for counting is when flushing the flows. To simplify the
    code, remove the "count" argument of table_instance_flow_free.

    To avoid a bug when deleting flows in the future, add a
    WARN_ON in the flush flows function.

    Cc: Pravin B Shelar
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • No change in logic; this only improves the coding style.

    Cc: Pravin B Shelar
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

01 Sep, 2020

1 commit


24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
    fall-through markings where applicable.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
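
A small userspace sketch of the pseudo-keyword in use. The macro is defined locally here, mapping to the compiler's fallthrough statement attribute when it is available and to an empty statement otherwise; `layers_from_sketch()` is a hypothetical function made up for the example.

```c
/* Local definition for the sketch: use the statement attribute when the
 * compiler supports it, otherwise fall back to an empty statement. */
#if defined(__has_attribute)
#if __has_attribute(__fallthrough__)
#define fallthrough __attribute__((__fallthrough__))
#endif
#endif
#ifndef fallthrough
#define fallthrough do { } while (0)
#endif

/* Count how many layers are processed when starting at the given layer.
 * Each case deliberately continues into the next one. */
int layers_from_sketch(int layer)
{
	int n = 0;

	switch (layer) {
	case 2:
		n++;
		fallthrough;
	case 3:
		n++;
		fallthrough;
	case 4:
		n++;
		break;
	default:
		break;
	}
	return n;
}
```

Unlike a comment, the attribute is visible to the compiler, so an implicit fall-through warning can flag every unmarked case while accepting the marked ones.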
     

14 Aug, 2020

1 commit

  • To avoid some issues, for example RCU usage warning and double free,
    we should flush the flows under ovs_lock. This patch refactors
    table_instance_destroy and introduces table_instance_flow_flush
    which can be invoked by __dp_destroy or ovs_flow_tbl_flush.

    Fixes: 50b0e61b32ee ("net: openvswitch: fix possible memleak on destroy flow-table")
    Reported-by: Johan Knöös
    Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050489.html
    Signed-off-by: Tonghao Zhang
    Reviewed-by: Cong Wang
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

06 Aug, 2020

1 commit

  • ovs_flow_tbl_destroy is always called from an RCU callback
    or an error path, so there is no need to check whether
    rcu_read_lock is held or lockdep_ovsl_is_held is true.

    ovs_dp_cmd_fill_info is always called with ovs_mutex held,
    so use rcu_dereference_ovsl instead of rcu_dereference
    in ovs_flow_tbl_masks_cache_size.

    Fixes: 9bf24f594c6a ("net: openvswitch: make masks cache size configurable")
    Cc: Eelco Chaudron
    Reported-by: syzbot+c0eb9e7cdde04e4eb4be@syzkaller.appspotmail.com
    Reported-by: syzbot+f612c02823acb02ff9bc@syzkaller.appspotmail.com
    Signed-off-by: Tonghao Zhang
    Acked-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

04 Aug, 2020

4 commits

  • This patch makes the masks cache size configurable; a size
    of 0 disables the cache.

    Reviewed-by: Paolo Abeni
    Reviewed-by: Tonghao Zhang
    Signed-off-by: Eelco Chaudron
    Signed-off-by: David S. Miller

    Eelco Chaudron
     
  • Add a counter that counts the number of masks cache hits, and
    export it through the megaflow netlink statistics.

    Reviewed-by: Paolo Abeni
    Reviewed-by: Tonghao Zhang
    Signed-off-by: Eelco Chaudron
    Signed-off-by: David S. Miller

    Eelco Chaudron
     
  • ovs_ct_put_key() is potentially copying uninitialized kernel stack memory
    into socket buffers, since the compiler may leave a 3-byte hole at the end
    of `struct ovs_key_ct_tuple_ipv4` and `struct ovs_key_ct_tuple_ipv6`. Fix
    it by initializing `orig` with memset().

    Fixes: 9dd7f8907c37 ("openvswitch: Add original direction conntrack tuple to sw_flow_key.")
    Suggested-by: Dan Carpenter
    Signed-off-by: Peilin Ye
    Signed-off-by: David S. Miller

    Peilin Ye
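
The padding problem can be sketched with a struct of the same general shape: 13 bytes of members, which most ABIs round up to 16 with 3 bytes of tail padding. `struct ct_tuple_sketch` and `fill_tuple_sketch()` are hypothetical names for this illustration, not the uapi structures.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical struct: 4 + 4 + 2 + 2 + 1 = 13 bytes of members, so a
 * compiler typically appends 3 padding bytes for 4-byte alignment. */
struct ct_tuple_sketch {
	uint32_t src;
	uint32_t dst;
	uint16_t sport;
	uint16_t dport;
	uint8_t proto;
};

/* Fill *out so that the whole object, padding included, starts from
 * zero before the members are assigned. */
void fill_tuple_sketch(struct ct_tuple_sketch *out, uint32_t src,
		       uint32_t dst, uint16_t sport, uint16_t dport,
		       uint8_t proto)
{
	memset(out, 0, sizeof(*out));	/* clears members and padding alike */
	out->src = src;
	out->dst = dst;
	out->sport = sport;
	out->dport = dport;
	out->proto = proto;
}
```

Assigning the members one by one only writes the 13 member bytes. Without the memset(), whatever was on the stack in the padding bytes would be copied out along with the rest of the struct.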
     
  • When openvswitch conntrack is offloaded with the act_ct action, fragmented
    packets are defragmented in the ingress tc act_ct action and miss the next
    chain. The packet is then passed to the openvswitch datapath without the
    mru, and the over-MTU packet is dropped in the openvswitch output action.

    "kernel: net2: dropped over-mtu packet: 1528 > 1500"

    This patch adds the mru to tc_skb_ext for the defrag-and-miss-next-chain
    situation, and also adds the mru to qdisc_skb_cb. act_ct sets the mru in
    qdisc_skb_cb when the packet is defragmented. When the chain misses,
    the mru is set in tc_skb_ext, where the ovs datapath can read it.

    Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
    Signed-off-by: wenxu
    Reviewed-by: Cong Wang
    Signed-off-by: David S. Miller

    wenxu
     

25 Jul, 2020

1 commit

  • The previous patch introduced a deadlock; this patch fixes it by making
    sure the work is canceled without holding the global ovs lock. This is
    done by moving the reorder processing one layer up to the netns level.

    Fixes: eac87c413bf9 ("net: openvswitch: reorder masks array based on usage")
    Reported-by: syzbot+2c4ff3614695f75ce26c@syzkaller.appspotmail.com
    Reported-by: syzbot+bad6507e5db05017b008@syzkaller.appspotmail.com
    Reviewed-by: Paolo
    Signed-off-by: Eelco Chaudron
    Signed-off-by: David S. Miller

    Eelco Chaudron
     

18 Jul, 2020

1 commit

  • This patch reorders the masks array every 4 seconds based on their
    usage count. This greatly reduces the number of masks checked per
    packet hit, and hence improves overall performance, especially in
    the OVS/OVN case for OpenShift.

    Here are some results from the OVS/OVN OpenShift test, which uses
    8 pods, each pod having 512 uperf connections; each connection
    sends a 64-byte request and gets a 1024-byte response (TCP).
    All uperf clients are on 1 worker node while all uperf servers are
    on the other worker node.

    Kernel without this patch : 7.71 Gbps
    Kernel with this patch applied: 14.52 Gbps

    We also ran some tests to verify that the rebalance activity does not
    lower the flow insertion rate, which it does not.

    Signed-off-by: Eelco Chaudron
    Tested-by: Andrew Theurer
    Reviewed-by: Paolo Abeni
    Signed-off-by: David S. Miller

    Eelco Chaudron
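
The effect of reordering by usage can be sketched with a plain array and qsort(). This is a simplification under stated assumptions: the struct and function names are hypothetical, the hit counters are ordinary fields rather than per-CPU statistics, and lookup is modelled as a linear scan.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record pairing a mask with its usage counter. */
struct mask_usage_sketch {
	int mask_id;
	uint64_t hits;
};

/* Order by hit count, highest first; break ties by mask id. */
static int by_hits_desc(const void *a, const void *b)
{
	const struct mask_usage_sketch *x = a, *y = b;

	if (x->hits != y->hits)
		return x->hits < y->hits ? 1 : -1;
	return (x->mask_id > y->mask_id) - (x->mask_id < y->mask_id);
}

/* Move the most frequently hit masks to the front of the array. */
void reorder_masks_sketch(struct mask_usage_sketch *m, size_t n)
{
	qsort(m, n, sizeof(*m), by_hits_desc);
}

/* Number of masks a linear scan checks before reaching mask_id;
 * returns n if the mask is not in the array. */
size_t probes_for_sketch(const struct mask_usage_sketch *m, size_t n,
			 int mask_id)
{
	for (size_t i = 0; i < n; i++) {
		if (m[i].mask_id == mask_id)
			return i + 1;
	}
	return n;
}
```

If the busiest mask starts in second position, a lookup checks two masks before matching; after the reorder it matches on the first check. Repeated over every packet, that is where the reduction in masks checked per hit comes from.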
     

14 Jul, 2020

1 commit


25 Jun, 2020

1 commit

  • The ovs connection tracking module performs de-fragmentation on incoming
    fragmented traffic. Take into account whether traffic has been
    de-fragmented in the execute_check_pkt_len action; otherwise the wrong
    nested action is performed, because the original packet size is
    considered. This issue typically occurs if ovs-vswitchd adds a rule in
    the pipeline that requires connection tracking (e.g. OVN stateful ACLs)
    before the execute_check_pkt_len action. Moreover, take into account the
    GSO fragment size for GSO packets in the execute_check_pkt_len routine.

    Fixes: 4d5ec89fc8d14 ("net: openvswitch: Add a new action check_pkt_len")
    Signed-off-by: Lorenzo Bianconi
    Signed-off-by: David S. Miller

    Lorenzo Bianconi
     

14 Jun, 2020

1 commit

  • Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
    '---help---'"), the number of '---help---' has been gradually
    decreasing, but there are still more than 2400 instances.

    This commit finishes the conversion. While I touched the lines,
    I also fixed the indentation.

    There are a variety of indentation styles found.

    a) 4 spaces + '---help---'
    b) 7 spaces + '---help---'
    c) 8 spaces + '---help---'
    d) 1 space + 1 tab + '---help---'
    e) 1 tab + '---help---' (correct indentation)
    f) 1 tab + 1 space + '---help---'
    g) 1 tab + 2 spaces + '---help---'

    In order to convert all of them to 1 tab + 'help', I ran the
    following command:

    $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     

26 Apr, 2020

3 commits


24 Apr, 2020

5 commits

  • When the meter rate is set to 4+ Gbps, an overflow
    occurs and the meters don't work as expected.

    Cc: Pravin B Shelar
    Cc: Andy Zhou
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
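
The overflow can be reproduced in a few lines, assuming the rate is configured in kbps and scaled by 1000 into a 32-bit value. Both the scaling formula and the function names are assumptions of this sketch; the exact computation in the datapath may differ.

```c
#include <stdint.h>

/* 32-bit scaling: wraps around once the result reaches 2^32,
 * i.e. a little above 4.29 Gbps. */
uint32_t rate_bps_u32_sketch(uint32_t rate_kbps)
{
	return rate_kbps * 1000u;
}

/* 64-bit scaling: widen before multiplying so the product fits. */
uint64_t rate_bps_u64_sketch(uint32_t rate_kbps)
{
	return (uint64_t)rate_kbps * 1000u;
}
```

For a 5 Gbps meter (5000000 kbps), the 32-bit version wraps to 705032704, a rate of roughly 0.7 Gbps, while the 64-bit version returns 5000000000. A 4 Gbps meter still fits in 32 bits, which matches the "4+ Gbps" threshold in the description.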
     
  • Cc: Pravin B Shelar
    Cc: Andy Zhou
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Before invoking ovs_meter_cmd_reply_stats, "meter" has already
    been checked, so don't check it again in that function.

    Cc: Pravin B Shelar
    Cc: Andy Zhou
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • Don't allow users to create an unlimited number of meters, which could
    consume a large amount of kernel memory. The maximum number supported
    is decided by physical memory, with 20K meters as the default.

    Cc: Pravin B Shelar
    Cc: Andy Zhou
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     
  • In the kernel datapath of Open vSwitch, there are only 1024
    meter buckets in one datapath. Installing more than
    1024 (e.g. 8192) meters may lead to a performance drop.
    But in some cases, for example when Open vSwitch is used as an
    edge gateway, at least 20K meters are needed, where meters are
    used for IP address bandwidth limitation.

    [Open vSwitch userspace datapath has this issue too.]

    For more scalable meters, this patch uses a meter array instead of
    hash tables, and expands/shrinks the array when necessary, so
    more meters than before can be installed in the datapath.
    With the newly introduced struct *dp_meter_instance, it is easy to
    expand the meter array by changing the *ti pointer in the struct
    *dp_meter_table.

    Cc: Pravin B Shelar
    Cc: Andy Zhou
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang
     

21 Apr, 2020

1 commit

  • syzbot wrote:
    | =============================
    | WARNING: suspicious RCU usage
    | 5.7.0-rc1+ #45 Not tainted
    | -----------------------------
    | net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
    |
    | other info that might help us debug this:
    | rcu_scheduler_active = 2, debug_locks = 1
    | ...
    |
    | stack backtrace:
    | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
    | Workqueue: netns cleanup_net
    | Call Trace:
    | ...
    | ovs_ct_exit
    | ovs_exit_net
    | ops_exit_list.isra.7
    | cleanup_net
    | process_one_work
    | worker_thread

    To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
    lockdep_ovsl_is_held as optional lockdep expression.

    Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com
    Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
    Cc: Pravin B Shelar
    Cc: Yi-Hung Wei
    Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com
    Signed-off-by: Tonghao Zhang
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Tonghao Zhang