13 Mar, 2019

1 commit

  • Patch series "generic radix trees; drop flex arrays".

    This patch (of 7):

    There was no real need for this code to be using flexarrays, it's just
    implementing a hash table - ideally it would be using rhashtables, but
    that conversion would be significantly more complicated.

    Link: http://lkml.kernel.org/r/20181217131929.11727-2-kent.overstreet@gmail.com
    Signed-off-by: Kent Overstreet
    Reviewed-by: Matthew Wilcox
    Cc: Pravin B Shelar
    Cc: Alexey Dobriyan
    Cc: Al Viro
    Cc: Dave Hansen
    Cc: Eric Paris
    Cc: Marcelo Ricardo Leitner
    Cc: Neil Horman
    Cc: Paul Moore
    Cc: Shaohua Li
    Cc: Stephen Smalley
    Cc: Vlad Yasevich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     

27 Feb, 2019

2 commits

  • The l3proto name is gone, its header file is the last trace.
    While at it, also remove nf_nat_core.h, its very small and all users
    include nf_nat.h too.

    before:
    text data bss dec hex filename
    22948 1612 4136 28696 7018 nf_nat.ko

    after removal of l3proto register/unregister functions:
    text data bss dec hex filename
    22196 1516 4136 27848 6cc8 nf_nat.ko

    checkpatch complains about overly long lines, but line breaks
    do not make things more readable and the line length gets smaller
    here, not larger.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • before:
    text data bss dec hex filename
    16566 1576 4136 22278 5706 nf_nat.ko
    3598 844 0 4442 115a nf_nat_ipv6.ko
    3187 844 0 4031 fbf nf_nat_ipv4.ko

    after:
    text data bss dec hex filename
    22948 1612 4136 28696 7018 nf_nat.ko

    ... with ipv4/v6 nat now provided directly via nf_nat.ko.

    Also changes:
    ret = nf_nat_ipv4_fn(priv, skb, state);
    if (ret != NF_DROP && ret != NF_STOLEN &&
    into
    if (ret != NF_ACCEPT)
    return ret;

    everywhere.

    The nat hooks never should return anything other than
    ACCEPT or DROP (and the latter only in rare error cases).

    The original code uses multi-line ANDing including assignment-in-if:
    if (ret != NF_DROP && ret != NF_STOLEN &&
    !(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) &&
    (ct = nf_ct_get(skb, &ctinfo)) != NULL) {

    I removed this while moving, breaking those in separate conditionals
    and moving the assignments into extra lines.

    checkpatch still generates some warnings:
    1. Overly long lines (of moved code).
    Breaking them is even more ugly. so I kept this as-is.
    2. use of extern function declarations in a .c file.
    This is necessary evil, we must call
    nf_nat_l3proto_register() from the nat core now.
    All l3proto related functions are removed later in this series,
    those prototypes are then removed as well.

    v2: keep empty nf_nat_ipv6_csum_update stub for CONFIG_IPV6=n case.
    v3: remove IS_ENABLED(NF_NAT_IPV4/6) tests, NF_NAT_IPVx toggles
    are removed here.
    v4: also get rid of the assignments in conditionals.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

29 Jan, 2019

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter/IPVS updates for your net-next tree:

    1) Introduce a hashtable to speed up object lookups, from Florian Westphal.

    2) Make direct calls to built-in extension, also from Florian.

    3) Call helper before confirming the conntrack as it used to be originally,
    from Florian.

    4) Call request_module() to autoload br_netfilter when physdev is used
    to relax the dependency, also from Florian.

    5) Allow to insert rules at a given position ID that is internal to the
    batch, from Phil Sutter.

    6) Several patches to replace conntrack indirections by direct calls,
    and to reduce modularization, from Florian. This also includes
    several follow up patches to deal with minor fallout from this
    rework.

    7) Use RCU from conntrack gre helper, from Florian.

    8) GRE conntrack module becomes built-in into nf_conntrack, from Florian.

    9) Replace nf_ct_invert_tuplepr() by calls to nf_ct_invert_tuple(),
    from Florian.

    10) Unify sysctl handling at the core of nf_conntrack, from Florian.

    11) Provide modparam to register conntrack hooks.

    12) Allow to match on the interface kind string, from wenxu.

    13) Remove several exported symbols, not required anymore now after
    a bit of de-modulatization work has been done, from Florian.

    14) Remove built-in map support in the hash extension, this can be
    done with the existing userspace infrastructure, from laura.

    15) Remove indirection to calculate checksums in IPVS, from Matteo Croce.

    16) Use call wrappers for indirection in IPVS, also from Matteo.

    17) Remove superfluous __percpu parameter in nft_counter, patch from
    Luc Van Oostenryck.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

22 Jan, 2019

1 commit


18 Jan, 2019

1 commit


17 Jan, 2019

2 commits

  • One of the more common cases of allocation size calculations is finding the
    size of a structure that has a zero-sized array at the end, along with memory
    for some number of elements for that array. For example:

    struct foo {
    int stuff;
    struct boo entry[];
    };

    instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

    Instead of leaving these open-coded and prone to type mistakes, we can now
    use the new struct_size() helper:

    instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

    This code was detected with the help of Coccinelle.

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
     
  • For nested and variable attributes, the expected length of an attribute
    is not known and marked by a negative number. This results in an OOB
    read when the expected length is later used to check if the attribute is
    all zeros. Fix this by using the actual length of the attribute rather
    than the expected length.

    Signed-off-by: Ross Lagerwall
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Ross Lagerwall
     

05 Jan, 2019

1 commit

  • The previous commit fa642f08839b
    ("openvswitch: Derive IP protocol number for IPv6 later frags")
    introduces IP protocol number parsing for IPv6 later frags that can mess
    up the network header length calculation logic, i.e. nh_len < 0.
    However, the network header length calculation is mainly for deriving
    the transport layer header in the key extraction process which the later
    fragment does not apply.

    Therefore, this commit skips the network header length calculation to
    fix the issue.

    Reported-by: Chris Mi
    Reported-by: Greg Rose
    Fixes: fa642f08839b ("openvswitch: Derive IP protocol number for IPv6 later frags")
    Signed-off-by: Yi-Hung Wei
    Signed-off-by: David S. Miller

    Yi-Hung Wei
     

10 Dec, 2018

1 commit

  • Several conflicts, seemingly all over the place.

    I used Stephen Rothwell's sample resolutions for many of these, if not
    just to double check my own work, so definitely the credit largely
    goes to him.

    The NFP conflict consisted of a bug fix (moving operations
    past the rhashtable operation) while chaning the initial
    argument in the function call in the moved code.

    The net/dsa/master.c conflict had to do with a bug fix intermixing of
    making dsa_master_set_mtu() static with the fixing of the tagging
    attribute location.

    cls_flower had a conflict because the dup reject fix from Or
    overlapped with the addition of port range classifiction.

    __set_phy_supported()'s conflict was relatively easy to resolve
    because Andrew fixed it in both trees, so it was just a matter
    of taking the net-next copy. Or at least I think it was :-)

    Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup()
    intermixed with changes on how the sdif and caller_net are calculated
    in these code paths in net-next.

    The remaining BPF conflicts were largely about the addition of the
    __bpf_md_ptr stuff in 'net' overlapping with adjustments and additions
    to the relevant data structure where the MD pointer macros are used.

    Signed-off-by: David S. Miller

    David S. Miller
     

07 Dec, 2018

1 commit

  • In order to pass extack together with NETDEV_PRE_UP notifications, it's
    necessary to route the extack to __dev_open() from diverse (possibly
    indirect) callers. One prominent API through which the notification is
    invoked is dev_change_flags().

    Therefore extend dev_change_flags() with and extra extack argument and
    update all users. Most of the calls end up just encoding NULL, but
    several sites (VLAN, ipvlan, VRF, rtnetlink) do have extack available.

    Since the function declaration line is changed anyway, name the other
    function arguments to placate checkpatch.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Petr Machata
     

01 Dec, 2018

1 commit


11 Nov, 2018

1 commit


09 Nov, 2018

2 commits


04 Nov, 2018

1 commit

  • When CONFIG_CC_OPTIMIZE_FOR_DEBUGGING is enabled, the compiler
    fails to optimize out a dead code path, which leads to a link failure:

    net/openvswitch/conntrack.o: In function `ovs_ct_set_labels':
    conntrack.c:(.text+0x2e60): undefined reference to `nf_connlabels_replace'

    In this configuration, we can take a shortcut, and completely
    remove the contrack label code. This may also help the regular
    optimization.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

01 Nov, 2018

1 commit

  • When there are both pop and push ethernet header actions among the
    actions to be applied to a packet, an unexpected EINVAL (Invalid
    argument) error is obtained. This is due to mac_proto not being reset
    correctly when those actions are validated.

    Reported-at:
    https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047554.html
    Fixes: 91820da6ae85 ("openvswitch: add Ethernet push and pop actions")
    Signed-off-by: Jaime Caamaño Ruiz
    Tested-by: Greg Rose
    Reviewed-by: Greg Rose
    Signed-off-by: David S. Miller

    Jaime Caamaño Ruiz
     

09 Oct, 2018

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for your net-next tree:

    1) Support for matching on ipsec policy already set in the route, from
    Florian Westphal.

    2) Split set destruction into deactivate and destroy phase to make it
    fit better into the transaction infrastructure, also from Florian.
    This includes a patch to warn on imbalance when setting the new
    activate and deactivate interfaces.

    3) Release transaction list from the workqueue to remove expensive
    synchronize_rcu() from configuration plane path. This speeds up
    configuration plane quite a bit. From Florian Westphal.

    4) Add new xfrm/ipsec extension, this new extension allows you to match
    for ipsec tunnel keys such as source and destination address, spi and
    reqid. From Máté Eckl and Florian Westphal.

    5) Add secmark support, this includes connsecmark too, patches
    from Christian Gottsche.

    6) Allow to specify remaining bytes in xt_quota, from Chenbo Feng.
    One follow up patch to calm a clang warning for this one, from
    Nathan Chancellor.

    7) Flush conntrack entries based on layer 3 family, from Kristian Evensen.

    8) New revision for cgroups2 to shrink the path field.

    9) Get rid of obsolete need_conntrack(), as a result from recent
    demodularization works.

    10) Use WARN_ON instead of BUG_ON, from Florian Westphal.

    11) Unused exported symbol in nf_nat_ipv4_fn(), from Florian.

    12) Remove superfluous check for timeout netlink parser and dump
    functions in layer 4 conntrack helpers.

    13) Unnecessary redundant rcu read side locks in NAT redirect,
    from Taehee Yoo.

    14) Pass nf_hook_state structure to error handlers, patch from
    Florian Westphal.

    15) Remove ->new() interface from layer 4 protocol trackers. Place
    them in the ->packet() interface. From Florian.

    16) Place conntrack ->error() handling in the ->packet() interface.
    Patches from Florian Westphal.

    17) Remove unused parameter in the pernet initialization path,
    also from Florian.

    18) Remove additional parameter to specify layer 3 protocol when
    looking up for protocol tracker. From Florian.

    19) Shrink array of layer 4 protocol trackers, from Florian.

    20) Check for linear skb only once from the ALG NAT mangling
    codebase, from Taehee Yoo.

    21) Use rhashtable_walk_enter() instead of deprecated
    rhashtable_walk_init(), also from Taehee.

    22) No need to flush all conntracks when only one single address
    is gone, from Tan Hu.

    23) Remove redundant check for NAT flags in flowtable code, from
    Taehee Yoo.

    24) Use rhashtable_lookup() instead of rhashtable_lookup_fast()
    from netfilter codebase, since rcu read lock side is already
    assumed in this path.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

07 Oct, 2018

1 commit


05 Oct, 2018

1 commit


04 Oct, 2018

1 commit


02 Oct, 2018

1 commit

  • This reverts commit 90c7afc96cbbd77f44094b5b651261968e97de67.

    When the commit was merged, the code used nf_ct_put() to free
    the entry, but later on commit 76644232e612 ("openvswitch: Free
    tmpl with tmpl_free.") replaced that with nf_ct_tmpl_free which
    is a more appropriate. Now the original problem is removed.

    Then 44d6e2f27328 ("net: Replace NF_CT_ASSERT() with WARN_ON().")
    replaced a debug assert with a WARN_ON() which is trigged now.

    Signed-off-by: Flavio Leitner
    Acked-by: Joe Stringer
    Signed-off-by: David S. Miller

    Flavio Leitner
     

30 Sep, 2018

1 commit


29 Sep, 2018

1 commit

  • The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
    which is a typedef for an enum type, so make sure the implementation in
    this driver has returns 'netdev_tx_t' value, and change the function
    return type to netdev_tx_t.

    Found by coccinelle.

    Signed-off-by: YueHaibing
    Signed-off-by: David S. Miller

    YueHaibing
     

20 Sep, 2018

1 commit


07 Sep, 2018

1 commit

  • Currently, OVS only parses the IP protocol number for the first
    IPv6 fragment, but sets the IP protocol number for the later fragments
    to be NEXTHDF_FRAGMENT. This patch tries to derive the IP protocol
    number for the IPV6 later frags so that we can match that.

    Signed-off-by: Yi-Hung Wei
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Yi-Hung Wei
     

03 Aug, 2018

1 commit


30 Jul, 2018

1 commit

  • The meter code would create an entry for each new meter. However, it
    would not set the meter id in the new entry, so every meter would appear
    to have a meter id of zero. This commit properly sets the meter id when
    adding the entry.

    Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure")
    Signed-off-by: Justin Pettit
    Cc: Andy Zhou
    Signed-off-by: David S. Miller

    Justin Pettit
     

18 Jul, 2018

2 commits

  • IPV6=m
    DEFRAG_IPV6=m
    CONNTRACK=y yields:

    net/netfilter/nf_conntrack_proto.o: In function `nf_ct_netns_do_get':
    net/netfilter/nf_conntrack_proto.c:802: undefined reference to `nf_defrag_ipv6_enable'
    net/netfilter/nf_conntrack_proto.o:(.rodata+0x640): undefined reference to `nf_conntrack_l4proto_icmpv6'

    Setting DEFRAG_IPV6=y causes undefined references to ip6_rhash_params
    ip6_frag_init and ip6_expire_frag_queue so it would be needed to force
    IPV6=y too.

    This patch gets rid of the 'followup linker error' by removing
    the dependency of ipv6.ko symbols from netfilter ipv6 defrag.

    Shared code is placed into a header, then used from both.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • The param helper of nf_ct_helper_ext_add is useless now, then remove
    it now.

    Signed-off-by: Gao Feng
    Signed-off-by: Pablo Neira Ayuso

    Gao Feng
     

16 Jul, 2018

1 commit


08 Jul, 2018

1 commit

  • Add 'clone' action to kernel datapath by using existing functions.
    When actions within clone don't modify the current flow, the flow
    key is not cloned before executing clone actions.

    This is a follow up patch for this incomplete work:
    https://patchwork.ozlabs.org/patch/722096/

    v1 -> v2:
    Refactor as advised by reviewer.

    Signed-off-by: Yifeng Sun
    Signed-off-by: Andy Zhou
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Yifeng Sun
     

29 Jun, 2018

1 commit

  • Check the tunnel option type stored in tunnel flags when creating options
    for tunnels. Thereby ensuring we do not set geneve, vxlan or erspan tunnel
    options on interfaces that are not associated with them.

    Make sure all users of the infrastructure set correct flags, for the BPF
    helper we have to set all bits to keep backward compatibility.

    Signed-off-by: Pieter Jansen van Vuuren
    Signed-off-by: Jakub Kicinski
    Signed-off-by: David S. Miller

    Pieter Jansen van Vuuren
     

13 Jun, 2018

2 commits

  • The kzalloc() function has a 2-factor argument form, kcalloc(). This
    patch replaces cases of:

    kzalloc(a * b, gfp)

    with:
    kcalloc(a * b, gfp)

    as well as handling cases of:

    kzalloc(a * b * c, gfp)

    with:

    kzalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kzalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kzalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kzalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kzalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kzalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kzalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kzalloc
    + kcalloc
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kzalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kzalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kzalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kzalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kzalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kzalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kzalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kzalloc(C1 * C2 * C3, ...)
    |
    kzalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kzalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kzalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kzalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kzalloc(sizeof(THING) * C2, ...)
    |
    kzalloc(sizeof(TYPE) * C2, ...)
    |
    kzalloc(C1 * C2 * C3, ...)
    |
    kzalloc(C1 * C2, ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kzalloc
    + kcalloc
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     
  • The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
    patch replaces cases of:

    kmalloc(a * b, gfp)

    with:
    kmalloc_array(a * b, gfp)

    as well as handling cases of:

    kmalloc(a * b * c, gfp)

    with:

    kmalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

    kmalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

    kmalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The tools/ directory was manually excluded, since it has its own
    implementation of kmalloc().

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    kmalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    kmalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    kmalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    kmalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_ID)
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_ID
    + COUNT_ID, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (COUNT_CONST)
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * COUNT_CONST
    + COUNT_CONST, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_ID)
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_ID
    + COUNT_ID, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (COUNT_CONST)
    + COUNT_CONST, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * COUNT_CONST
    + COUNT_CONST, sizeof(THING)
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    - kmalloc
    + kmalloc_array
    (
    - SIZE * COUNT
    + COUNT, SIZE
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    kmalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    kmalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    kmalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    kmalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    kmalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    kmalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products,
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(
    - (E1) * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * E3
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - (E1) * (E2) * (E3)
    + array3_size(E1, E2, E3)
    , ...)
    |
    kmalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants,
    // keeping sizeof() as the second factor argument.
    @@
    expression THING, E1, E2;
    type TYPE;
    constant C1, C2, C3;
    @@

    (
    kmalloc(sizeof(THING) * C2, ...)
    |
    kmalloc(sizeof(TYPE) * C2, ...)
    |
    kmalloc(C1 * C2 * C3, ...)
    |
    kmalloc(C1 * C2, ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * (E2)
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(TYPE) * E2
    + E2, sizeof(TYPE)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * (E2)
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - sizeof(THING) * E2
    + E2, sizeof(THING)
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * E2
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - (E1) * (E2)
    + E1, E2
    , ...)
    |
    - kmalloc
    + kmalloc_array
    (
    - E1 * E2
    + E1, E2
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

26 May, 2018

1 commit

  • Currently, nf_conntrack_max is used to limit the maximum number of
    conntrack entries in the conntrack table for every network namespace.
    For the VMs and containers that reside in the same namespace,
    they share the same conntrack table, and the total # of conntrack entries
    for all the VMs and containers are limited by nf_conntrack_max. In this
    case, if one of the VM/container abuses the usage the conntrack entries,
    it blocks the others from committing valid conntrack entries into the
    conntrack table. Even if we can possibly put the VM in different network
    namespace, the current nf_conntrack_max configuration is kind of rigid
    that we cannot limit different VM/container to have different # conntrack
    entries.

    To address the aforementioned issue, this patch proposes to have a
    fine-grained mechanism that could further limit the # of conntrack entries
    per-zone. For example, we can designate different zone to different VM,
    and set conntrack limit to each zone. By providing this isolation, a
    mis-behaved VM only consumes the conntrack entries in its own zone, and
    it will not influence other well-behaved VMs. Moreover, the users can
    set various conntrack limit to different zone based on their preference.

    The proposed implementation utilizes Netfilter's nf_conncount backend
    to count the number of connections in a particular zone. If the number of
    connection is above a configured limitation, ovs will return ENOMEM to the
    userspace. If userspace does not configure the zone limit, the limit
    defaults to zero that is no limitation, which is backward compatible to
    the behavior without this patch.

    The following high leve APIs are provided to the userspace:
    - OVS_CT_LIMIT_CMD_SET:
    * set default connection limit for all zones
    * set the connection limit for a particular zone
    - OVS_CT_LIMIT_CMD_DEL:
    * remove the connection limit for a particular zone
    - OVS_CT_LIMIT_CMD_GET:
    * get the default connection limit for all zones
    * get the connection limit for a particular zone

    Signed-off-by: Yi-Hung Wei
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Yi-Hung Wei
     

12 May, 2018

1 commit

  • The bpf syscall and selftests conflicts were trivial
    overlapping changes.

    The r8169 change involved moving the added mdelay from 'net' into a
    different function.

    A TLS close bug fix overlapped with the splitting of the TLS state
    into separate TX and RX parts. I just expanded the tests in the bug
    fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf
    == X".

    Signed-off-by: David S. Miller

    David S. Miller
     

05 May, 2018

1 commit

  • If an OVS_ATTR_NESTED attribute type is found while walking
    through netlink attributes, we call nlattr_set() recursively
    passing the length table for the following nested attributes, if
    different from the current one.

    However, once we're done with those sub-nested attributes, we
    should continue walking through attributes using the current
    table, instead of using the one related to the sub-nested
    attributes.

    For example, given this sequence:

    1 OVS_KEY_ATTR_PRIORITY
    2 OVS_KEY_ATTR_TUNNEL
    3 OVS_TUNNEL_KEY_ATTR_ID
    4 OVS_TUNNEL_KEY_ATTR_IPV4_SRC
    5 OVS_TUNNEL_KEY_ATTR_IPV4_DST
    6 OVS_TUNNEL_KEY_ATTR_TTL
    7 OVS_TUNNEL_KEY_ATTR_TP_SRC
    8 OVS_TUNNEL_KEY_ATTR_TP_DST
    9 OVS_KEY_ATTR_IN_PORT
    10 OVS_KEY_ATTR_SKB_MARK
    11 OVS_KEY_ATTR_MPLS

    we switch to the 'ovs_tunnel_key_lens' table on attribute #3,
    and we don't switch back to 'ovs_key_lens' while setting
    attributes #9 to #11 in the sequence. As OVS_KEY_ATTR_MPLS
    evaluates to 21, and the array size of 'ovs_tunnel_key_lens' is
    15, we also get this kind of KASan splat while accessing the
    wrong table:

    [ 7654.586496] ==================================================================
    [ 7654.594573] BUG: KASAN: global-out-of-bounds in nlattr_set+0x164/0xde9 [openvswitch]
    [ 7654.603214] Read of size 4 at addr ffffffffc169ecf0 by task handler29/87430
    [ 7654.610983]
    [ 7654.612644] CPU: 21 PID: 87430 Comm: handler29 Kdump: loaded Not tainted 3.10.0-866.el7.test.x86_64 #1
    [ 7654.623030] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
    [ 7654.631379] Call Trace:
    [ 7654.634108] [] dump_stack+0x19/0x1b
    [ 7654.639843] [] print_address_description+0x33/0x290
    [ 7654.647129] [] ? nlattr_set+0x164/0xde9 [openvswitch]
    [ 7654.654607] [] kasan_report.part.3+0x242/0x330
    [ 7654.661406] [] __asan_report_load4_noabort+0x34/0x40
    [ 7654.668789] [] nlattr_set+0x164/0xde9 [openvswitch]
    [ 7654.676076] [] ovs_nla_get_match+0x10c8/0x1900 [openvswitch]
    [ 7654.684234] [] ? genl_rcv+0x28/0x40
    [ 7654.689968] [] ? netlink_unicast+0x3f3/0x590
    [ 7654.696574] [] ? ovs_nla_put_tunnel_info+0xb0/0xb0 [openvswitch]
    [ 7654.705122] [] ? unwind_get_return_address+0xb0/0xb0
    [ 7654.712503] [] ? system_call_fastpath+0x1c/0x21
    [ 7654.719401] [] ? update_stack_state+0x229/0x370
    [ 7654.726298] [] ? update_stack_state+0x229/0x370
    [ 7654.733195] [] ? kasan_unpoison_shadow+0x35/0x50
    [ 7654.740187] [] ? kasan_kmalloc+0xaa/0xe0
    [ 7654.746406] [] ? kasan_slab_alloc+0x12/0x20
    [ 7654.752914] [] ? memset+0x31/0x40
    [ 7654.758456] [] ovs_flow_cmd_new+0x2b2/0xf00 [openvswitch]

    [snip]

    [ 7655.132484] The buggy address belongs to the variable:
    [ 7655.138226] ovs_tunnel_key_lens+0xf0/0xffffffffffffd400 [openvswitch]
    [ 7655.145507]
    [ 7655.147166] Memory state around the buggy address:
    [ 7655.152514] ffffffffc169eb80: 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa fa
    [ 7655.160585] ffffffffc169ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [ 7655.168644] >ffffffffc169ec80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa fa
    [ 7655.176701] ^
    [ 7655.184372] ffffffffc169ed00: fa fa fa fa 00 00 00 00 fa fa fa fa 00 00 00 05
    [ 7655.192431] ffffffffc169ed80: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
    [ 7655.200490] ==================================================================

    Reported-by: Hangbin Liu
    Fixes: 982b52700482 ("openvswitch: Fix mask generation for nested attributes.")
    Signed-off-by: Stefano Brivio
    Reviewed-by: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Stefano Brivio
     

24 Apr, 2018

1 commit

  • This is a patch proposal to support shifted ranges in portmaps. (i.e. tcp/udp
    incoming port 5000-5100 on WAN redirected to LAN 192.168.1.5:2000-2100)

    Currently DNAT only works for single port or identical port ranges. (i.e.
    ports 5000-5100 on WAN interface redirected to a LAN host while original
    destination port is not altered) When different port ranges are configured,
    either 'random' mode should be used, or else all incoming connections are
    mapped onto the first port in the redirect range. (in described example
    WAN:5000-5100 will all be mapped to 192.168.1.5:2000)

    This patch introduces a new mode indicated by flag NF_NAT_RANGE_PROTO_OFFSET
    which uses a base port value to calculate an offset with the destination port
    present in the incoming stream. That offset is then applied as index within the
    redirect port range (index modulo rangewidth to handle range overflow).

    In described example the base port would be 5000. An incoming stream with
    destination port 5004 would result in an offset value 4 which means that the
    NAT'ed stream will be using destination port 2004.

    Other possibilities include deterministic mapping of larger or multiple ranges
    to a smaller range : WAN:5000-5999 -> LAN:5000-5099 (maps WAN port 5*xx to port
    51xx)

    This patch does not change any current behavior. It just adds new NAT proto
    range functionality which must be selected via the specific flag when intended
    to use.

    A patch for iptables (libipt_DNAT.c + libip6t_DNAT.c) will also be proposed
    which makes this functionality immediately available.

    Signed-off-by: Thierry Du Tre
    Signed-off-by: Pablo Neira Ayuso

    Thierry Du Tre
     

30 Mar, 2018

1 commit

  • Here we iterate for_each_net() and removes
    vport from alive net to the exiting net.

    ovs_net::dps are protected by ovs_mutex(),
    and the others, who change it (ovs_dp_cmd_new(),
    __dp_destroy()) also take it.
    The same with datapath::ports list.

    So, we remove rtnl_lock() here.

    Signed-off-by: Kirill Tkhai
    Signed-off-by: David S. Miller

    Kirill Tkhai