13 May, 2019

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for net:

    1) Postpone chain policy update to drop after transaction is complete,
    from Florian Westphal.

    2) Add entry to flowtable after confirmation to fix UDP flows with
    packets going in one single direction.

    3) Reference count leak in dst object, from Taehee Yoo.

    4) Check for TTL field in flowtable datapath, from Taehee Yoo.

    5) Fix h323 conntrack helper due to incorrect boundary check,
    from Jakub Jankowski.

    6) Fix incorrect rcu dereference when fetching basechain stats,
    from Florian Westphal.

    7) Missing error check when adding new entries to flowtable,
    from Taehee Yoo.

    8) Use version field in nfnetlink message to honor the nfgen_family
    field, from Kristian Evensen.

    9) Remove incorrect configuration check for CONFIG_NF_CONNTRACK_IPV6,
    from Subash Abhinov Kasiviswanathan.

    10) Prevent dying entries from being added to the flowtable,
    from Taehee Yoo.

    11) Don't hit WARN_ON() with malformed blob in ebtables with
    trailing data after last rule, reported by syzbot, patch
    from Florian Westphal.

    12) Remove NFT_CT_TIMEOUT enumeration, never used in the kernel
    code.

    13) Fix incorrect definition for NFT_LOGLEVEL_MAX, from Florian
    Westphal.

    This batch comes with a conflict that can be fixed with this patch:

    diff --cc include/uapi/linux/netfilter/nf_tables.h
    index 7bdb234f3d8c,f0cf7b0f4f35..505393c6e959
    --- a/include/uapi/linux/netfilter/nf_tables.h
    +++ b/include/uapi/linux/netfilter/nf_tables.h
    @@@ -966,6 -966,8 +966,7 @@@ enum nft_socket_keys
    * @NFT_CT_DST_IP: conntrack layer 3 protocol destination (IPv4 address)
    * @NFT_CT_SRC_IP6: conntrack layer 3 protocol source (IPv6 address)
    * @NFT_CT_DST_IP6: conntrack layer 3 protocol destination (IPv6 address)
    - * @NFT_CT_TIMEOUT: connection tracking timeout policy assigned to conntrack
    + * @NFT_CT_ID: conntrack id
    */
    enum nft_ct_keys {
    NFT_CT_STATE,
    @@@ -991,6 -993,8 +992,7 @@@
    NFT_CT_DST_IP,
    NFT_CT_SRC_IP6,
    NFT_CT_DST_IP6,
    - NFT_CT_TIMEOUT,
    + NFT_CT_ID,
    __NFT_CT_MAX
    };
    #define NFT_CT_MAX (__NFT_CT_MAX - 1)

    That replaces the unused NFT_CT_TIMEOUT definition by NFT_CT_ID. If you prefer,
    I can also solve this conflict here, just let me know.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

11 May, 2019

1 commit

  • Currently, an error return from kobject_init_and_add() is not followed
    by a call to kobject_put(), which leaks memory. We currently set p to
    NULL so that kfree() may be called on it as a no-op; the code is
    arguably clearer if we move the kfree() up, closer to where the memory
    was allocated (instead of after the goto jump).

    Remove the goto label 'err1' and jump to the call to kobject_put() on
    error return from kobject_init_and_add(), fixing the memory leak. Rename
    the goto label 'put_back' to 'err1' now that we don't use err1, following
    current nomenclature (err1, err2 ...). Move the call to kfree() out of
    the error code at the bottom of the function, up closer to where the
    memory was allocated. Add a comment to clarify the call to kfree().
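
    A minimal userspace sketch of the error-path rule described above,
    assuming a hypothetical reference-counted struct obj in place of a
    kobject. Once the init-and-add step has run, even unsuccessfully, the
    object is released through its put function rather than freed directly.
    All names here (obj_init_and_add, obj_put, port_create) are
    illustrative and are not kernel APIs.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a kobject: a refcount plus a release hook. */
struct obj {
    int refcount;
    void (*release)(struct obj *o);
};

int released_count; /* counts release() calls, for demonstration only */

void obj_release(struct obj *o)
{
    released_count++;
    free(o);
}

/* Drop one reference; the last put invokes release(). */
void obj_put(struct obj *o)
{
    if (--o->refcount == 0)
        o->release(o);
}

/*
 * Mimics the contract of an init-and-add helper: the object is
 * initialised (refcount = 1) even when the "add" step fails, so the
 * caller must call obj_put() on error instead of freeing it directly.
 */
int obj_init_and_add(struct obj *o, int fail_add)
{
    o->refcount = 1;
    o->release = obj_release;
    return fail_add ? -1 : 0;
}

/* Returns 0 on success (object handed back via *out), negative on failure. */
int port_create(int fail_add, struct obj **out)
{
    struct obj *o = calloc(1, sizeof(*o));
    int err;

    if (!o)
        return -1;

    err = obj_init_and_add(o, fail_add);
    if (err) {
        obj_put(o); /* frees the memory through release() */
        return err;
    }

    *out = o;
    return 0;
}
```

    Calling port_create(1, &o) exercises the failure path: the object is
    released exactly once and *out is left untouched.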

    Signed-off-by: Tobin C. Harding
    Signed-off-by: David S. Miller

    Tobin C. Harding
     

09 May, 2019

1 commit

  • If userspace provides a rule blob with trailing data after last target,
    we trigger a splat, then convert ruleset to 64bit format (with trailing
    data), then pass that to do_replace_finish() which then returns -EINVAL.

    Erroring out right away avoids the splat plus unneeded translation and
    error unwind.

    Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support")
    Reported-by: Tetsuo Handa
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

28 Apr, 2019

3 commits

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then going
    forward prevent forgetting attribute entries. Similar can apply
    to the POLICY flag.
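
    The way the four options compose can be sketched as independent flag
    bits checked by one validation routine. This is a simplified model,
    not the kernel implementation: the flag names, struct attr,
    struct policy and validate() are assumptions made for illustration.

```c
#include <stddef.h>

/* Illustrative flag names; the kernel's own identifiers may differ. */
enum {
    V_TRAILING     = 1 << 0, /* reject trailing bytes after the attributes */
    V_MAXTYPE      = 1 << 1, /* reject attribute types above the known max */
    V_UNSPEC       = 1 << 2, /* reject attributes with no policy entry */
    V_STRICT_ATTRS = 1 << 3, /* require exactly the expected length */
};

#define V_LIBERAL           0
#define V_DEPRECATED_STRICT (V_TRAILING | V_MAXTYPE)
#define V_STRICT            (V_TRAILING | V_MAXTYPE | V_UNSPEC | V_STRICT_ATTRS)

struct attr   { unsigned type; size_t len; };
struct policy { int specified; size_t expected_len; };

/* Returns 0 if the message passes under the given flags, -1 otherwise. */
int validate(const struct attr *attrs, size_t n, size_t trailing_bytes,
             const struct policy *pol, unsigned maxtype, unsigned flags)
{
    if (trailing_bytes && (flags & V_TRAILING))
        return -1;

    for (size_t i = 0; i < n; i++) {
        const struct attr *a = &attrs[i];

        if (a->type > maxtype) {
            if (flags & V_MAXTYPE)
                return -1;
            continue; /* liberal mode skips unknown types */
        }
        if (!pol[a->type].specified) {
            if (flags & V_UNSPEC)
                return -1;
            continue; /* liberal mode accepts unspecified attributes */
        }
        if (a->len < pol[a->type].expected_len)
            return -1; /* shorter than expected is rejected in every mode */
        if (a->len > pol[a->type].expected_len && (flags & V_STRICT_ATTRS))
            return -1;
    }
    return 0;
}
```

    With this model, an over-long attribute passes under V_LIBERAL and
    V_DEPRECATED_STRICT but is rejected under V_STRICT, which matches the
    split described above.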

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • This is a simple cleanup addressing two coding style issues found by
    checkpatch.pl in an earlier patch. It's submitted as a separate patch to
    keep the original patch as it was generated by spatch.

    Signed-off-by: Michal Kubecek
    Signed-off-by: David S. Miller

    Michal Kubecek
     
  • Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().
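
    The compatibility concern can be illustrated with a small userspace
    sketch. The flag and mask values below match the uapi netlink header;
    EXAMPLE_ATTR_LIST and the two helper functions are hypothetical and
    exist only for this example.

```c
#include <stdint.h>

/* Values as defined in the uapi header <linux/netlink.h>. */
#define NLA_F_NESTED        (1 << 15)
#define NLA_F_NET_BYTEORDER (1 << 14)
#define NLA_TYPE_MASK       (~(NLA_F_NESTED | NLA_F_NET_BYTEORDER))

/* Hypothetical attribute type used only for this example. */
#define EXAMPLE_ATTR_LIST 5

/* Fragile: compares the raw field, so it stops matching once the flag is set. */
int is_list_raw(uint16_t nla_type)
{
    return nla_type == EXAMPLE_ATTR_LIST;
}

/* Robust: masks out the flag bits before comparing. */
int is_list_masked(uint16_t nla_type)
{
    return (nla_type & NLA_TYPE_MASK) == EXAMPLE_ATTR_LIST;
}
```

    A parser written like is_list_raw() would stop recognizing the
    attribute as soon as the kernel started setting NLA_F_NESTED, which is
    why existing callers keep the flagless behaviour.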

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
     

26 Apr, 2019

1 commit


23 Apr, 2019

2 commits

  • When a bridge port is being deleted, do not dereference it later in
    br_vlan_port_event() as it can result in a use-after-free [1] if the RCU
    callback was executed before invoking the function.

    [1]
    [ 129.638551] ==================================================================
    [ 129.646904] BUG: KASAN: use-after-free in br_vlan_port_event+0x53c/0x5fd
    [ 129.654406] Read of size 8 at addr ffff8881e4aa1ae8 by task ip/483
    [ 129.663008] CPU: 0 PID: 483 Comm: ip Not tainted 5.1.0-rc5-custom-02265-ga946bd73daac #1383
    [ 129.672359] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
    [ 129.682484] Call Trace:
    [ 129.685242] dump_stack+0xa9/0x10e
    [ 129.689068] print_address_description.cold.2+0x9/0x25e
    [ 129.694930] kasan_report.cold.3+0x78/0x9d
    [ 129.704420] br_vlan_port_event+0x53c/0x5fd
    [ 129.728300] br_device_event+0x2c7/0x7a0
    [ 129.741505] notifier_call_chain+0xb5/0x1c0
    [ 129.746202] rollback_registered_many+0x895/0xe90
    [ 129.793119] unregister_netdevice_many+0x48/0x210
    [ 129.803384] rtnl_delete_link+0xe1/0x140
    [ 129.815906] rtnl_dellink+0x2a3/0x820
    [ 129.844166] rtnetlink_rcv_msg+0x397/0x910
    [ 129.868517] netlink_rcv_skb+0x137/0x3a0
    [ 129.882013] netlink_unicast+0x49b/0x660
    [ 129.900019] netlink_sendmsg+0x755/0xc90
    [ 129.915758] ___sys_sendmsg+0x761/0x8e0
    [ 129.966315] __sys_sendmsg+0xf0/0x1c0
    [ 129.988918] do_syscall_64+0xa4/0x470
    [ 129.993032] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [ 129.998696] RIP: 0033:0x7ff578104b58
    ...
    [ 130.073811] Allocated by task 479:
    [ 130.077633] __kasan_kmalloc.constprop.5+0xc1/0xd0
    [ 130.083008] kmem_cache_alloc_trace+0x152/0x320
    [ 130.088090] br_add_if+0x39c/0x1580
    [ 130.092005] do_set_master+0x1aa/0x210
    [ 130.096211] do_setlink+0x985/0x3100
    [ 130.100224] __rtnl_newlink+0xc52/0x1380
    [ 130.104625] rtnl_newlink+0x6b/0xa0
    [ 130.108541] rtnetlink_rcv_msg+0x397/0x910
    [ 130.113136] netlink_rcv_skb+0x137/0x3a0
    [ 130.117538] netlink_unicast+0x49b/0x660
    [ 130.121939] netlink_sendmsg+0x755/0xc90
    [ 130.126340] ___sys_sendmsg+0x761/0x8e0
    [ 130.130645] __sys_sendmsg+0xf0/0x1c0
    [ 130.134753] do_syscall_64+0xa4/0x470
    [ 130.138864] entry_SYSCALL_64_after_hwframe+0x49/0xbe

    [ 130.146195] Freed by task 0:
    [ 130.149421] __kasan_slab_free+0x125/0x170
    [ 130.154016] kfree+0xf3/0x310
    [ 130.157349] kobject_put+0x1a8/0x4c0
    [ 130.161363] rcu_core+0x859/0x19b0
    [ 130.165175] __do_softirq+0x250/0xa26
    [ 130.170956] The buggy address belongs to the object at ffff8881e4aa1ae8
    which belongs to the cache kmalloc-1k of size 1024
    [ 130.184972] The buggy address is located 0 bytes inside of
    1024-byte region [ffff8881e4aa1ae8, ffff8881e4aa1ee8)

    Fixes: 9c0ec2e7182a ("bridge: support binding vlan dev link state to vlan member bridge ports")
    Signed-off-by: Ido Schimmel
    Cc: Mike Manning
    Acked-by: Nikolay Aleksandrov
    Acked-by: Mike Manning
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS fixes for net

    The following patchset contains Netfilter/IPVS fixes for your net tree:

    1) Add a selftest for icmp packet too big errors with conntrack, from
    Florian Westphal.

    2) Validate inner header in ICMP error message does not lie to us
    in conntrack, also from Florian.

    3) Initialize ct->timeout to calm down KASAN, from Alexander Potapenko.

    4) Skip ICMP error messages from tunnels in IPVS, from Julian Anastasov.

    5) Use a hash to expose conntrack and expectation ID, from Florian Westphal.

    6) Prevent shift wrap in nft_chain_parse_hook(), from Dan Carpenter.

    7) Fix broken ICMP ID randomization with NAT, also from Florian.

    8) Remove WARN_ON in ebtables compat that is reached via syzkaller,
    from Florian Westphal.

    9) Fix broken timestamps since fb420d5d91c1 ("tcp/fq: move back to
    CLOCK_MONOTONIC"), from Florian.

    10) Fix logging of invalid packets in conntrack, from Andrei Vagin.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

22 Apr, 2019

1 commit

  • The WARN_ON firing means userspace gave us a ruleset where there is some
    other data after the ebtables target but before the beginning of the
    next rule.

    Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support")
    Reported-by: syzbot+659574e7bcc7f7eb4df7@syzkaller.appspotmail.com
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

20 Apr, 2019

3 commits

  • If vlan bridge binding is enabled, then the link state of a vlan device
    that is an upper device of the bridge tracks the state of bridge ports
    that are members of that vlan. But this can only be done when the link
    state of the bridge is up. If it is down, then the link state of the
    vlan devices must also be down. This is to maintain existing behavior
    for when STP is enabled and there are no live ports, in which case the
    link state for the bridge and any vlan devices is down.

    Signed-off-by: Mike Manning
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Mike Manning
     
  • If vlan bridge binding is enabled, then the link state of a vlan device
    that is an upper device of the bridge should track the state of bridge
    ports that are members of that vlan. So if a bridge port becomes or
    stops being a member of a vlan, then update the link state of the
    vlan device if necessary.

    Signed-off-by: Mike Manning
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Mike Manning
     
  • In the case of vlan filtering on bridges, the bridge may also have the
    corresponding vlan devices as upper devices. A vlan bridge binding mode
    is added to allow the link state of the vlan device to track only the
    state of the subset of bridge ports that are also members of the vlan,
    rather than that of all bridge ports. This mode is set with a vlan flag
    rather than a bridge sysfs so that the 8021q module is aware that it
    should not set the link state for the vlan device.

    If bridge vlan is configured, the bridge device event handling results
    in the link state for an upper device being set, if it is a vlan device
    with the vlan bridge binding mode enabled. This also sets a
    vlan_bridge_binding flag so that subsequent UP/DOWN/CHANGE events for
    the ports in that bridge result in a link state update of the vlan
    device if required.

    The link state of the vlan device is up if there is at least one bridge
    port that is a vlan member that is admin & oper up, otherwise its oper
    state is IF_OPER_LOWERLAYERDOWN.
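
    The rule in the last paragraph can be sketched as a simple scan over
    the bridge ports. The struct port layout, the enum and
    vlan_link_state() are hypothetical simplifications, not the bridge
    code, and the sketch does not model the state of the bridge device
    itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a bridge port for this example. */
struct port {
    bool admin_up;    /* administratively enabled */
    bool oper_up;     /* operationally up (carrier present) */
    bool vlan_member; /* member of the VLAN in question */
};

enum vlan_state { VLAN_UP, VLAN_LOWERLAYERDOWN };

/* Up if at least one member port is both admin and oper up. */
enum vlan_state vlan_link_state(const struct port *ports, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ports[i].vlan_member && ports[i].admin_up && ports[i].oper_up)
            return VLAN_UP;
    }
    return VLAN_LOWERLAYERDOWN;
}
```

    Ports that are up but not members of the VLAN do not contribute, which
    is the difference from tracking all bridge ports.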

    Signed-off-by: Mike Manning
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Mike Manning
     

18 Apr, 2019

1 commit


17 Apr, 2019

2 commits

  • Since the introduction of the vlan_stats_per_port option, its netlink
    export has been broken because I made a typo and used the ifla
    attribute instead of the bridge option to retrieve its state.
    Sysfs export is fine; only netlink export has been affected.

    Fixes: 9163a0fc1f0c0 ("net: bridge: add support for per-port vlan stats")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • When the commit below was introduced it changed two visible things:
    - the skb was no longer passed through the protocol handlers with the
    original device
    - the skb was passed up the stack with skb->dev = bridge

    The first change broke af_packet sockets on bridge ports. For example we
    use them for hostapd which listens for ETH_P_PAE packets on the ports.
    We discussed two possible fixes:
    - create a clone and pass it through NF_HOOK(), act on the original skb
    based on the result
    - somehow signal to the caller from the okfn() that it was called,
    meaning the skb is ok to be passed, which this patch is trying to
    implement via returning 1 from the bridge link-local okfn()

    Note that we rely on the fact that NF_QUEUE/STOLEN would return 0 and
    drop/error would return < 0 thus the okfn() is called only when the
    return was 1, so we signal to the caller that it was called by preserving
    the return value from nf_hook().
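
    The signalling scheme in the note above can be modelled as follows.
    This is a sketch under stated assumptions: the verdict enum, run_hook()
    and may_pass_up() are hypothetical stand-ins, and only the return-value
    convention (1 from okfn(), 0 for queue/stolen, negative for drop or
    error) is taken from the description.

```c
enum hook_verdict { HV_DROP, HV_ACCEPT, HV_STOLEN, HV_QUEUE };

/* okfn for link-local frames: returns 1 purely to signal "I was called". */
int link_local_okfn(void)
{
    return 1;
}

/* Simplified hook runner: only an accept verdict invokes okfn(). */
int run_hook(enum hook_verdict v, int (*okfn)(void))
{
    switch (v) {
    case HV_ACCEPT:
        return okfn();
    case HV_STOLEN:
    case HV_QUEUE:
        return 0;
    default:
        return -1; /* drop or error */
    }
}

/* Caller: the frame may be passed on only if okfn() actually ran. */
int may_pass_up(enum hook_verdict v)
{
    return run_hook(v, link_local_okfn) == 1;
}
```

    Because the caller only acts on a return value of exactly 1, it never
    touches an skb that a hook has queued, stolen or dropped.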

    Fixes: 8626c56c8279 ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

16 Apr, 2019

1 commit

  • After merging the netfilter-next tree, today's linux-next build (powerpc
    ppc44x_defconfig) failed like this:

    In file included from net/bridge/br_input.c:19:
    include/net/netfilter/nf_queue.h:16:23: error: field 'state' has incomplete type
    struct nf_hook_state state;
    ^~~~~

    Fixes: 971502d77faa ("bridge: netfilter: unroll NF_HOOK helper in bridge input path")
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Pablo Neira Ayuso

    Stephen Rothwell
     

12 Apr, 2019

4 commits

  • This makes broute a normal ebtables table, hooking at PREROUTING.
    The broute hook is removed.

    It uses skb->cb to signal to bridge rx handler that the skb should be
    routed instead of being bridged.

    This change is backwards compatible with ebtables as no userspace visible
    parts are changed.

    This means we can also remove the !ops test in ebt_register_table;
    it was only there for the broute table's sake.

    Signed-off-by: Florian Westphal
    Acked-by: David S. Miller
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Replace NF_HOOK() based invocation of the netfilter hooks with a private
    copy of nf_hook_slow().

    This copy has one difference: it can return the rx handler value expected
    by the stack, i.e. RX_HANDLER_CONSUMED or RX_HANDLER_PASS.

    This is needed by the next patch to invoke the ebtables
    "broute" table via the standard netfilter hooks rather than the custom
    "br_should_route_hook" indirection that is used now.

    When the skb is to be "brouted", we must return RX_HANDLER_PASS from the
    bridge rx input handler, but there is no way to indicate this via
    NF_HOOK(), unless perhaps by some hack such as exposing bridge_cb in the
    netfilter core or a percpu flag.

       text    data     bss     dec filename
       3369      56       0    3425 net/bridge/br_input.o.before
       3458      40       0    3498 net/bridge/br_input.o.after

    This allows removal of the "br_should_route_hook" in the next patch.

    Signed-off-by: Florian Westphal
    Acked-by: David S. Miller
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • Reduce size of br_input_skb_cb from 24 to 16 bytes by
    using bitfield for those values that can only be 0 or 1.

    igmp is the igmp type value, so it needs to be at least u8.

    Furthermore, the bridge currently relies on step-by-step initialization
    of br_input_skb_cb fields as the skb passes through the stack.

    Explicitly zero out the bridge input cb instead; this avoids having to
    review/validate that no BR_INPUT_SKB_CB(skb)->foo test can see a
    'random' value from a previous protocol cb.

    AFAICS all current fields are always set up before they are read again,
    so this is not a bug fix.

    Signed-off-by: Florian Westphal
    Acked-by: David S. Miller
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • br_multicast_start_querier() walks over the port list but it can be
    called from a timer with only multicast_lock held which doesn't protect
    the port list, so use RCU to walk over it.

    Fixes: c83b8fab06fc ("bridge: Restart queries when last querier expires")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

08 Apr, 2019

1 commit

  • This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
    bucket pointer to lock the hash chain for that bucket.

    The benefits of a bit spin_lock are:
    - no need to allocate a separate array of locks.
    - no need to have a configuration option to guide the
    choice of the size of this array
    - locking cost is often a single test-and-set in a cache line
    that will have to be loaded anyway. When inserting at, or removing
    from, the head of the chain, the unlock is free - writing the new
    address in the bucket head implicitly clears the lock bit.
    For __rhashtable_insert_fast() we ensure this always happens
    when adding a new key.
    - even when locking costs 2 updates (lock and unlock), they are
    in a cacheline that needs to be read anyway.

    The cost of using a bit spin_lock is a little bit of code complexity,
    which I think is quite manageable.

    Bit spin_locks are sometimes inappropriate because they are not fair -
    if multiple CPUs repeatedly contend for the same lock, one CPU can
    easily be starved. This is not a credible situation with rhashtable.
    Multiple CPUs may want to repeatedly add or remove objects, but they
    will typically do so at different buckets, so they will attempt to
    acquire different locks.

    As we have more bit-locks than we previously had spinlocks (by at
    least a factor of two) we can expect slightly less contention to
    go with the slightly better cache behavior and reduced memory
    consumption.

    To enhance type checking, a new struct is introduced to represent the
    pointer plus lock-bit that is stored in the bucket-table. This is
    "struct rhash_lock_head" and is empty. A pointer to this needs to be
    cast to either an unsigned long, or a "struct rhash_head *" to be useful.
    Variables of this type are most often called "bkt".

    Previously "pprev" would sometimes point to a bucket, and sometimes a
    ->next pointer in an rhash_head. As these are now different types,
    pprev is NULL when it would have pointed to the bucket. In that case,
    'bkt' is used, together with correct locking protocol.
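
    The idea of keeping the lock in a spare bit of the bucket pointer can
    be sketched in userspace with C11 atomics. This is not the rhashtable
    code: struct bucket, struct node and the helper names are hypothetical,
    and the sketch assumes nodes are at least 4-byte aligned so that bit 1
    of a valid pointer is always zero.

```c
#include <stdatomic.h>
#include <stdint.h>

#define LOCK_BIT ((uintptr_t)1 << 1) /* BIT(1), as in the description */

struct node {
    struct node *next;
    int key;
};

/* A bucket stores the chain head pointer and the lock bit in one word. */
struct bucket {
    _Atomic uintptr_t word;
};

/* Spin until the lock bit was clear and this caller set it. */
void bucket_lock(struct bucket *b)
{
    while (atomic_fetch_or_explicit(&b->word, LOCK_BIT,
                                    memory_order_acquire) & LOCK_BIT)
        ; /* busy-wait; bit locks are not fair */
}

void bucket_unlock(struct bucket *b)
{
    atomic_fetch_and_explicit(&b->word, ~LOCK_BIT, memory_order_release);
}

/* Read the head pointer with the lock bit masked out. */
struct node *bucket_head(struct bucket *b)
{
    return (struct node *)(atomic_load(&b->word) & ~LOCK_BIT);
}

/*
 * Insert at the head while holding the lock. Storing the new, aligned
 * pointer also clears the lock bit, so no separate unlock is needed.
 */
void bucket_insert_head(struct bucket *b, struct node *n)
{
    bucket_lock(b);
    n->next = bucket_head(b);
    atomic_store_explicit(&b->word, (uintptr_t)n, memory_order_release);
}
```

    bucket_insert_head() shows the "free unlock" case: publishing the new
    head and releasing the lock are the same store.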

    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     

06 Apr, 2019

1 commit


05 Apr, 2019

4 commits

  • Since the mcast conversion to rhashtable this function has been unused, so
    remove it.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • We need to be careful and always zero the whole br_ip struct when it is
    used for matching since the rhashtable change. This patch fixes all the
    places which didn't properly clear it which in turn might've caused
    mismatches.
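
    Why partial initialization can cause mismatches is easy to show with
    a key that is compared byte by byte. The struct ip_key below is a
    hypothetical key loosely modelled on the description, not the kernel's
    br_ip, and the byte-wise comparison is an assumption about how a
    fixed-length hash key might be matched.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical key; the union leaves 12 bytes unused for IPv4 entries. */
struct ip_key {
    union {
        uint32_t ip4;
        uint8_t  ip6[16];
    } u;
    uint16_t proto;
    uint16_t vid;
};

/* Zero the whole key first so unused union bytes have a defined value. */
void key_init_v4(struct ip_key *k, uint32_t ip4, uint16_t vid)
{
    memset(k, 0, sizeof(*k));
    k->u.ip4 = ip4;
    k->proto = 0x0800; /* EtherType for IPv4 */
    k->vid = vid;
}

/* Byte-wise comparison over the full key length. */
int key_equal(const struct ip_key *a, const struct ip_key *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

    Two keys built with key_init_v4() from the same address compare equal,
    whereas a key whose fields were assigned on top of stale memory may
    not, even though every named field matches.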

    Thanks for the great bug report with reproducing steps and bisection.

    Steps to reproduce (from the bug report):
    ip link add br0 type bridge mcast_querier 1
    ip link set br0 up

    ip link add v2 type veth peer name v3
    ip link set v2 master br0
    ip link set v2 up
    ip link set v3 up
    ip addr add 3.0.0.2/24 dev v3

    ip netns add test
    ip link add v1 type veth peer name v1 netns test
    ip link set v1 master br0
    ip link set v1 up
    ip -n test link set v1 up
    ip -n test addr add 3.0.0.1/24 dev v1

    # Multicast receiver
    ip netns exec test socat \
        UDP4-RECVFROM:5588,ip-add-membership=224.224.224.224:3.0.0.1,fork -

    # Multicast sender
    echo hello | nc -u -s 3.0.0.2 224.224.224.224 5588

    Reported-by: liam.mcbirnie@boeing.com
    Fixes: 19e3a9c90c53 ("net: bridge: convert multicast to generic rhashtable")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • We can optimize the fdb convergence when a backup_port is present by not
    immediately flushing the entries of the stopped port since traffic for
    those entries will flow towards the backup_port.

    There are 2 cases specifically that benefit most:
    - when the stopped port comes up before the entries expire by themselves
    - when there's an external entry refresh and they're kept while the
    backup_port is operating (e.g. mlag)

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Simplify this code by updating bridge multicast stats from
    maybe_deliver().

    Note that commit 6db6f0eae605 ("bridge: multicast to unicast"), in case
    the port flag BR_MULTICAST_TO_UNICAST is set, never updates the previous
    port pointer, therefore it is always going to be different from the
    existing port in this deduplicated list iteration.

    Signed-off-by: Pablo Neira Ayuso
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

30 Mar, 2019

2 commits


28 Mar, 2019

1 commit


21 Mar, 2019

1 commit


18 Mar, 2019

1 commit

  • Since commit 21d1196a35f5 ("ipv4: set transport header earlier"),
    skb->transport_header has always been set before entering INET
    netfilter. This patch sets skb->transport_header for the bridge as
    well, before entering INET netfilter via bridge-nf-call-iptables.

    It also fixes an issue where sctp_error() could not compute a correct
    checksum because skb->transport_header was unset.

    Fixes: e6d8b64b34aa ("net: sctp: fix and consolidate SCTP checksumming code")
    Reported-by: Li Shuang
    Suggested-by: Pablo Neira Ayuso
    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Acked-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Xin Long
     

03 Mar, 2019

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter/IPVS updates for net-next

    The following patchset contains Netfilter/IPVS updates for net-next:

    1) Add .release_ops to properly unroll .select_ops, use it from nft_compat.
    After this change, we can remove list of extensions too to simplify this
    codebase.

    2) Update amanda conntrack helper to support v3.4, from Florian Tham.

    3) Get rid of the obsolete BUGPRINT macro in ebtables, from
    Florian Westphal.

    4) Merge IPv4 and IPv6 masquerading infrastructure into one single module.
    From Florian Westphal.

    5) Patchset to remove nf_nat_l3proto structure to get rid of
    indirections, from Florian Westphal.

    6) Skip unnecessary conntrack timeout updates in case the value is
    still the same, also from Florian Westphal.

    7) Remove unnecessary 'fall through' comments in empty switch cases,
    from Li RongQing.

    8) Fix lookup to fixed size hashtable sets on big endian with 32-bit keys.

    9) Incorrect logic to deactivate path of fixed size hashtable sets,
    element was being tested to self.

    10) Remove nft_hash_key(), the bitmap set is always selected for 16-bit
    keys.

    11) Use boolean whenever possible in IPVS codebase, from Andrea Claudi.

    12) Enter close state in conntrack if RST matches exact sequence number,
    from Florian Westphal.

    13) Initialize dst_cache in tunnel extension, from wenxu.

    14) Pass protocol as u16 to xt_check_match and xt_check_target, from
    Li RongQing.

    15) SCTP header is guaranteed to be in a linear area from IPVS NAT
    handler, from Xin Long.

    16) Don't steal packets coming from slave VRF device from the
    ip_sabotage_in() path, from David Ahern.

    17) Fix unsafe update of basechain stats, from Li RongQing.

    18) Make sure CONNTRACK_LOCKS is power of 2 to let compiler optimize
    modulo operation as bitwise AND, from Li RongQing.

    19) Use device_attribute instead of internal definition in the IDLETIMER
    target, from Sami Tolvanen.

    20) Merge redir, masq and IPv4/IPv6 NAT chain types, from Florian Westphal.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

01 Mar, 2019

2 commits

  • Followup to a173f066c7cf ("netfilter: bridge: Don't sabotage nf_hook
    calls from an l3mdev"). Some packets (e.g., ndisc) do not have the skb
    device flipped to the l3mdev (e.g., VRF) device. Update ip_sabotage_in
    to not drop packets for slave devices too. Currently, neighbor
    solicitation packets for 'dev -> bridge (addr) -> vrf' setups are getting
    dropped. This patch enables IPv6 communications for bridges with an
    address that are enslaved to a VRF.

    Fixes: 73e20b761acf ("net: vrf: Add support for PREROUTING rules on vrf device")
    Signed-off-by: David Ahern
    Signed-off-by: Pablo Neira Ayuso

    David Ahern
     
  • The proto field in struct xt_match and struct xt_target is u16, but
    the proto argument of xt_check_match() and xt_check_target() is u8,
    so the value is truncated when these functions are called. This is
    harmless for IP packets, since the IP protocol number is u8.

    However, if an ebtables match/target has a proto that needs the full
    u16, the truncation causes the check to fail.

    Also convert be16 to short in bridge/netfilter/ebtables.c.
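
    The truncation can be reproduced in a few lines of userspace C. The
    two check functions are hypothetical and only model the width of the
    proto parameter; they are not the xtables functions themselves.

```c
#include <stdint.h>

/* Models a check whose proto parameter is too narrow (u8). */
int check_proto_u8(uint16_t wanted, uint8_t proto)
{
    return wanted == proto;
}

/* Same check with the parameter widened to u16. */
int check_proto_u16(uint16_t wanted, uint16_t proto)
{
    return wanted == proto;
}
```

    An 8-bit value such as an IP protocol number survives either version,
    but a 16-bit value such as the EtherType 0x0800 loses its upper byte
    when passed through the u8 parameter and no longer matches.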

    Signed-off-by: Zhang Yu
    Signed-off-by: Li RongQing
    Signed-off-by: Pablo Neira Ayuso

    Li RongQing
     

28 Feb, 2019

1 commit

  • Drop switchdev_ops.switchdev_port_attr_set. Drop the uses of this field
    from all clients, which were migrated to use switchdev notification in
    the previous patches.

    Add a new function switchdev_port_attr_notify() that sends the switchdev
    notifications SWITCHDEV_PORT_ATTR_SET and calls the blocking (process)
    notifier chain.

    We have one odd case within net/bridge/br_switchdev.c with the
    SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attribute identifier that
    requires executing from atomic context, we deal with that one
    specifically.

    Drop __switchdev_port_attr_set() and update switchdev_port_attr_set()
    likewise.

    Signed-off-by: Florian Fainelli
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Florian Fainelli
     

27 Feb, 2019

1 commit


25 Feb, 2019

1 commit

  • Three conflicts, one of which, for marvell10g.c is non-trivial and
    requires some follow-up from Heiner or someone else.

    The issue is that Heiner converted the marvell10g driver over to
    use the generic c45 code as much as possible.

    However, in 'net' a bug fix appeared which makes sure that a new
    local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
    is cleared.

    Signed-off-by: David S. Miller

    David S. Miller
     

24 Feb, 2019

1 commit

  • This reverts commit 5a2de63fd1a5 ("bridge: do not add port to router list
    when receives query with source 0.0.0.0") and commit 0fe5119e267f ("net:
    bridge: remove ipv6 zero address check in mcast queries")

    The reason is that RFC 4541 is not a standard, only a recommendation.
    Currently we elect 0.0.0.0 as Querier if there is no IP address
    configured on the bridge. If we do not add the port which receives a
    query with source 0.0.0.0 to the router list, IGMP reports cannot be
    forwarded to the Querier, and IGMP data cannot be forwarded to its
    destination either.

    As Nikolay suggested, revert this change first and add a boolopt API
    to disable non-zero election in the future if needed.

    Reported-by: Linus Lüssing
    Reported-by: Sebastian Gottschall
    Fixes: 5a2de63fd1a5 ("bridge: do not add port to router list when receives query with source 0.0.0.0")
    Fixes: 0fe5119e267f ("net: bridge: remove ipv6 zero address check in mcast queries")
    Signed-off-by: Hangbin Liu
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Hangbin Liu
     

22 Feb, 2019

1 commit

  • Now that all switchdev drivers have been converted to check the
    SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS flags and report flags that they
    do not support accordingly, we can migrate the bridge code to try to set
    that attribute first, check the results and then do the actual setting.

    Signed-off-by: Florian Fainelli
    Reviewed-by: Ido Schimmel
    Acked-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Florian Fainelli