20 Mar, 2020

3 commits

  • Now that we have a nested tunnel info attribute we can add a separate
    one for the tunnel command and require it explicitly from user-space. It
    must be one of RTM_SETLINK/DELLINK. Only RTM_SETLINK requires a valid
    tunnel id, DELLINK just removes it if it was set before. This allows us
    to have all tunnel attributes and control in one place, thus removing
    the need for an outside vlan info flag.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • While discussing the new API, Roopa mentioned that we'll be adding more
    tunnel attributes and options in the future, so it's better to make it a
    nested attribute. Since this is still in net-next we can easily change it
    and nest the tunnel id attribute under BRIDGE_VLANDB_ENTRY_TUNNEL_INFO.

    The new format is:
    [BRIDGE_VLANDB_ENTRY]
        [BRIDGE_VLANDB_ENTRY_TUNNEL_INFO]
            [BRIDGE_VLANDB_TINFO_ID]

    Any new tunnel attributes can be nested under
    BRIDGE_VLANDB_ENTRY_TUNNEL_INFO.

    Suggested-by: Roopa Prabhu
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • This patch adds support for vlan stats to be included when dumping vlan
    information. We have to dump them only when explicitly requested (thus the
    flag below) because that disables the vlan range compression and will make
    the dump significantly larger. In order to request the stats to be
    included we add a new dump attribute called BRIDGE_VLANDB_DUMP_FLAGS which
    can affect dumps with the following first flag:
    - BRIDGE_VLANDB_DUMPF_STATS
    The stats are intentionally nested and put into separate attributes to make
    it easier for extending later since we plan to add per-vlan mcast stats,
    drop stats and possibly STP stats. This is the last missing piece from the
    new vlan API which makes the dumped vlan information complete.

    A dump request which should include stats looks like:
    [BRIDGE_VLANDB_DUMP_FLAGS] |= BRIDGE_VLANDB_DUMPF_STATS

    A vlandb entry attribute with stats looks like:
    [BRIDGE_VLANDB_ENTRY] = {
        [BRIDGE_VLANDB_ENTRY_STATS] = {
            [BRIDGE_VLANDB_STATS_RX_BYTES]
            [BRIDGE_VLANDB_STATS_RX_PACKETS]
            ...
        }
    }
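
    The flag test described above can be sketched as follows. This is only
    an illustration: the flag value and both helper names are assumptions
    made for the example, not the kernel's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration; the real definition lives in the
 * kernel's uapi headers. */
#define BRIDGE_VLANDB_DUMPF_STATS (1u << 0)

/* Hypothetical helper: true when the dump request asked for stats. */
static bool dump_wants_stats(uint32_t dump_flags)
{
	return (dump_flags & BRIDGE_VLANDB_DUMPF_STATS) != 0;
}

/* Hypothetical helper: stats differ per vlan, so vlans can no longer be
 * folded into ranges once stats were requested. */
static bool dump_can_compress_ranges(uint32_t dump_flags)
{
	return !dump_wants_stats(dump_flags);
}
```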

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

18 Mar, 2020

5 commits

  • Pablo Neira Ayuso says:

    ====================
    Netfilter updates for net-next

    The following patchset contains Netfilter updates for net-next:

    1) Use nf_flow_offload_tuple() to fetch flow stats, from Paul Blakey.

    2) Add new xt_IDLETIMER hard mode, from Manoj Basapathi.
    Follow up patch to clean up this new mode, from Dan Carpenter.

    3) Add support for geneve tunnel options, from Xin Long.

    4) Make sets built-in and remove modular infrastructure for sets,
    from Florian Westphal.

    5) Remove unused TEMPLATE_NULLS_VAL, from Li RongQing.

    6) Statify nft_pipapo_get, from Chen Wandun.

    7) Use C99 flexible-array member, from Gustavo A. R. Silva.

    8) More descriptive variable names for bitwise, from Jeremy Sowden.

    9) Four patches to add tunnel device hardware offload to the flowtable
    infrastructure, from wenxu.

    10) pipapo set supports for 8-bit grouping, from Stefano Brivio.

    11) pipapo can switch between nibble and byte grouping, also from
    Stefano.

    12) Add AVX2 vectorized version of pipapo, from Stefano Brivio.

    13) Update pipapo to use it for single ranges, from Stefano.

    14) Add stateful expression support to elements via control plane,
    eg. counter per element.

    15) Re-visit sysctls in unprivileged namespaces, from Florian Westphal.

    16) Add new egress hook, from Lukas Wunner.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This patch adds support for manipulating vlan/tunnel mappings. The
    tunnel ids are globally unique and are one per-vlan. There were two
    trickier issues - first in order to support vlan ranges we have to
    compute the current tunnel id in the following way:
    - base tunnel id (attr) + current vlan id - starting vlan id
    This is in line with how the old API does vlan/tunnel mapping with ranges. We
    already have the vlan range present, so it's redundant to add another
    attribute for the tunnel range end. It's simply base tunnel id + vlan
    range. And second to support removing mappings we need an out-of-band way
    to tell the option manipulating function because there are no
    special/reserved tunnel id values, so we use a vlan flag to denote the
    operation is tunnel mapping removal.
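
    The range arithmetic above can be sketched as a small helper. The
    function name and parameter types are assumptions for illustration;
    the sketch also assumes vid is not below the range start.

```c
#include <stdint.h>

/* Hypothetical helper: tunnel id for vlan `vid` inside a range that
 * starts at `range_start`, where the first vlan of the range maps to
 * `base_tunnel_id` (the value carried in the attribute).
 * Assumes vid >= range_start. */
static uint32_t vlan_range_tunnel_id(uint32_t base_tunnel_id,
				     uint16_t vid, uint16_t range_start)
{
	return base_tunnel_id + (uint32_t)(vid - range_start);
}
```

    For example, with base tunnel id 1000 and vlan range 10-15, vlan 15
    maps to tunnel id 1005.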

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Add a new option - BRIDGE_VLANDB_ENTRY_TUNNEL_ID which is used to dump
    the tunnel id mapping. Since they're unique per vlan they can enter a
    vlan range if they're consecutive, thus we can calculate the tunnel id
    range map simply as: vlan range end id - vlan range start id. The
    starting point is the tunnel id in BRIDGE_VLANDB_ENTRY_TUNNEL_ID. This
    is similar to how the tunnel entries can be created in a range via the
    old API (a vlan range maps to a tunnel range).
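
    A minimal sketch of the "consecutive tunnel ids can enter a range"
    rule follows. The struct and function are stand-ins invented for this
    example and do not mirror the kernel's data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a vlan entry with an optional tunnel id. */
struct vlan_entry {
	uint16_t vid;
	bool has_tunnel;
	uint32_t tunnel_id;
};

/* True when v_curr may extend the range that currently ends at range_end:
 * either neither vlan has a tunnel mapping, or both do and the tunnel ids
 * are consecutive. */
static bool tunnel_can_enter_range(const struct vlan_entry *v_curr,
				   const struct vlan_entry *range_end)
{
	if (!v_curr->has_tunnel && !range_end->has_tunnel)
		return true;
	if (!v_curr->has_tunnel || !range_end->has_tunnel)
		return false;
	return v_curr->tunnel_id - range_end->tunnel_id == 1;
}
```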

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • The vlan tunnel code changes vlan options, it shouldn't touch port or
    bridge options so we can constify the port argument. This would later help
    us to re-use these functions from the vlan options code.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • It is a more appropriate name as it shows the intent of why we need to
    check the options' state. It also allows us to give meaning to the two
    arguments of the function: the first is the current vlan (v_curr) being
    checked if it could enter the range ending in the second one (range_end).

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

15 Mar, 2020

1 commit

  • The current codebase makes use of the zero-length array language
    extension to the C90 standard, but the preferred mechanism to declare
    variable-length types such as these ones is a flexible array member[1][2],
    introduced in C99:

    struct foo {
        int stuff;
        struct boo array[];
    };

    By making use of the mechanism above, we will get a compiler warning
    in case the flexible array does not occur last in the structure, which
    will help us prevent some kind of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.

    Also, notice that, dynamic memory allocations won't be affected by
    this change:

    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]
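
    As an illustration of why allocations are unaffected, here is a small
    user-space sketch. The definition of struct boo and the foo_alloc()
    helper are assumptions added for the example; the allocation size is
    the fixed part plus room for the trailing elements.

```c
#include <stddef.h>
#include <stdlib.h>

struct boo {
	int value;
};

struct foo {
	int stuff;
	struct boo array[];	/* flexible array member, must come last */
};

/* Hypothetical helper: allocate a zeroed foo with n trailing elements.
 * sizeof(*f) covers only the fixed part of the structure. */
static struct foo *foo_alloc(size_t n)
{
	struct foo *f = calloc(1, sizeof(*f) + n * sizeof(f->array[0]));

	if (f)
		f->stuff = (int)n;
	return f;
}
```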

    Lastly, fix checkpatch.pl warning
    WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
    in net/bridge/netfilter/ebtables.c

    This issue was found with the help of Coccinelle.

    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] https://github.com/KSPP/linux/issues/21
    [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Pablo Neira Ayuso

    Gustavo A. R. Silva
     

25 Feb, 2020

1 commit

  • In br_dev_xmit() we perform vlan filtering in br_allowed_ingress() but
    if the packet has the vlan header inside (e.g. bridge with disabled
    tx-vlan-offload) then the vlan filtering code will use skb_vlan_untag()
    to extract the vid before filtering which in turn calls pskb_may_pull()
    and we may end up with a stale eth pointer. Moreover the cached eth header
    pointer will generally be wrong after that operation. Remove the eth header
    caching and just use eth_hdr() directly, the compiler does the right thing
    and calculates it only once so we don't lose anything.
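
    The hazard can be modelled in user space. In this sketch struct pkt,
    pkt_hdr() and pkt_may_pull() are hypothetical stand-ins for the skb,
    eth_hdr() and pskb_may_pull(); the point is that the header pointer is
    derived from the current buffer on every use instead of being cached.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a packet buffer. */
struct pkt {
	unsigned char *head;
	size_t len;
};

/* Stand-in for eth_hdr(): always derived from the current head. */
static unsigned char *pkt_hdr(const struct pkt *p)
{
	return p->head;
}

/* Stand-in for pskb_may_pull(): may move the data to a new buffer, which
 * invalidates any pointer taken from the old head. Returns 1 on success. */
static int pkt_may_pull(struct pkt *p)
{
	unsigned char *n = malloc(p->len);

	if (!n)
		return 0;
	memcpy(n, p->head, p->len);
	free(p->head);
	p->head = n;
	return 1;
}
```

    A caller that saved pkt_hdr(&p) before pkt_may_pull(&p) would be left
    with a dangling pointer; calling pkt_hdr(&p) again afterwards is safe.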

    Fixes: 057658cb33fb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports")
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

20 Feb, 2020

1 commit


24 Jan, 2020

4 commits

  • The first per-vlan option added is state; it is needed for EVPN and for
    per-vlan STP. The state allows controlling forwarding on a per-vlan
    basis. The vlan state is considered only if the port state is forwarding
    in order to avoid conflicts and be consistent. br_allowed_egress is
    called only when the state is forwarding, but the ingress case is a bit
    more complicated due to the fact that we may have the transition between
    port:BR_STATE_FORWARDING -> vlan:BR_STATE_LEARNING which should still
    allow the bridge to learn from the packet after vlan filtering and it will
    be dropped after that. Also to optimize the pvid state check we keep a
    copy in the vlan group to avoid one lookup. The state members are
    modified with *_ONCE() to annotate the lockless access.
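
    The ingress rule described above (a learning vlan may still be learned
    from, but only a forwarding vlan passes traffic) can be sketched like
    this. The helper name is an assumption, and the state values are
    repeated locally so the example is self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bridge state values, repeated here so the sketch stands alone. */
enum {
	BR_STATE_DISABLED   = 0,
	BR_STATE_LISTENING  = 1,
	BR_STATE_LEARNING   = 2,
	BR_STATE_FORWARDING = 3,
	BR_STATE_BLOCKING   = 4,
};

/* Hypothetical helper: may a packet proceed given the vlan state?
 * learn_allow is true on the ingress path, where a vlan in the learning
 * state should still let the bridge learn before the packet is dropped. */
static bool vlan_state_allowed(uint8_t state, bool learn_allow)
{
	switch (state) {
	case BR_STATE_LEARNING:
		return learn_allow;
	case BR_STATE_FORWARDING:
		return true;
	default:
		return false;
	}
}
```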

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • This patch adds support for option modification of single vlans and
    ranges. It allows modifying only the options, i.e. skipping create/delete,
    by using the BRIDGE_VLAN_INFO_ONLY_OPTS flag. When working with range
    option changes we try to pack the notifications as much as possible.

    v2: do full port (all vlans) notification only when creating/deleting
    vlans for compatibility, rework the range detection when changing
    options, add more verbose extack errors and check if a vlan should
    be used (br_vlan_should_use checks)

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • We'll be dumping the options for the whole range if they're equal. The
    first range vlan will be used to extract the options. The commit doesn't
    change anything yet it just adds the skeleton for the support. The dump
    will happen when the first option is added.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • If we make sure that br_allowed_egress is called only when we have
    BR_STATE_FORWARDING state then we can avoid a test later when we add
    per-vlan state.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

15 Jan, 2020

8 commits

  • Now that we can notify, send a notification on add/del or change of flags.
    Notifications are also compressed when possible to reduce their number
    and relieve user-space of extra processing, due to that we have to
    manually notify after each add/del in order to avoid double
    notifications. We try hard to notify only about the vlans which actually
    changed, thus a single command can result in multiple notifications
    about disjoint ranges if there were vlans which didn't change inside.
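
    The compression idea can be sketched as follows: consecutive vlans that
    changed are folded into one range, and an unchanged vlan in the middle
    splits the notification in two. All names here are assumptions for the
    example, which also expects the vlan ids to be sorted ascending.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical notification range, inclusive on both ends. */
struct vid_range {
	uint16_t start;
	uint16_t end;
};

/* Fold consecutive changed vlans into ranges. `vids` must be sorted in
 * ascending order. Returns the number of ranges written to `out`, never
 * more than `out_cap`. */
static size_t compress_changed(const uint16_t *vids, const bool *changed,
			       size_t n, struct vid_range *out, size_t out_cap)
{
	size_t count = 0;
	bool open = false;

	for (size_t i = 0; i < n; i++) {
		if (!changed[i]) {
			open = false;	/* an unchanged vlan ends the range */
			continue;
		}
		if (open && vids[i] == out[count - 1].end + 1) {
			out[count - 1].end = vids[i];
			continue;
		}
		if (count == out_cap)
			break;
		out[count].start = vids[i];
		out[count].end = vids[i];
		count++;
		open = true;
	}
	return count;
}
```

    With vlans 10-14 where only vlan 12 did not change, this yields two
    ranges: 10-11 and 13-14.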

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Add a new rtnetlink group for bridge vlan notifications - RTNLGRP_BRVLAN
    and add support for sending vlan notifications (both single and ranges).
    No functional changes intended, the notification support will be used by
    later patches.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Add a new vlandb nl attribute - BRIDGE_VLANDB_ENTRY_RANGE which causes
    RTM_NEWVLAN/DELVLAN to act on a range. Dumps now automatically compress
    similar vlans into ranges. This will also be used when per-vlan options
    are introduced and vlans' options match, they will be put into a single
    range which is encapsulated in one netlink attribute. We need to run
    similar checks as br_process_vlan_info() does because these ranges will
    be used for options setting and they'll be able to skip
    br_process_vlan_info().

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Adding RTM_DELVLAN support similar to RTM_NEWVLAN is simple, just need to
    map DELVLAN to DELLINK and register the handler.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Add initial RTM_NEWVLAN support which can only create vlans, operating
    similar to the current br_afspec(). We will use it later to also change
    per-vlan options. Old-style (flag-based) vlan ranges are not allowed
    when using RTM messages, we will introduce vlan ranges later via a new
    nested attribute which would allow us to have all the information about a
    range encapsulated into a single nl attribute.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • This patch adds vlan rtm definitions:
    - NEWVLAN: to be used for creating vlans, setting options and
    notifications
    - DELVLAN: to be used for deleting vlans
    - GETVLAN: used for dumping vlan information

    Dumping vlans which can span multiple messages is added now with basic
    information (vid and flags). We use nlmsg_parse() to validate the header
    length in order to be able to extend the message with filtering
    attributes later.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Add extack messages on vlan processing errors. We need to move the flags
    missing check after the "last" check since we may have "last" set but
    lack a range end flag in the next entry.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Add helpers to check if a vlan id or range are valid. The range helper
    must be called when range start or end are detected.
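
    A minimal sketch of such checks is shown below. The helper names and
    the simplified range rule (both ends valid, end after start) are
    assumptions for illustration; the kernel helpers also look at vlan
    flags, which this sketch leaves out.

```c
#include <stdbool.h>
#include <stdint.h>

#define VLAN_VID_MASK 0x0fff	/* 12-bit vlan id field */

/* Hypothetical helper: vlan ids 0 and 4095 are not usable. */
static bool vlan_id_valid(uint16_t vid)
{
	return vid > 0 && vid < VLAN_VID_MASK;
}

/* Hypothetical helper: a range needs two valid ids, with the end strictly
 * after the start. */
static bool vlan_range_valid(uint16_t start, uint16_t end)
{
	return vlan_id_valid(start) && vlan_id_valid(end) && start < end;
}
```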

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

01 Jan, 2020

1 commit


27 Dec, 2019

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for net:

    1) Fix endianness issue in flowtable TCP flags dissector,
    from Arnd Bergmann.

    2) Extend flowtable test script with dnat rules, from Florian Westphal.

    3) Reject padding in ebtables user entries and validate computed user
    offset, reported by syzbot, from Florian Westphal.

    4) Fix endianness in nft_tproxy, from Phil Sutter.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

25 Dec, 2019

1 commit

  • The MTU update code is supposed to be invoked in response to real
    networking events that update the PMTU. In IPv6 PMTU update function
    __ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
    confirmed time.

    But for tunnel code, it will call pmtu before xmit, like:
    - tnl_update_pmtu()
    - skb_dst_update_pmtu()
    - ip6_rt_update_pmtu()
    - __ip6_rt_update_pmtu()
    - dst_confirm_neigh()

    If the tunnel remote dst mac address changed and we still do the neigh
    confirm, we will not be able to update neigh cache and ping6 remote
    will fail.

    So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
    should not be invoking dst_confirm_neigh() as we have no evidence
    of successful two-way communication at this point.

    On the other hand it is also important to keep the neigh reachability fresh
    for TCP flows, so we cannot remove this dst_confirm_neigh() call.

    To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
    to choose whether we should do neigh update or not. I will add the parameter
    in this patch and set all the callers to true to comply with the previous
    way, and fix the tunnel code one by one on later patches.
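
    The shape of the change can be sketched in user space. The struct and
    function below are hypothetical stand-ins, not the kernel's dst_ops;
    they only show a PMTU update whose neighbour confirmation is controlled
    by the caller through a bool parameter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a route entry. */
struct dst {
	unsigned int mtu;
	unsigned int neigh_confirms;	/* how often the neighbour was confirmed */
};

/* Hypothetical update function: lowers the stored MTU and confirms the
 * neighbour only when the caller has evidence of two-way communication. */
static void update_pmtu(struct dst *d, unsigned int mtu, bool confirm_neigh)
{
	if (confirm_neigh)
		d->neigh_confirms++;
	if (mtu < d->mtu)
		d->mtu = mtu;
}
```

    A TCP-like caller would pass true, while a tunnel transmit path that
    has not yet heard back from the remote end would pass false.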

    v5: No change.
    v4: No change.
    v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
    v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

    Suggested-by: David Miller
    Reviewed-by: Guillaume Nault
    Acked-by: David Ahern
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
     

23 Dec, 2019

2 commits

  • Mere overlapping changes in the conflicts here.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Pull networking fixes from David Miller:

    1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso,
    including adding a missing ipv6 match description.

    2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi
    Bhat.

    3) Fix uninit value in bond_neigh_init(), from Eric Dumazet.

    4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold.

    5) Fix use after free in tipc_disc_rcv(), from Tuong Lien.

    6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul
    Chaignon.

    7) Multicast MAC limit test is off by one in qede, from Manish Chopra.

    8) Fix established socket lookup race when socket goes from
    TCP_ESTABLISHED to TCP_LISTEN, because there lacks an intervening
    RCU grace period. From Eric Dumazet.

    9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet.

    10) Fix active backup transition after link failure in bonding, from
    Mahesh Bandewar.

    11) Avoid zero sized hash table in gtp driver, from Taehee Yoo.

    12) Fix wrong interface passed to ->mac_link_up(), from Russell King.

    13) Fix DSA egress flooding settings in b53, from Florian Fainelli.

    14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost.

    15) Fix double free in dpaa2-ptp code, from Ioana Ciornei.

    16) Reject invalid MTU values in stmmac, from Jose Abreu.

    17) Fix refcount leak in error path of u32 classifier, from Davide
    Caratti.

    18) Fix regression causing iwlwifi firmware crashes on boot, from Anders
    Kaseorg.

    19) Fix inverted return value logic in llc2 code, from Chan Shu Tak.

    20) Disable hardware GRO when XDP is attached to qede, from Manish
    Chopra.

    21) Since we encode state in the low pointer bits, dst metrics must be
    at least 4 byte aligned, which is not necessarily true on m68k. Add
    annotations to fix this, from Geert Uytterhoeven.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits)
    sfc: Include XDP packet headroom in buffer step size.
    sfc: fix channel allocation with brute force
    net: dst: Force 4-byte alignment of dst_metrics
    selftests: pmtu: fix init mtu value in description
    hv_netvsc: Fix unwanted rx_table reset
    net: phy: ensure that phy IDs are correctly typed
    mod_devicetable: fix PHY module format
    qede: Disable hardware gro when xdp prog is installed
    net: ena: fix issues in setting interrupt moderation params in ethtool
    net: ena: fix default tx interrupt moderation interval
    net/smc: unregister ib devices in reboot_event
    net: stmmac: platform: Fix MDIO init for platforms without PHY
    llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
    net: hisilicon: Fix a BUG trigered by wrong bytes_compl
    net: dsa: ksz: use common define for tag len
    s390/qeth: don't return -ENOTSUPP to userspace
    s390/qeth: fix promiscuous mode after reset
    s390/qeth: handle error due to unsupported transport mode
    cxgb4: fix refcount init for TC-MQPRIO offload
    tc-testing: initial tdc selftests for cls_u32
    ...

    Linus Torvalds
     

20 Dec, 2019

1 commit

  • syzbot reported following splat:

    BUG: KASAN: vmalloc-out-of-bounds in size_entry_mwt net/bridge/netfilter/ebtables.c:2063 [inline]
    BUG: KASAN: vmalloc-out-of-bounds in compat_copy_entries+0x128b/0x1380 net/bridge/netfilter/ebtables.c:2155
    Read of size 4 at addr ffffc900004461f4 by task syz-executor267/7937

    CPU: 1 PID: 7937 Comm: syz-executor267 Not tainted 5.5.0-rc1-syzkaller #0
    size_entry_mwt net/bridge/netfilter/ebtables.c:2063 [inline]
    compat_copy_entries+0x128b/0x1380 net/bridge/netfilter/ebtables.c:2155
    compat_do_replace+0x344/0x720 net/bridge/netfilter/ebtables.c:2249
    compat_do_ebt_set_ctl+0x22f/0x27e net/bridge/netfilter/ebtables.c:2333
    [..]

    Because padding isn't considered during computation of ->buf_user_offset,
    "total" is decremented by fewer bytes than it should.

    Therefore, the first part of

    if (*total < sizeof(*entry) || entry->next_offset < sizeof(*entry))

    will pass, -- it should not have. This causes oob access:
    entry->next_offset is past the vmalloced size.

    Reject padding and check that computed user offset (sum of ebt_entry
    structure plus all individual matches/watchers/targets) is same
    value that userspace gave us as the offset of the next entry.
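
    The consistency check can be sketched as below. The function name and
    parameters are assumptions for illustration: the computed offset is
    the entry header size plus the size of every match, watcher and
    target, and any difference from the user-supplied offset (padding, for
    instance) is rejected.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical check: returns 0 when the offset computed from the entry
 * and its parts equals the next-entry offset supplied by user space, and
 * -EINVAL otherwise. */
static int check_next_offset(size_t entry_size, const size_t *part_sizes,
			     size_t nparts, size_t user_next_offset)
{
	size_t computed = entry_size;

	for (size_t i = 0; i < nparts; i++)
		computed += part_sizes[i];

	return computed == user_next_offset ? 0 : -EINVAL;
}
```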

    Reported-by: syzbot+f68108fed972453a0ad4@syzkaller.appspotmail.com
    Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     

15 Dec, 2019

1 commit

  • This adds rx_bpdu, tx_bpdu, rx_tcn, tx_tcn, transition_blk,
    transition_fwd xstats counters to the bridge ports copied over via
    netlink, providing useful information for STP.

    Signed-off-by: Vivien Didelot
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: Jakub Kicinski

    Vivien Didelot
     

10 Dec, 2019

2 commits

  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for net:

    1) Wait for rcu grace period after releasing netns in ctnetlink,
    from Florian Westphal.

    2) Incorrect command type in flowtable offload ndo invocation,
    from wenxu.

    3) Incorrect callback type in flowtable offload flow tuple
    updates, also from wenxu.

    4) Fix compile warning on flowtable offload infrastructure due to
    possible reference to uninitialized variable, from Nathan Chancellor.

    5) Do not inline nf_ct_resolve_clash(), this is called from slow
    path / stress situations. From Florian Westphal.

    6) Missing IPv6 flow selector description in flowtable offload.

    7) Missing check for NETDEV_UNREGISTER in nf_tables offload
    infrastructure, from wenxu.

    8) Update NAT selftest to use randomized netns names, from
    Florian Westphal.

    9) Restore nfqueue bridge support, from Marco Oliverio.

    10) Compilation warning in SCTP_CHUNKMAP_*() on xt_sctp header.
    From Phil Sutter.

    11) Fix bogus lookup/get match for non-anonymous rbtree sets.

    12) Missing netlink validation for NFT_SET_ELEM_INTERVAL_END
    elements.

    13) Missing netlink validation for NFT_DATA_VALUE after
    nft_data_init().

    14) If rule specifies no actions, offload infrastructure returns
    EOPNOTSUPP.

    15) Module refcount leak in object updates.

    16) Missing sanitization for ARP traffic from br_netfilter, from
    Eric Dumazet.

    17) Compilation breakage on big-endian due to incorrect memcpy()
    size in the flowtable offload infrastructure.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
    at places where these are defined. Later patches will remove the unused
    definition of FIELD_SIZEOF().

    This patch is generated using following script:

    EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

    git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
    do

        if [[ "$file" =~ $EXCLUDE_FILES ]]; then
            continue
        fi
        sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
    done
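
    For readers unfamiliar with the macro, a user-space sketch of what
    sizeof_field() computes is given here. The macro body follows the
    usual null-pointer-member idiom, and struct sample is a hypothetical
    type used only for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of one member of a structure type, without needing an instance.
 * sizeof does not evaluate its operand, so the null pointer is never
 * dereferenced. */
#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))

/* Hypothetical structure used only to demonstrate the macro. */
struct sample {
	uint32_t id;
	char name[16];
};

/* Hypothetical helper returning the size of the name member. */
static size_t sample_name_size(void)
{
	return sizeof_field(struct sample, name);
}
```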

    Signed-off-by: Pankaj Bharadiya
    Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
    Co-developed-by: Kees Cook
    Signed-off-by: Kees Cook
    Acked-by: David Miller # for net

    Pankaj Bharadiya
     

09 Dec, 2019

1 commit

  • syzbot is kind enough to remind us we need to call pskb_may_pull()

    BUG: KMSAN: uninit-value in br_nf_forward_arp+0xe61/0x1230 net/bridge/br_netfilter_hooks.c:665
    CPU: 1 PID: 11631 Comm: syz-executor.1 Not tainted 5.4.0-rc8-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:

    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1c9/0x220 lib/dump_stack.c:118
    kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108
    __msan_warning+0x64/0xc0 mm/kmsan/kmsan_instr.c:245
    br_nf_forward_arp+0xe61/0x1230 net/bridge/br_netfilter_hooks.c:665
    nf_hook_entry_hookfn include/linux/netfilter.h:135 [inline]
    nf_hook_slow+0x18b/0x3f0 net/netfilter/core.c:512
    nf_hook include/linux/netfilter.h:260 [inline]
    NF_HOOK include/linux/netfilter.h:303 [inline]
    __br_forward+0x78f/0xe30 net/bridge/br_forward.c:109
    br_flood+0xef0/0xfe0 net/bridge/br_forward.c:234
    br_handle_frame_finish+0x1a77/0x1c20 net/bridge/br_input.c:162
    nf_hook_bridge_pre net/bridge/br_input.c:245 [inline]
    br_handle_frame+0xfb6/0x1eb0 net/bridge/br_input.c:348
    __netif_receive_skb_core+0x20b9/0x51a0 net/core/dev.c:4830
    __netif_receive_skb_one_core net/core/dev.c:4927 [inline]
    __netif_receive_skb net/core/dev.c:5043 [inline]
    process_backlog+0x610/0x13c0 net/core/dev.c:5874
    napi_poll net/core/dev.c:6311 [inline]
    net_rx_action+0x7a6/0x1aa0 net/core/dev.c:6379
    __do_softirq+0x4a1/0x83a kernel/softirq.c:293
    do_softirq_own_stack+0x49/0x80 arch/x86/entry/entry_64.S:1091

    do_softirq kernel/softirq.c:338 [inline]
    __local_bh_enable_ip+0x184/0x1d0 kernel/softirq.c:190
    local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
    rcu_read_unlock_bh include/linux/rcupdate.h:688 [inline]
    __dev_queue_xmit+0x38e8/0x4200 net/core/dev.c:3819
    dev_queue_xmit+0x4b/0x60 net/core/dev.c:3825
    packet_snd net/packet/af_packet.c:2959 [inline]
    packet_sendmsg+0x8234/0x9100 net/packet/af_packet.c:2984
    sock_sendmsg_nosec net/socket.c:637 [inline]
    sock_sendmsg net/socket.c:657 [inline]
    __sys_sendto+0xc44/0xc70 net/socket.c:1952
    __do_sys_sendto net/socket.c:1964 [inline]
    __se_sys_sendto+0x107/0x130 net/socket.c:1960
    __x64_sys_sendto+0x6e/0x90 net/socket.c:1960
    do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x45a679
    Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f0a3c9e5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 000000000045a679
    RDX: 000000000000000e RSI: 0000000020000200 RDI: 0000000000000003
    RBP: 000000000075bf20 R08: 00000000200000c0 R09: 0000000000000014
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007f0a3c9e66d4
    R13: 00000000004c8ec1 R14: 00000000004dfe28 R15: 00000000ffffffff

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline]
    kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132
    kmsan_slab_alloc+0x97/0x100 mm/kmsan/kmsan_hooks.c:86
    slab_alloc_node mm/slub.c:2773 [inline]
    __kmalloc_node_track_caller+0xe27/0x11a0 mm/slub.c:4381
    __kmalloc_reserve net/core/skbuff.c:141 [inline]
    __alloc_skb+0x306/0xa10 net/core/skbuff.c:209
    alloc_skb include/linux/skbuff.h:1049 [inline]
    alloc_skb_with_frags+0x18c/0xa80 net/core/skbuff.c:5662
    sock_alloc_send_pskb+0xafd/0x10a0 net/core/sock.c:2244
    packet_alloc_skb net/packet/af_packet.c:2807 [inline]
    packet_snd net/packet/af_packet.c:2902 [inline]
    packet_sendmsg+0x63a6/0x9100 net/packet/af_packet.c:2984
    sock_sendmsg_nosec net/socket.c:637 [inline]
    sock_sendmsg net/socket.c:657 [inline]
    __sys_sendto+0xc44/0xc70 net/socket.c:1952
    __do_sys_sendto net/socket.c:1964 [inline]
    __se_sys_sendto+0x107/0x130 net/socket.c:1960
    __x64_sys_sendto+0x6e/0x90 net/socket.c:1960
    do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Fixes: c4e70a87d975 ("netfilter: bridge: rename br_netfilter.c to br_netfilter_hooks.c")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Reviewed-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Eric Dumazet
     

04 Dec, 2019

1 commit

  • We have an interesting memory leak in the bridge when it is being
    unregistered and is a slave to a master device which would change the
    mac of its slaves on unregister (e.g. bond, team). This is a very
    unusual setup but we do end up leaking 1 fdb entry because
    dev_set_mac_address() would cause the bridge to insert the new mac address
    into its table after all fdbs are flushed, i.e. after dellink() on the
    bridge has finished and we call NETDEV_UNREGISTER the bond/team would
    release it and will call dev_set_mac_address() to restore its original
    address and that in turn will add an fdb in the bridge.
    One fix is to check for the bridge dev's reg_state in its
    ndo_set_mac_address callback and return an error if the bridge is not in
    NETREG_REGISTERED.
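
    The proposed check can be sketched as follows. The types are
    simplified stand-ins, and the choice of -EBUSY as the error code is an
    assumption made for this example.

```c
#include <errno.h>

/* Simplified stand-ins for the device registration states. */
enum reg_state {
	NETREG_UNINITIALIZED = 0,
	NETREG_REGISTERED,
	NETREG_UNREGISTERING,
};

/* Hypothetical stand-in for the bridge device. */
struct bridge_dev {
	enum reg_state reg_state;
	unsigned int fdb_entries;
};

/* Hypothetical set-mac callback: refuse the change unless the device is
 * fully registered, so no fdb entry is added after the flush. */
static int bridge_set_mac(struct bridge_dev *br)
{
	if (br->reg_state != NETREG_REGISTERED)
		return -EBUSY;

	br->fdb_entries++;	/* stands in for inserting the new local fdb */
	return 0;
}
```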

    Easy steps to reproduce:
    1. add bond in mode != A/B
    2. add any slave to the bond
    3. add bridge dev as a slave to the bond
    4. destroy the bridge device

    Trace:
    unreferenced object 0xffff888035c4d080 (size 128):
    comm "ip", pid 4068, jiffies 4296209429 (age 1413.753s)
    hex dump (first 32 bytes):
    41 1d c9 36 80 88 ff ff 00 00 00 00 00 00 00 00 A..6............
    d2 19 c9 5e 3f d7 00 00 00 00 00 00 00 00 00 00 ...^?...........
    backtrace:
    [] kmem_cache_alloc+0x155/0x26f
    [] fdb_create+0x21/0x486 [bridge]
    [] fdb_insert+0x91/0xdc [bridge]
    [] br_fdb_change_mac_address+0xb3/0x175 [bridge]
    [] br_stp_change_bridge_id+0xf/0xff [bridge]
    [] br_set_mac_address+0x76/0x99 [bridge]
    [] dev_set_mac_address+0x63/0x9b
    [] __bond_release_one+0x3f6/0x455 [bonding]
    [] bond_netdev_event+0x2f2/0x400 [bonding]
    [] notifier_call_chain+0x38/0x56
    [] call_netdevice_notifiers+0x1e/0x23
    [] rollback_registered_many+0x353/0x6a4
    [] unregister_netdevice_many+0x17/0x6f
    [] rtnl_delete_link+0x3c/0x43
    [] rtnl_dellink+0x1dc/0x20a
    [] rtnetlink_rcv_msg+0x23d/0x268

    Fixes: 43598813386f ("bridge: add local MAC address to forwarding table (v2)")
    Reported-by: syzbot+2add91c08eb181fea1bf@syzkaller.appspotmail.com
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

13 Nov, 2019

1 commit


10 Nov, 2019

1 commit


05 Nov, 2019

2 commits

  • xt_in() returns NULL in the output hook, skip the pkt_type change for
    that case, redirection only makes sense in broute/prerouting hooks.

    Reported-by: Tom Yan
    Cc: Linus Lüssing
    Fixes: cf3cb246e277d ("bridge: ebtables: fix reception of frames DNAT-ed to bridge device/port")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • When commit df1c0b8468b3 ("[BRIDGE]: Packets leaking out of
    disabled/blocked ports.") introduced the port state tests in
    br_fdb_update() it was to avoid learning/refreshing from STP BPDUs, it was
    also used to avoid learning/refreshing from user-space with NTF_USE. Those
    two tests are done for every packet entering the bridge if it's learning,
    but for the fast-path we already have them checked in br_handle_frame() and
    it is unnecessary to do it again. Thus push the checks to the unlikely cases
    and drop them from br_fdb_update(), the new nbp_state_should_learn() helper
    is used to determine if the port state allows br_fdb_update() to be called.
    The two places which need to do it manually are:
    - user-space add call with NTF_USE set
    - link-local packet learning done in __br_handle_local_finish()
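
    A sketch of the port state test is shown below. The struct is a
    simplified stand-in and the function is named differently from the
    kernel helper on purpose, since its body here is an assumption: a port
    may learn when it is in the learning or forwarding state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bridge state values, repeated here so the sketch stands alone. */
enum {
	BR_STATE_DISABLED   = 0,
	BR_STATE_LISTENING  = 1,
	BR_STATE_LEARNING   = 2,
	BR_STATE_FORWARDING = 3,
	BR_STATE_BLOCKING   = 4,
};

/* Hypothetical stand-in for a bridge port. */
struct port {
	uint8_t state;
};

/* Hypothetical equivalent of the helper: slow-path callers test this
 * before updating the fdb, while the fast path has already checked it. */
static bool port_state_should_learn(const struct port *p)
{
	return p->state == BR_STATE_LEARNING ||
	       p->state == BR_STATE_FORWARDING;
}
```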

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

03 Nov, 2019

1 commit


02 Nov, 2019

1 commit