24 Sep, 2020

11 commits

  • When excluding S,G entries we need a way to block a particular S,G,port.
    The new port group flag is managed based on the source's timer as per
    RFCs 3376 and 3810. When a source expires and its port group is in
    EXCLUDE mode, it will be blocked.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • We need to handle group filter mode transitions and initial state.
    To change a port group's INCLUDE -> EXCLUDE mode (or when we have added
    a new port group in EXCLUDE mode) we need to add that port to all of
    *,G ports' S,G entries for proper replication. When the EXCLUDE state is
    changed from IGMPv3 report, br_multicast_fwd_filter_exclude() must be
    called after the source list processing because the assumption is that
    all of the group's S,G entries will be created before transitioning to
    EXCLUDE mode, i.e. most importantly its blocked entries will already be
    added so it will not get automatically added to them.
    The transition EXCLUDE -> INCLUDE happens only when a port group timer
    expires; it requires us to remove that port from all of *,G ports' S,G
    entries where it was automatically added previously.
    Finally when we are adding a new S,G entry we must add all of *,G's
    EXCLUDE ports to it.
    In order to distinguish automatically added *,G EXCLUDE ports we have a
    new port group flag - MDB_PG_FLAGS_STAR_EXCL.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
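
The three rules in the message above (replicate on INCLUDE -> EXCLUDE, undo on EXCLUDE -> INCLUDE, inherit on new S,G) can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: the `struct group` layout, the `PG_*` flag values and the function names are hypothetical and are not the kernel's data structures or API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PORTS 8
#define MAX_SG    8

/* Hypothetical flag values, used only by this sketch. */
#define PG_MEMBER    0x1 /* port is listed on the S,G entry */
#define PG_BLOCKED   0x2 /* listed, but traffic to it is blocked */
#define PG_STAR_EXCL 0x4 /* added automatically because of *,G EXCLUDE */

enum filter_mode { MODE_INCLUDE, MODE_EXCLUDE };

struct group {
    enum filter_mode star_mode[MAX_PORTS]; /* *,G filter mode per port */
    unsigned char star_member[MAX_PORTS];  /* is the port on *,G at all? */
    unsigned char sg[MAX_SG][MAX_PORTS];   /* flags per S,G entry and port */
    size_t nr_sg;
};

/* INCLUDE -> EXCLUDE (or a new EXCLUDE port): add the port to every S,G
 * entry it is not already on. Entries where it is already listed, such as
 * blocked ones created during source list processing, are left alone. */
static void star_g_to_exclude(struct group *g, size_t port)
{
    size_t i;

    assert(port < MAX_PORTS);
    g->star_member[port] = 1;
    g->star_mode[port] = MODE_EXCLUDE;
    for (i = 0; i < g->nr_sg; i++)
        if (!(g->sg[i][port] & PG_MEMBER))
            g->sg[i][port] = PG_MEMBER | PG_STAR_EXCL;
}

/* EXCLUDE -> INCLUDE: remove the port only where it was added automatically. */
static void star_g_to_include(struct group *g, size_t port)
{
    size_t i;

    assert(port < MAX_PORTS);
    g->star_mode[port] = MODE_INCLUDE;
    for (i = 0; i < g->nr_sg; i++)
        if (g->sg[i][port] & PG_STAR_EXCL)
            g->sg[i][port] = 0;
}

/* New S,G entry: it inherits all of *,G's EXCLUDE ports.
 * Returns the index of the new entry, or -1 if the table is full. */
static int sg_add(struct group *g)
{
    size_t p, idx;

    if (g->nr_sg == MAX_SG)
        return -1;
    idx = g->nr_sg++;
    for (p = 0; p < MAX_PORTS; p++)
        g->sg[idx][p] = (g->star_member[p] &&
                         g->star_mode[p] == MODE_EXCLUDE)
                        ? (PG_MEMBER | PG_STAR_EXCL) : 0;
    return (int)idx;
}
```

The ordering constraint from the message shows up here as well: a blocked S,G entry has to exist before `star_g_to_exclude()` runs, otherwise the port would be added to it automatically.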
     
  • To speed up S,G forward handling we need to be able to quickly find out
    if a port is a member of an S,G group. To do that add a global S,G port
    rhashtable with key: source addr, group addr, protocol, vid (all br_ip
    fields) and port pointer.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
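
To illustrate the lookup key described above (source, group, protocol, vid and port), here is a minimal userspace sketch. It uses a small fixed-size open-addressing table instead of the kernel's rhashtable, handles IPv4 only, and every name in it (`sg_port_key`, `sg_port_insert`, `sg_port_is_member`) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key: the br_ip-like fields plus the port identity. */
struct sg_port_key {
    uint32_t src;     /* source address (IPv4 only in this sketch) */
    uint32_t grp;     /* group address */
    uint16_t proto;   /* e.g. 0x0800 for IPv4 */
    uint16_t vid;     /* VLAN id */
    const void *port; /* identity of the bridge port */
};

#define SG_TABLE_SIZE 64 /* power of two, so masking replaces modulo */

struct sg_port_table {
    struct sg_port_key slots[SG_TABLE_SIZE];
    unsigned char used[SG_TABLE_SIZE];
};

/* Compare field by field; memcmp() would also compare struct padding. */
static int sg_key_equal(const struct sg_port_key *a,
                        const struct sg_port_key *b)
{
    return a->src == b->src && a->grp == b->grp && a->proto == b->proto &&
           a->vid == b->vid && a->port == b->port;
}

static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

static uint32_t sg_key_hash(const struct sg_port_key *k)
{
    uint32_t h = 2166136261u;

    h = fnv1a(h, &k->src, sizeof(k->src));
    h = fnv1a(h, &k->grp, sizeof(k->grp));
    h = fnv1a(h, &k->proto, sizeof(k->proto));
    h = fnv1a(h, &k->vid, sizeof(k->vid));
    return fnv1a(h, &k->port, sizeof(k->port));
}

/* Returns 1 if inserted, 0 if already present, -1 if the table is full. */
static int sg_port_insert(struct sg_port_table *t, const struct sg_port_key *k)
{
    size_t i, idx;

    assert(t && k);
    idx = sg_key_hash(k) & (SG_TABLE_SIZE - 1);
    for (i = 0; i < SG_TABLE_SIZE; i++) {
        size_t s = (idx + i) & (SG_TABLE_SIZE - 1);

        if (!t->used[s]) {
            t->slots[s] = *k;
            t->used[s] = 1;
            return 1;
        }
        if (sg_key_equal(&t->slots[s], k))
            return 0;
    }
    return -1;
}

/* Returns 1 if (S, G, proto, vid, port) is in the table, else 0. */
static int sg_port_is_member(const struct sg_port_table *t,
                             const struct sg_port_key *k)
{
    size_t i, idx = sg_key_hash(k) & (SG_TABLE_SIZE - 1);

    for (i = 0; i < SG_TABLE_SIZE; i++) {
        size_t s = (idx + i) & (SG_TABLE_SIZE - 1);

        if (!t->used[s])
            return 0;
        if (sg_key_equal(&t->slots[s], k))
            return 1;
    }
    return 0;
}
```

Because the port is part of the key, the membership test is a single lookup rather than a walk over the S,G entry's port list.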
     
  • We need to be able to differentiate between pg entries created by
    user-space and the kernel when we start generating S,G entries for
    IGMPv3/MLDv2's fast path. User-space entries are created by default as
    RTPROT_STATIC and the kernel entries are RTPROT_KERNEL. Later we can
    allow user-space to provide the entry rt_protocol so we can
    differentiate between who added the entries specifically (e.g. clag,
    admin, frr etc).

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Add new mdb attributes (MDBE_ATTR_SOURCE for setting,
    MDBA_MDB_EATTR_SOURCE for dumping) to allow add/del and dump of mdb
    entries with a source address (S,G). New S,G entries are created with
    filter mode of MCAST_INCLUDE. The same attributes are used for IPv4 and
    IPv6; they're validated and parsed based on their protocol.
    S,G host joined entries which are added by user are not allowed yet.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Since the MDB add/del code expects an exact struct br_mdb_entry we can't
    really add any extensions, thus add a new nested attribute at the level of
    MDBA_SET_ENTRY called MDBA_SET_ENTRY_ATTRS which will be used to pass
    all new options via netlink attributes. This patch doesn't change
    anything functionally since the new attribute is not used yet, only
    parsed.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Since now we have src in br_ip, u no longer makes sense so rename
    it to dst. No functional changes.

    v2: fix build with CONFIG_BATMAN_ADV_MCAST

    CC: Marek Lindner
    CC: Simon Wunderlich
    CC: Antonio Quartulli
    CC: Sven Eckelmann
    CC: b.a.t.m.a.n@lists.open-mesh.org
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Now that we have src and dst in br_ip it is logical to use the src field
    for the cases where we need to work with a source address such as
    querier source address and group source address.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Pass and use extack all the way down to br_mdb_add_group().

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • To avoid doing duplicate device checks and searches (the same were done
    in br_mdb_add and __br_mdb_add) pass the already found port to __br_mdb_add
    and pull the bridge's netif_running and enabled multicast checks to
    br_mdb_add. This would also simplify the future extack errors.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • We can drop the pr_info() calls and just use extack to return a
    meaningful error to user-space when br_mdb_parse() fails.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

12 Sep, 2020

1 commit

  • Each MDB entry is encoded in a nested netlink attribute called
    'MDBA_MDB_ENTRY'. In turn, this attribute contains another nested
    attribute called 'MDBA_MDB_ENTRY_INFO', which encodes a single port
    group entry within the MDB entry.

    The cited commit added the ability to restart a dump from a specific
    port group entry. However, on failure to add a port group entry to the
    dump the entire MDB entry (stored in 'nest2') is removed, resulting in
    missing port group entries.

    Fix this by finalizing the MDB entry with the partial list of already
    encoded port group entries.

    Fixes: 5205e919c9f0 ("net: bridge: mcast: add support for src list and filter mode dumping")
    Signed-off-by: Ido Schimmel
    Acked-by: Nikolay Aleksandrov
    Reviewed-by: Jiri Pirko
    Signed-off-by: David S. Miller

    Ido Schimmel
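
The fix described above amounts to "finalize the nest if anything was encoded, cancel it only if nothing fit". The following sketch models that with a toy message buffer of integers. The buffer layout and the names (`ex_msg`, `ex_dump_entry`) are hypothetical and are not the netlink API.

```c
#include <assert.h>
#include <stddef.h>

#define EX_MSG_CAP   8
#define EX_NEST_MARK (-1)

/* Toy message: a nest is encoded as EX_NEST_MARK, count, value... */
struct ex_msg {
    int data[EX_MSG_CAP];
    size_t len;
    size_t cap; /* usable capacity, at most EX_MSG_CAP */
};

static int ex_put(struct ex_msg *m, int v)
{
    assert(m->cap <= EX_MSG_CAP);
    if (m->len >= m->cap)
        return -1;
    m->data[m->len++] = v;
    return 0;
}

/* Encode port group entries pg[start..n) into one nest.
 * Returns the index of the first entry that did not fit, so that the
 * caller can restart the dump from there in the next message. */
static size_t ex_dump_entry(struct ex_msg *m, const int *pg, size_t n,
                            size_t start)
{
    size_t nest = m->len, i;

    if (ex_put(m, EX_NEST_MARK) || ex_put(m, 0)) {
        m->len = nest; /* not even the nest header fit */
        return start;
    }
    for (i = start; i < n; i++)
        if (ex_put(m, pg[i]))
            break;
    if (i == start && start < n) {
        m->len = nest; /* nothing was encoded: cancel the nest */
        return start;
    }
    /* Finalize with the partial list instead of dropping it. */
    m->data[nest + 1] = (int)(i - start);
    return i;
}
```

Cancelling the whole nest on any failure, while still advancing the restart index, is what would lose the entries that had already been encoded.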
     

08 Sep, 2020

5 commits

  • We have to use mdb and port entries when sending mdb notifications in
    order to fill in all group attributes properly. Before this change we
    would've used a fake br_mdb_entry struct to fill in only partial
    information about the mdb. Now we can also reuse the mdb dump fill
    function and thus have only a single central place which fills the mdb
    attributes.

    v3: add IPv6 support

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: Jakub Kicinski

    Nikolay Aleksandrov
     
  • This change is in preparation for using the mdb port group entries when
    sending a notification, so their full state and additional attributes can
    be filled in.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: Jakub Kicinski

    Nikolay Aleksandrov
     
  • Support per port group src list (address and timer) and filter mode
    dumping. Protected by either multicast_lock or rcu.

    v3: add IPv6 support
    v2: require RCU or multicast_lock to traverse src groups

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: Jakub Kicinski

    Nikolay Aleksandrov
     
  • Initial functions for group source lists which are needed for IGMPv3
    and MLDv2 include/exclude lists. Both IPv4 and IPv6 sources are supported.
    User-added mdb entries are created with exclude filter mode, we can
    extend that later to allow user-supplied mode. When group src entries
    are deleted, they're freed from a workqueue to make sure their timers
    are not still running. Source entries are protected by the multicast_lock
    and rcu. The number of src groups per port group is limited to 32.

    v4: use the new port group del function directly
    add igmpv2/mldv1 bool to denote if the entry was added in those
    modes, it will later replace the old update_timer bool
    v3: add IPv6 support
    v2: allow src groups to be traversed under rcu

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: Jakub Kicinski

    Nikolay Aleksandrov
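
As a rough illustration of the per port group source list and its limit of 32 entries, here is a userspace sketch. It uses a plain array and IPv4 addresses only. The struct and function names are hypothetical, and locking, timers and deferred freeing are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_SRCS 32 /* limit taken from the commit message */

struct ex_src {
    uint32_t addr;         /* source address (IPv4 only in this sketch) */
    unsigned long expires; /* stand-in for the source timer */
};

struct ex_port_group {
    struct ex_src srcs[EX_MAX_SRCS];
    size_t nr_srcs;
};

static struct ex_src *ex_find_src(struct ex_port_group *pg, uint32_t addr)
{
    size_t i;

    for (i = 0; i < pg->nr_srcs; i++)
        if (pg->srcs[i].addr == addr)
            return &pg->srcs[i];
    return NULL;
}

/* Returns the existing or newly added source, or NULL once the limit
 * has been reached. */
static struct ex_src *ex_add_src(struct ex_port_group *pg, uint32_t addr)
{
    struct ex_src *src;

    assert(pg);
    src = ex_find_src(pg, addr);
    if (src)
        return src;
    if (pg->nr_srcs >= EX_MAX_SRCS)
        return NULL;
    src = &pg->srcs[pg->nr_srcs++];
    src->addr = addr;
    src->expires = 0;
    return src;
}
```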
     
  • In order to avoid future errors and reduce code duplication we should
    factor out the port group del sequence. This allows us to have one
    function which takes care of all details when removing a port group.

    v4: set pg's fast leave flag when deleting due to fast leave
    move the patch before adding source lists

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: Jakub Kicinski

    Nikolay Aleksandrov
     

15 Sep, 2019

1 commit


10 Sep, 2019

1 commit

  • NLM_F_MULTI must be used only when a NLMSG_DONE message is sent at the end.
    In fact, NLMSG_DONE is sent only at the end of a dump.

    Libraries like libnl will wait forever for NLMSG_DONE.

    Fixes: 949f1e39a617 ("bridge: mdb: notify on router port add and del")
    CC: Nikolay Aleksandrov
    Signed-off-by: Nicolas Dichtel
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nicolas Dichtel
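
The reason a stray NLM_F_MULTI makes a library wait forever can be shown with a small receiver model. This is a sketch, not libnl's code: the `ex_` names are hypothetical, the two constants mirror the uapi values of NLM_F_MULTI and NLMSG_DONE, and `EX_TYPE_NOTIFY` is an arbitrary stand-in for a notification message type.

```c
#include <assert.h>
#include <stddef.h>

#define EX_NLM_F_MULTI 0x02 /* mirrors NLM_F_MULTI */
#define EX_NLMSG_DONE  0x03 /* mirrors NLMSG_DONE */
#define EX_TYPE_NOTIFY 84   /* arbitrary non-control type for this sketch */

struct ex_hdr {
    unsigned short type;
    unsigned short flags;
};

/* Returns how many headers make up one complete response, or 0 if the
 * response is still incomplete and the receiver has to keep waiting. */
static size_t ex_response_len(const struct ex_hdr *h, size_t n)
{
    size_t i;

    assert(h != NULL || n == 0);
    if (n == 0)
        return 0;
    if (!(h[0].flags & EX_NLM_F_MULTI))
        return 1; /* single-part message: complete on its own */
    for (i = 0; i < n; i++)
        if (h[i].type == EX_NLMSG_DONE)
            return i + 1; /* multipart ends at NLMSG_DONE */
    return 0; /* multipart without NLMSG_DONE: never completes */
}
```

A notification flagged as multipart but never followed by NLMSG_DONE falls into the last case, which matches the hang described in the message.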
     

18 Aug, 2019

4 commits

  • Currently this is needed only for user-space compatibility, so similar
    object adds/deletes as the dumped ones would succeed. Later it can be
    used for L2 mcast MAC add/delete.

    v3: fix compiler warning (DaveM)
    v2: don't send a notification when used from user-space, arm the group
    timer if no ports are left after host entry del

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Currently we dump only the port mdb entries but we can have host-joined
    entries on the bridge itself and they should be treated as normal temp
    mdbs, they're already notified:
    $ bridge monitor all
    [MDB]dev br0 port br0 grp ff02::8 temp

    The group will not be shown in the bridge mdb output, but it takes 1 slot
    and it's timing out. If it's only host-joined then the mdb show output
    can even be empty.

    After this patch we show the host-joined groups:
    $ bridge mdb show
    dev br0 port br0 grp ff02::8 temp

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • We have to factor out the mdb fill portion in order to re-use it later for
    the bridge mdb entries. No functional changes intended.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Trivial patch to move the vlan comments in their proper places above the
    vid 0 checks.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

01 Aug, 2019

1 commit

  • In user-space there's no way to distinguish why an mdb entry was deleted
    and that is a problem for daemons which would like to keep the mdb in
    sync with remote ends (e.g. mlag) but would also like to converge faster.
    In almost all cases we'd like to age-out the remote entry for performance
    and convergence reasons except when fast-leave is enabled. In that case we
    want explicit immediate remote delete, thus add mdb flag which is set only
    when the entry is being deleted due to fast-leave.

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

28 Apr, 2019

2 commits

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then going
    forward prevent forgetting attribute entries. Similar can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
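
The four strictness options listed above can be illustrated with a toy attribute parser. This is a userspace sketch under stated assumptions: the `V_*` flag names, the `POL_*` policy types and the function names are hypothetical, the attribute layout (16-bit length, 16-bit type, 4-byte alignment) only imitates netlink's, and nested attributes are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flags mirroring the four options in the message above. */
enum { V_TRAILING = 1, V_MAXTYPE = 2, V_UNSPEC = 4, V_STRICT_ATTRS = 8 };
enum pol { POL_UNSPEC = 0, POL_U32 };

#define HDR 4u
#define ALIGN4(x) (((x) + 3u) & ~3u)

/* Append one attribute at offset off; returns the next aligned offset. */
static size_t put_attr(unsigned char *buf, size_t off, uint16_t type,
                       const void *payload, uint16_t plen)
{
    uint16_t alen = (uint16_t)(HDR + plen);

    memset(buf + off, 0, ALIGN4(alen));
    memcpy(buf + off, &alen, sizeof(alen));
    memcpy(buf + off + 2, &type, sizeof(type));
    if (plen)
        memcpy(buf + off + HDR, payload, plen);
    return off + ALIGN4(alen);
}

/* Returns 0 if the buffer is accepted, -1 if it is rejected.
 * With flags == 0 this is the liberal behaviour: unknown types are
 * skipped, oversized payloads and trailing bytes are tolerated. */
static int parse_attrs(const unsigned char *buf, size_t len,
                       const enum pol *policy, unsigned max,
                       unsigned flags, unsigned char *seen)
{
    size_t off = 0;

    assert(buf && policy && seen);
    while (len - off >= HDR) {
        uint16_t alen, type;

        memcpy(&alen, buf + off, sizeof(alen));
        memcpy(&type, buf + off + 2, sizeof(type));
        if (alen < HDR || (size_t)alen > len - off)
            break; /* malformed: the rest counts as trailing data */
        if ((unsigned)type > max) {
            if (flags & V_MAXTYPE)
                return -1;
        } else {
            size_t payload = alen - HDR;

            if (policy[type] == POL_UNSPEC && (flags & V_UNSPEC))
                return -1;
            if (policy[type] == POL_U32) {
                if (payload < 4)
                    return -1; /* too short is never acceptable */
                if (payload > 4 && (flags & V_STRICT_ATTRS))
                    return -1;
            }
            seen[type] = 1;
        }
        off += ALIGN4(alen);
        if (off > len)
            off = len;
    }
    if (off < len && (flags & V_TRAILING))
        return -1;
    return 0;
}
```

Passing `V_TRAILING | V_MAXTYPE` corresponds to what the message calls the old `*_strict()` behaviour, and passing all four flags corresponds to the "everything" default intended for new users.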
     
  • Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
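
The compatibility concern above comes down to whether a reader masks the flag bits before comparing the attribute type. A minimal sketch follows. The `EX_` constants are local copies that mirror the uapi values of NLA_F_NESTED, NLA_F_NET_BYTEORDER and NLA_TYPE_MASK, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EX_NLA_F_NESTED        (1u << 15)
#define EX_NLA_F_NET_BYTEORDER (1u << 14)
#define EX_NLA_TYPE_MASK       (~(EX_NLA_F_NESTED | EX_NLA_F_NET_BYTEORDER))

/* What a flag-aware reader does: strip the flag bits first. */
static unsigned ex_nla_type(uint16_t raw)
{
    return raw & EX_NLA_TYPE_MASK;
}

static int ex_nla_is_nested(uint16_t raw)
{
    return (raw & EX_NLA_F_NESTED) != 0;
}

/* Type field written by the new-style nest start: flag included. */
static uint16_t ex_nest_type(uint16_t attrtype)
{
    assert((attrtype & ~EX_NLA_TYPE_MASK) == 0); /* no flags set yet */
    return (uint16_t)(attrtype | EX_NLA_F_NESTED);
}

/* Type field written by the _noflag variant: unchanged. */
static uint16_t ex_nest_type_noflag(uint16_t attrtype)
{
    return attrtype;
}
```

A reader that compares the raw type field against the attribute number sees a different value once the flag is set, which is why existing callers were moved to the `_noflag` variant rather than changed in place.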
     

13 Dec, 2018

1 commit

  • After the previous patch, bridge driver has extack argument available to
    pass to switchdev. Therefore extend switchdev_port_obj_add() with this
    argument, updating all callers, and passing the argument through to
    switchdev_port_obj_notify().

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Acked-by: Ivan Vecera
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Petr Machata
     

06 Dec, 2018

2 commits

  • The bridge multicast code has been using a mix of RCU and RCU-bh flavors
    sometimes in a questionable way. Since we've moved to rhashtable just use
    non-bh RCU everywhere. In addition this simplifies freeing of objects
    and allows us to remove some unnecessary callback functions.

    v3: new patch

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • The bridge multicast code currently uses a custom resizable hashtable
    which predates the generic rhashtable interface. It has many
    shortcomings by comparison and duplicates functionality that is presently
    available via the generic rhashtable, so this patch removes the custom
    hashtable implementation in favor of the kernel's generic rhashtable.
    The hash maximum is kept and the rhashtable's size is used to do a loose
    check if it's reached in which case we revert to the old behaviour and
    disable further bridge multicast processing. Also now we can support any
    hash maximum; it doesn't need to be a power of 2.

    v3: add non-rcu br_mdb_get variant and use it where multicast_lock is
    held to avoid RCU splat, drop hash_max function and just set it
    directly

    v2: handle when IGMP snooping is undefined, add br_mdb_init/uninit
    placeholders

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

09 Oct, 2018

1 commit

  • Update br_mdb_dump for strict data checking. If the flag is set,
    the dump request is expected to have a br_port_msg struct as the
    header. All elements of the struct are expected to be 0 and no
    attributes can be appended.

    Signed-off-by: David Ahern
    Acked-by: Christian Brauner
    Signed-off-by: David S. Miller

    David Ahern
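
The strict check described above has three parts: the header must be present, all of its fields must be zero, and nothing may follow it. A userspace sketch follows. `struct ex_port_msg` is a hypothetical stand-in for the request header, and the function name and return values are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the dump request header. */
struct ex_port_msg {
    uint8_t family;
    uint32_t ifindex;
};

/* Returns 0 if the dump request is acceptable, -1 otherwise.
 * Without strict checking the payload is not inspected at all. */
static int ex_valid_dump_req(const void *payload, size_t len, int strict)
{
    struct ex_port_msg hdr;

    assert(payload != NULL || len == 0);
    if (!strict)
        return 0;
    if (len < sizeof(hdr))
        return -1; /* header missing or truncated */
    memcpy(&hdr, payload, sizeof(hdr));
    if (hdr.family || hdr.ifindex)
        return -1; /* every header field must be zero */
    if (len != sizeof(hdr))
        return -1; /* no attributes may be appended */
    return 0;
}
```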
     

27 Sep, 2018

1 commit

  • Convert mcast disabled to an option bit and while doing so convert the
    logic to check if multicast is enabled instead. That is make the logic
    follow the option value - if it's set then mcast is enabled and vice versa.
    This avoids a few confusing places where we inverted the value that's being
    set to follow the mcast_disabled logic.

    Signed-off-by: Nikolay Aleksandrov
    Reviewed-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
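
The point of the conversion above is that the stored bit and the question being asked have the same polarity. A minimal sketch with an option bitmask follows. The `ex_` names are hypothetical and do not claim to match the bridge's own helpers.

```c
#include <assert.h>

enum ex_opt {
    EX_OPT_MULTICAST_ENABLED = 0,
    EX_OPT_MAX
};

struct ex_bridge {
    unsigned long options; /* one bit per option */
};

/* Returns 1 if the option bit is set, 0 otherwise. */
static int ex_opt_get(const struct ex_bridge *br, enum ex_opt opt)
{
    assert(opt < EX_OPT_MAX);
    return (int)((br->options >> opt) & 1UL);
}

/* Set or clear the option bit so that it follows the requested value. */
static void ex_opt_toggle(struct ex_bridge *br, enum ex_opt opt, int on)
{
    assert(opt < EX_OPT_MAX);
    if (on)
        br->options |= 1UL << opt;
    else
        br->options &= ~(1UL << opt);
}
```

With a "disabled" field, a setter taking an "enabled" value has to store its negation. With an "enabled" bit, the value is stored as given and read back unchanged.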
     

05 Dec, 2017

1 commit


10 Nov, 2017

3 commits

  • When the host joins or leaves a multicast group, use switchdev to add
    an object to the hardware to forward traffic for the group to the
    host.

    Signed-off-by: Andrew Lunn
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • The host can join or leave a multicast group on the brX interface, as
    indicated by IGMP snooping. This is tracked within the bridge
    multicast code. Send a notification when this happens, in the same way
    a notification is sent when a port of the bridge joins/leaves a group
    because of IGMP snooping.

    Signed-off-by: Andrew Lunn
    Acked-by: Nikolay Aleksandrov
    Acked-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Andrew Lunn
     
  • The boolean mglist indicates the host has joined a particular
    multicast group on the bridge interface. It is badly named, obscuring
    what it means. Rename it.

    Signed-off-by: Andrew Lunn
    Acked-by: Nikolay Aleksandrov
    Acked-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Andrew Lunn
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

10 Aug, 2017

1 commit

  • This change allows us to later indicate to rtnetlink core that certain
    doit functions should be called without acquiring rtnl_mutex.

    This change should have no effect, we simply replace the last (now
    unused) calcit argument with the new flag.

    Signed-off-by: Florian Westphal
    Reviewed-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Florian Westphal
     

12 Jul, 2017

1 commit

  • We currently get the following kmemleak report:
    unreferenced object 0xffff8800039d9820 (size 32):
    comm "softirq", pid 0, jiffies 4295212383 (age 792.416s)
    hex dump (first 32 bytes):
    00 0c e0 03 00 88 ff ff ff 02 00 00 00 00 00 00 ................
    00 00 00 01 ff 11 00 02 86 dd 00 00 ff ff ff ff ................
    backtrace:
    [] kmemleak_alloc+0x4a/0xa0
    [] kmem_cache_alloc_trace+0xb8/0x1c0
    [] __br_mdb_notify+0x2a3/0x300 [bridge]
    [] br_mdb_notify+0x6e/0x70 [bridge]
    [] br_multicast_add_group+0x109/0x150 [bridge]
    [] br_ip6_multicast_add_group+0x58/0x60 [bridge]
    [] br_multicast_rcv+0x1d5/0xdb0 [bridge]
    [] br_handle_frame_finish+0xcf/0x510 [bridge]
    [] br_nf_hook_thresh.part.27+0xb/0x10 [br_netfilter]
    [] br_nf_hook_thresh+0x48/0xb0 [br_netfilter]
    [] br_nf_pre_routing_finish_ipv6+0x109/0x1d0 [br_netfilter]
    [] br_nf_pre_routing_ipv6+0xd0/0x14c [br_netfilter]
    [] br_nf_pre_routing+0x197/0x3d0 [br_netfilter]
    [] nf_iterate+0x52/0x60
    [] nf_hook_slow+0x5c/0xb0
    [] br_handle_frame+0x1a4/0x2c0 [bridge]

    This happens when switchdev_port_obj_add() fails. This patch
    frees complete_info object in the fail path.

    Reviewed-by: Vallish Vaidyeshwara
    Signed-off-by: Eduardo Valentin
    Signed-off-by: David S. Miller

    Eduardo Valentin
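
The leak described above follows a common pattern: an object is allocated for a deferred completion, and the failure path of the call that should take ownership forgets to free it. Here is a userspace sketch of the corrected shape. All names are hypothetical, and `ex_live_allocs` exists only so that the behaviour can be observed.

```c
#include <assert.h>
#include <stdlib.h>

static int ex_live_allocs; /* allocations not yet freed or handed over */

struct ex_complete_info {
    int port;
    int group;
};

/* Stand-in for the object-add call: on success it takes ownership of
 * info (stored in *owner), on failure it leaves info untouched. */
static int ex_obj_add(int should_fail, struct ex_complete_info *info,
                      struct ex_complete_info **owner)
{
    if (should_fail)
        return -1;
    *owner = info;
    return 0;
}

static int ex_notify(int should_fail, struct ex_complete_info **owner)
{
    struct ex_complete_info *info;

    assert(owner != NULL);
    info = malloc(sizeof(*info));
    if (!info)
        return -1;
    ex_live_allocs++;
    info->port = 1;
    info->group = 2;
    if (ex_obj_add(should_fail, info, owner)) {
        /* The fix: nobody took ownership, so free it here. */
        free(info);
        ex_live_allocs--;
        return -1;
    }
    return 0;
}
```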
     

27 May, 2017

1 commit

  • It's useful for drivers supporting bridge offload to be able to query
    the bridge's VLAN filtering state.

    Currently, upon enslavement to a bridge master, the offloading driver
    will only learn about the bridge's VLAN filtering state after the bridge
    device was already linked with its slave.

    Being able to query the bridge's VLAN filtering state allows such
    drivers to forbid enslavement in case resource couldn't be allocated for
    a VLAN-aware bridge and also choose the correct initialization routine
    for the enslaved port, which is dependent on the bridge type.

    Signed-off-by: Ido Schimmel
    Signed-off-by: Jiri Pirko
    Reviewed-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Ido Schimmel
     

18 Apr, 2017

1 commit

  • Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse
    for doit functions that call it directly.

    This is the first step to using extended error reporting in rtnetlink.
    From here individual subsystems can be updated to set netlink_ext_ack as
    needed.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern