25 Jul, 2020

1 commit


22 May, 2020

1 commit

  • In case we can't find a ->dumpit callback for the requested
    (family,type) pair, we fall back to (PF_UNSPEC,type). In effect, we're
    in the same situation as if userspace had requested a PF_UNSPEC
    dump. For RTM_GETROUTE, that handler is rtnl_dump_all, which calls all
    the registered RTM_GETROUTE handlers.

    The requested table id may or may not exist for all of those
    families. commit ae677bbb4441 ("net: Don't return invalid table id
    error when dumping all families") fixed the problem when userspace
    explicitly requests a PF_UNSPEC dump, but missed the fallback case.

    For example, when we pass ipv6.disable=1 to a kernel with
    CONFIG_IP_MROUTE=y and CONFIG_IP_MROUTE_MULTIPLE_TABLES=y,
    the (PF_INET6, RTM_GETROUTE) handler isn't registered, so we end up in
    rtnl_dump_all, and listing IPv6 routes will unexpectedly print:

    # ip -6 r
    Error: ipv4: MR table does not exist.
    Dump terminated

    commit ae677bbb4441 introduced the dump_all_families variable, which
    gets set when userspace requests a PF_UNSPEC dump. However, we can't
    simply set the family to PF_UNSPEC in rtnetlink_rcv_msg in the
    fallback case to get dump_all_families == true, because some message
    types (for example RTM_GETRULE and RTM_GETNEIGH) only register the
    PF_UNSPEC handler and use the family to filter in the kernel what is
    dumped to userspace. We would then export more entries, which userspace
    would have to filter. iproute does that, but other programs may not.

    Instead, this patch removes dump_all_families and updates the
    RTM_GETROUTE handlers to check if the family that is being dumped is
    their own. When it's not, which covers both the intentional PF_UNSPEC
    dumps (as dump_all_families did) and the fallback case, ignore the
    missing table id error.
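    A minimal userspace sketch of the per-handler decision described above.
    Everything in it is hypothetical: mr_dump_table_action(), enum
    dump_action and the FAM_* values only stand in for the kernel's own
    code and constants. A handler compares the family from the request with
    the family it was registered for. A mismatch means it is running as one
    of several handlers under rtnl_dump_all (explicit PF_UNSPEC dump or the
    fallback case), so a missing table is skipped instead of reported.

```c
#include <stdbool.h>

/* Illustrative family values (standing in for AF_UNSPEC, AF_INET6,
 * RTNL_FAMILY_IPMR and RTNL_FAMILY_IP6MR); only their distinctness matters. */
enum { FAM_UNSPEC = 0, FAM_INET6 = 10, FAM_IPMR = 128, FAM_IP6MR = 129 };

/* What a route-dump handler should do about the requested table id. */
enum dump_action { DUMP_TABLE, SKIP_TABLE, REPORT_ENOENT };

/*
 * req_family: family taken from the request sent by userspace.
 * own_family: family this handler was registered for.
 */
static enum dump_action mr_dump_table_action(int req_family, int own_family,
					     bool table_exists)
{
	if (table_exists)
		return DUMP_TABLE;
	/* Dumping on behalf of other families: not an error for us. */
	if (req_family != own_family)
		return SKIP_TABLE;
	/* Userspace explicitly asked for this family's table. */
	return REPORT_ENOENT;
}
```

    With this check, the ipv6.disable=1 example above would skip the
    missing IPv4 MR table instead of terminating the dump.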

    Fixes: cb167893f41e ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps")
    Signed-off-by: Sabrina Dubroca
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

17 May, 2020

1 commit

  • This patch fixes the following warning:

    =============================
    WARNING: suspicious RCU usage
    5.7.0-rc4-next-20200507-syzkaller #0 Not tainted
    -----------------------------
    net/ipv6/ip6mr.c:124 RCU-list traversed in non-reader section!!

    ipmr_new_table() returns an existing table, but there is no table at
    init. Therefore, use the lockdep condition that either RTNL is held or
    the list is empty.

    Fixes: d1db275dd3f6e ("ipv6: ip6mr: support multiple tables")
    Reported-by: kernel test robot
    Suggested-by: Jakub Kicinski
    Signed-off-by: Madhuparna Bhowmik
    Signed-off-by: David S. Miller

    Madhuparna Bhowmik
     

13 Mar, 2020

1 commit

  • Convert the various uses of fallthrough comments to fallthrough;

    Done via script
    Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/

    And by hand:

    net/ipv6/ip6_fib.c has a fallthrough comment outside of an #ifdef block
    that causes gcc to emit a warning if converted in-place.

    So move the new fallthrough; inside the containing #ifdef/#endif too.

    Signed-off-by: Joe Perches
    Signed-off-by: David S. Miller

    Joe Perches
     

25 Feb, 2020

1 commit

  • ip6mr_for_each_table() macro uses list_for_each_entry_rcu()
    for traversing outside an RCU read side critical section
    but under the protection of rtnl_mutex. Hence add the
    corresponding lockdep expression to silence the following
    false-positive warnings:

    [ 4.319479] =============================
    [ 4.319480] WARNING: suspicious RCU usage
    [ 4.319482] 5.5.4-stable #17 Tainted: G E
    [ 4.319483] -----------------------------
    [ 4.319485] net/ipv6/ip6mr.c:1243 RCU-list traversed in non-reader section!!

    [ 4.456831] =============================
    [ 4.456832] WARNING: suspicious RCU usage
    [ 4.456834] 5.5.4-stable #17 Tainted: G E
    [ 4.456835] -----------------------------
    [ 4.456837] net/ipv6/ip6mr.c:1582 RCU-list traversed in non-reader section!!

    Signed-off-by: Amol Grover
    Signed-off-by: David S. Miller

    Amol Grover
     

05 Oct, 2019

1 commit


07 Sep, 2019

1 commit

  • This is a re-post of a previous patch written by David Miller[1].

    Phil Karn reported[2] that on busy networks with lots of unresolved
    multicast routing entries, the creation of new multicast group routes
    can be extremely slow and unreliable.

    The reason is that we hard-coded the limit on multicast route entries
    with unresolved source addresses (cache_resolve_queue_len) to 10. If some
    multicast routes never resolve and the number of unresolved source
    addresses increases, no new multicast route cache entries can be created.

    To resolve this issue, we need to either add a sysctl entry to make
    cache_resolve_queue_len configurable, or simply remove the
    cache_resolve_queue_len limit, since we already have the socket receive
    queue limits of the mrouted socket, as pointed out by David.

    From my side, I'd prefer to remove the cache_resolve_queue_len limit
    instead of creating two more sysctl entries (IPv4 and IPv6 versions).

    [1] https://lkml.org/lkml/2018/7/22/11
    [2] https://lkml.org/lkml/2018/7/21/343

    v3: instead of removing cache_resolve_queue_len entirely, only remove
    the hard-coded limit when allocating the unresolved cache, as Eric
    Dumazet suggested, so we don't need to re-count it in other places.

    v2: hold the mfc_unres_lock while walking the unresolved list in
    queue_count(), as Nikolay Aleksandrov reminded us.

    Reported-by: Phil Karn
    Signed-off-by: Hangbin Liu
    Reviewed-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Hangbin Liu
     

31 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

08 Apr, 2019

1 commit

  • This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
    bucket pointer to lock the hash chain for that bucket.

    The benefits of a bit spin_lock are:
    - no need to allocate a separate array of locks.
    - no need to have a configuration option to guide the
    choice of the size of this array
    - locking cost is often a single test-and-set in a cache line
    that will have to be loaded anyway. When inserting at, or removing
    from, the head of the chain, the unlock is free - writing the new
    address in the bucket head implicitly clears the lock bit.
    For __rhashtable_insert_fast() we ensure this always happens
    when adding a new key.
    - even when locking costs 2 updates (lock and unlock), they are
    in a cacheline that needs to be read anyway.
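    A minimal userspace sketch of a lock bit folded into a bucket head
    pointer, using C11 atomics rather than the kernel's bit_spin_lock(). The
    names (bucket_t, bucket_lock(), bucket_insert_head(), struct node) are
    hypothetical. It shows the two properties listed above: locking is a
    single atomic test-and-set on the word that holds the head pointer, and
    inserting at the head releases the lock for free, because the newly
    stored pointer has the lock bit clear.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Lock bit: BIT(1) of the bucket pointer, as in the text. Nodes are at
 * least 4-byte aligned, so bits 0 and 1 of a real node pointer are zero. */
#define BKT_LOCK_BIT ((uintptr_t)1 << 1)

struct node {
	struct node *next;
	int key;
};

/* A bucket head: a node pointer with the lock bit folded in. */
typedef _Atomic uintptr_t bucket_t;

static void bucket_lock(bucket_t *bkt)
{
	/* Test-and-set the lock bit; spin while someone else holds it. */
	while (atomic_fetch_or_explicit(bkt, BKT_LOCK_BIT,
					memory_order_acquire) & BKT_LOCK_BIT)
		;
}

static void bucket_unlock(bucket_t *bkt)
{
	atomic_fetch_and_explicit(bkt, ~BKT_LOCK_BIT, memory_order_release);
}

/* First node of the chain, with the lock bit masked off. */
static struct node *bucket_first(bucket_t *bkt)
{
	uintptr_t v = atomic_load_explicit(bkt, memory_order_relaxed);

	return (struct node *)(v & ~BKT_LOCK_BIT);
}

/* Insert at the head: storing the new, unlocked head pointer both
 * publishes the node and releases the lock, so no separate unlock. */
static void bucket_insert_head(bucket_t *bkt, struct node *n)
{
	bucket_lock(bkt);
	n->next = bucket_first(bkt);
	atomic_store_explicit(bkt, (uintptr_t)n, memory_order_release);
}
```

    Operations that do not change the head (for example removal further
    down the chain) would still need an explicit bucket_unlock().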

    The cost of using a bit spin_lock is a little bit of code complexity,
    which I think is quite manageable.

    Bit spin_locks are sometimes inappropriate because they are not fair -
    if multiple CPUs repeatedly contend for the same lock, one CPU can
    easily be starved. This is not a credible situation with rhashtable.
    Multiple CPUs may want to repeatedly add or remove objects, but they
    will typically do so at different buckets, so they will attempt to
    acquire different locks.

    As we have more bit-locks than we previously had spinlocks (by at
    least a factor of two) we can expect slightly less contention to
    go with the slightly better cache behavior and reduced memory
    consumption.

    To enhance type checking, a new struct is introduced to represent the
    pointer plus lock-bit that is stored in the bucket-table. This is
    "struct rhash_lock_head" and is empty. A pointer to this needs to be
    cast to either an unsigned long, or a "struct rhash_head *" to be useful.
    Variables of this type are most often called "bkt".

    Previously "pprev" would sometimes point to a bucket, and sometimes a
    ->next pointer in an rhash_head. As these are now different types,
    pprev is NULL when it would have pointed to the bucket. In that case,
    'bkt' is used, together with correct locking protocol.

    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     

05 Mar, 2019

1 commit

  • Similar to commit 44f49dd8b5a6 ("ipmr: fix possible race resulting from
    improper usage of IP_INC_STATS_BH() in preemptible context."), we cannot
    assume preemption is disabled when incrementing the counter and
    accessing a per-CPU variable.

    Preemption can be enabled when we add a route in process context that
    corresponds to packets stored in the unresolved queue, which are then
    forwarded using this route [1].

    Fix this by using IP6_INC_STATS() which takes care of disabling
    preemption on architectures where it is needed.

    [1]
    [ 157.451447] BUG: using __this_cpu_add() in preemptible [00000000] code: smcrouted/2314
    [ 157.460409] caller is ip6mr_forward2+0x73e/0x10e0
    [ 157.460434] CPU: 3 PID: 2314 Comm: smcrouted Not tainted 5.0.0-rc7-custom-03635-g22f2712113f1 #1336
    [ 157.460449] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
    [ 157.460461] Call Trace:
    [ 157.460486] dump_stack+0xf9/0x1be
    [ 157.460553] check_preemption_disabled+0x1d6/0x200
    [ 157.460576] ip6mr_forward2+0x73e/0x10e0
    [ 157.460705] ip6_mr_forward+0x9a0/0x1510
    [ 157.460771] ip6mr_mfc_add+0x16b3/0x1e00
    [ 157.461155] ip6_mroute_setsockopt+0x3cb/0x13c0
    [ 157.461384] do_ipv6_setsockopt.isra.8+0x348/0x4060
    [ 157.462013] ipv6_setsockopt+0x90/0x110
    [ 157.462036] rawv6_setsockopt+0x4a/0x120
    [ 157.462058] __sys_setsockopt+0x16b/0x340
    [ 157.462198] __x64_sys_setsockopt+0xbf/0x160
    [ 157.462220] do_syscall_64+0x14d/0x610
    [ 157.462349] entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Fixes: 0912ea38de61 ("[IPV6] MROUTE: Add stats in multicast routing module method ip6_mr_forward().")
    Signed-off-by: Ido Schimmel
    Reported-by: Amit Cohen
    Signed-off-by: David S. Miller

    Ido Schimmel
     

22 Feb, 2019

1 commit

  • Currently the only way to clear the forwarding cache is to delete the
    entries one by one using the MRT_DEL_MFC socket option or to destroy and
    recreate the socket.

    Create a new socket option that, through optional flags, can clear any
    combination of multicast entries (static or not static) and multicast
    vifs (static or not static).

    Calling the new socket option MRT_FLUSH with the flags MRT_FLUSH_MFC and
    MRT_FLUSH_VIFS will clear all entries and vifs on the socket except for
    static entries.
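    A minimal userspace sketch of the flag-driven flush described above. The
    FLUSH_* constants, struct entry and mr_flush() are hypothetical and only
    model the selection logic. They are not the uapi MRT_FLUSH_*
    definitions; consult the kernel headers for those. Each table entry is
    cleared only if the flag matching its kind (dynamic or static) is set,
    so MFC plus VIFS without the static variants leaves static entries in
    place.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values, for illustration only. */
#define FLUSH_MFC          0x1 /* non-static forwarding cache entries */
#define FLUSH_MFC_STATIC   0x2 /* static forwarding cache entries */
#define FLUSH_VIFS         0x4 /* non-static virtual interfaces */
#define FLUSH_VIFS_STATIC  0x8 /* static virtual interfaces */

struct entry {
	bool in_use;
	bool is_static;
};

/* Clear entries selected by dyn_flag/static_flag; return how many. */
static size_t flush_entries(struct entry *tbl, size_t n, int flags,
			    int dyn_flag, int static_flag)
{
	size_t removed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!tbl[i].in_use)
			continue;
		int needed = tbl[i].is_static ? static_flag : dyn_flag;
		if (flags & needed) {
			tbl[i].in_use = false;
			removed++;
		}
	}
	return removed;
}

/* Flush any combination of MFC entries and vifs in one call. */
static size_t mr_flush(struct entry *mfc, size_t n_mfc,
		       struct entry *vifs, size_t n_vifs, int flags)
{
	return flush_entries(mfc, n_mfc, flags, FLUSH_MFC, FLUSH_MFC_STATIC) +
	       flush_entries(vifs, n_vifs, flags, FLUSH_VIFS, FLUSH_VIFS_STATIC);
}
```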

    Signed-off-by: Callum Sinclair
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Callum Sinclair
     

28 Jan, 2019

1 commit

  • When the MC route socket is closed, mroute_clean_tables() is called to
    clean up existing routes. The notifier call was mistakenly placed in the
    cleanup of the unresolved MC route entries cache.
    In a case where the MC socket closes before an unresolved route expires,
    the notifier call leads to a crash, caused by the driver trying to
    increment a non-initialized refcount_t object [1] and then, when handling
    is done, to decrement it [2]. This was detected by a test recently added in
    commit 6d4efada3b82 ("selftests: forwarding: Add multicast routing test").

    Fix that by placing the notifier call in the resolved entries traversal,
    instead of in the unresolved entries traversal.

    [1]

    [ 245.748967] refcount_t: increment on 0; use-after-free.
    [ 245.754829] WARNING: CPU: 3 PID: 3223 at lib/refcount.c:153 refcount_inc_checked+0x2b/0x30
    ...
    [ 245.802357] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
    [ 245.811873] RIP: 0010:refcount_inc_checked+0x2b/0x30
    ...
    [ 245.907487] Call Trace:
    [ 245.910231] mlxsw_sp_router_fib_event.cold.181+0x42/0x47 [mlxsw_spectrum]
    [ 245.917913] notifier_call_chain+0x45/0x7
    [ 245.922484] atomic_notifier_call_chain+0x15/0x20
    [ 245.927729] call_fib_notifiers+0x15/0x30
    [ 245.932205] mroute_clean_tables+0x372/0x3f
    [ 245.936971] ip6mr_sk_done+0xb1/0xc0
    [ 245.940960] ip6_mroute_setsockopt+0x1da/0x5f0
    ...

    [2]

    [ 246.128487] refcount_t: underflow; use-after-free.
    [ 246.133859] WARNING: CPU: 0 PID: 7 at lib/refcount.c:187 refcount_sub_and_test_checked+0x4c/0x60
    [ 246.183521] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
    ...
    [ 246.193062] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fibmr_event_work [mlxsw_spectrum]
    [ 246.202394] RIP: 0010:refcount_sub_and_test_checked+0x4c/0x60
    ...
    [ 246.298889] Call Trace:
    [ 246.301617] refcount_dec_and_test_checked+0x11/0x20
    [ 246.307170] mlxsw_sp_router_fibmr_event_work.cold.196+0x47/0x78 [mlxsw_spectrum]
    [ 246.315531] process_one_work+0x1fa/0x3f0
    [ 246.320005] worker_thread+0x2f/0x3e0
    [ 246.324083] kthread+0x118/0x130
    [ 246.327683] ? wq_update_unbound_numa+0x1b0/0x1b0
    [ 246.332926] ? kthread_park+0x80/0x80
    [ 246.337013] ret_from_fork+0x1f/0x30

    Fixes: 088aa3eec2ce ("ip6mr: Support fib notifications")
    Signed-off-by: Nir Dotan
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Nir Dotan
     

02 Jan, 2019

1 commit

  • KMSAN detected read beyond end of buffer in vti and sit devices when
    passing truncated packets with PF_PACKET. The issue affects additional
    ip tunnel devices.

    Extend commit 76c0ddd8c3a6 ("ip6_tunnel: be careful when accessing the
    inner header") and commit ccfec9e5cb2d ("ip_tunnel: be careful when
    accessing the inner header").

    Move the check to a separate helper and call at the start of each
    ndo_start_xmit function in net/ipv4 and net/ipv6.
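    A minimal userspace sketch of the kind of check such a helper performs:
    before reading the inner header, confirm that the buffer really contains
    the whole network header. The names (struct pkt,
    pkt_network_may_pull(), tunnel_xmit_header_ok(), ipv6_next_header())
    are hypothetical. Unlike the kernel helpers, this sketch only checks
    lengths and does not pull data into a linear area.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed IPv4 header without options, and the fixed IPv6 header. */
#define IPV4_MIN_HLEN 20u
#define IPV6_HLEN     40u

enum l3_proto { PROTO_IPV4, PROTO_IPV6, PROTO_OTHER };

/* Hypothetical flat packet: buffer, length, network header offset. */
struct pkt {
	const uint8_t *data;
	size_t len;
	size_t network_offset;
};

/* True only if hlen bytes are present starting at the network header. */
static bool pkt_network_may_pull(const struct pkt *p, size_t hlen)
{
	return p->network_offset <= p->len &&
	       p->len - p->network_offset >= hlen;
}

/* Check done at the start of a (hypothetical) xmit routine. */
static bool tunnel_xmit_header_ok(const struct pkt *p, enum l3_proto proto)
{
	switch (proto) {
	case PROTO_IPV4:
		return pkt_network_may_pull(p, IPV4_MIN_HLEN);
	case PROTO_IPV6:
		return pkt_network_may_pull(p, IPV6_HLEN);
	default:
		/* Assumption: only require the offset to be in bounds. */
		return pkt_network_may_pull(p, 0);
	}
}

/* Reads the IPv6 Next Header field (byte 6) only after the check. */
static int ipv6_next_header(const struct pkt *p)
{
	if (!tunnel_xmit_header_ok(p, PROTO_IPV6))
		return -1;
	return p->data[p->network_offset + 6];
}
```

    A truncated packet, such as one injected through PF_PACKET, is rejected
    before any header field is read, which is the out-of-bounds read KMSAN
    reported.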

    Minor changes:
    - convert dev_kfree_skb to kfree_skb on error path,
    as dev_kfree_skb calls consume_skb which is not for error paths.
    - use pskb_network_may_pull even though that is pedantic here,
    as it is the same as pskb_may_pull for devices without llheaders.
    - do not cache ipv6 hdrs if used only once
    (unsafe across pskb_may_pull, was more relevant to earlier patch)

    Reported-by: syzbot
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

21 Dec, 2018

1 commit


18 Dec, 2018

1 commit


15 Dec, 2018

1 commit

  • vr.mifi is indirectly controlled by user-space, hence leading to
    a potential exploitation of the Spectre variant 1 vulnerability.

    This issue was detected with the help of Smatch:

    net/ipv6/ip6mr.c:1845 ip6mr_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap)
    net/ipv6/ip6mr.c:1919 ip6mr_compat_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap)

    Fix this by sanitizing vr.mifi before using it to index mrt->vif_table.

    Notice that given that speculation windows are large, the policy is
    to kill the speculation on the first load and not worry if it can be
    completed with a dependent load/store [1].
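    A minimal userspace sketch of the index-sanitizing arithmetic, modelled
    on the generic array_index_mask_nospec() in include/linux/nospec.h. The
    names sketch_index_mask(), struct vif_entry and vif_lookup() are
    hypothetical. The mask is all ones when index < size and zero
    otherwise, so a mispredicted bounds check can only ever load element 0.
    The sketch assumes an arithmetic right shift of negative long values,
    as GCC and Clang provide. The kernel additionally hides the value from
    the optimizer, which this sketch does not do, so a compiler may fold the
    mask away here.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* ~0UL if index < size, 0 otherwise, computed without a branch. */
static unsigned long sketch_index_mask(unsigned long index, unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (SKETCH_BITS_PER_LONG - 1);
}

struct vif_entry {
	int flags;
};

/* Hypothetical ioctl-style lookup of a user-controlled vif index. */
static struct vif_entry *vif_lookup(struct vif_entry *table,
				    unsigned long maxvif, unsigned long mifi)
{
	if (mifi >= maxvif)
		return NULL;
	/* Clamp again without a branch, in case the check was mispredicted. */
	mifi &= sketch_index_mask(mifi, maxvif);
	return &table[mifi];
}
```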

    [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: David S. Miller

    Gustavo A. R. Silva
     

07 Dec, 2018

1 commit

  • In order to pass extack together with NETDEV_PRE_UP notifications, it's
    necessary to route the extack to __dev_open() from diverse (possibly
    indirect) callers. One prominent API through which the notification is
    invoked is dev_open().

    Therefore extend dev_open() with an extra extack argument and update
    all users. Most of the calls end up just encoding NULL, but bond and
    team drivers have the extack readily available.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Petr Machata
     

25 Oct, 2018

1 commit

  • When doing a route dump across all address families, do not error out
    if the table does not exist. This allows a route dump for AF_UNSPEC
    with a table id that may only exist for some of the families.

    Do return the "table does not exist" error if dumping routes for a
    specific family and the table does not exist.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

16 Oct, 2018

3 commits

  • Update parsing of route dump request to enable kernel side filtering.
    Allow filtering results by protocol (e.g., which routing daemon installed
    the route), route type (e.g., unicast), table id and nexthop device. These
    amount to the low-hanging fruit, yet a huge improvement, for dumping
    routes.

    ip_valid_fib_dump_req is called with RTNL held, so __dev_get_by_index can
    be used to look up the device index without taking a reference. From
    there filter->dev is only used during dump loops with the lock still held.

    Set NLM_F_DUMP_FILTERED in the answer_flags so the user knows the results
    have been filtered should no entries be returned.
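    A minimal userspace sketch of kernel-side dump filtering along the lines
    described above. struct route_filter, struct route, dump_routes() and
    ANSWER_DUMP_FILTERED are hypothetical simplifications, not the kernel's
    struct fib_dump_filter. As an assumption of the sketch, a zero field
    means "do not filter on this field". The answer flag is set whenever a
    filter was supplied, so an empty reply is distinguishable from "no
    routes at all".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for NLM_F_DUMP_FILTERED. */
#define ANSWER_DUMP_FILTERED 0x1u

struct route_filter {
	uint32_t table_id;  /* 0: any table */
	uint8_t protocol;   /* 0: any protocol (e.g. any routing daemon) */
	uint8_t type;       /* 0: any route type */
	int dev_index;      /* 0: any nexthop device */
	bool filter_set;    /* true if any field above was supplied */
};

struct route {
	uint32_t table_id;
	uint8_t protocol;
	uint8_t type;
	int dev_index;
};

static bool route_matches(const struct route *rt, const struct route_filter *f)
{
	if (f->table_id && rt->table_id != f->table_id)
		return false;
	if (f->protocol && rt->protocol != f->protocol)
		return false;
	if (f->type && rt->type != f->type)
		return false;
	if (f->dev_index && rt->dev_index != f->dev_index)
		return false;
	return true;
}

/* Count matching routes; a real handler would emit a message for each. */
static unsigned int dump_routes(const struct route *rts, unsigned int n,
				const struct route_filter *f,
				unsigned int *answer_flags)
{
	unsigned int count = 0;

	*answer_flags = f->filter_set ? ANSWER_DUMP_FILTERED : 0;
	for (unsigned int i = 0; i < n; i++)
		if (route_matches(&rts[i], f))
			count++;
	return count;
}
```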

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Implement kernel side filtering of routes by egress device index and
    table id. If the table id is given in the filter, look up the table and
    call mr_table_dump directly for it.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Add struct fib_dump_filter for options on limiting which routes are
    returned in a dump request. The current list is table id, protocol,
    route type, rtm_flags and nexthop device index. struct net is needed
    to look up the net_device from the index.

    Declare the filter for each route dump handler and plumb the new
    arguments from dump handlers to ip_valid_fib_dump_req.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

09 Oct, 2018

1 commit

  • Add helper to check netlink message for route dumps. If the strict flag
    is set the dump request is expected to have an rtmsg struct as the header.
    All elements of the struct are expected to be 0 with the exception of
    rtm_flags (which is used by both ipv4 and ipv6 dumps) and no attributes
    can be appended. rtm_flags can only have RTM_F_CLONED and RTM_F_PREFIX
    set.
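    A minimal userspace sketch of the strict check described above.
    struct rtmsg_view mirrors the field layout of the uapi struct rtmsg, and
    RTM_F_CLONED_V / RTM_F_PREFIX_V carry the values of RTM_F_CLONED (0x200)
    and RTM_F_PREFIX (0x800) from <linux/rtnetlink.h>. The function name and
    the payload_len parameter are hypothetical. The sketch does not check
    rtm_family, which selects the handler, and it ignores netlink alignment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local mirror of struct rtmsg from <linux/rtnetlink.h>. */
struct rtmsg_view {
	unsigned char rtm_family, rtm_dst_len, rtm_src_len, rtm_tos;
	unsigned char rtm_table, rtm_protocol, rtm_scope, rtm_type;
	unsigned int rtm_flags;
};

#define RTM_F_CLONED_V 0x200
#define RTM_F_PREFIX_V 0x800

/* payload_len: bytes following the netlink message header. */
static bool strict_dump_req_ok(const struct rtmsg_view *rtm, size_t payload_len)
{
	/* The rtmsg header must be present and no attributes may follow. */
	if (payload_len != sizeof(*rtm))
		return false;
	/* Every field except rtm_family and rtm_flags must be zero. */
	if (rtm->rtm_dst_len || rtm->rtm_src_len || rtm->rtm_tos ||
	    rtm->rtm_table || rtm->rtm_protocol || rtm->rtm_scope ||
	    rtm->rtm_type)
		return false;
	/* Only RTM_F_CLONED and RTM_F_PREFIX may be set in rtm_flags. */
	return (rtm->rtm_flags &
		~(unsigned int)(RTM_F_CLONED_V | RTM_F_PREFIX_V)) == 0;
}
```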

    Update inet_dump_fib, inet6_dump_fib, mpls_dump_routes, ipmr_rtm_dumproute,
    and ip6mr_rtm_dumproute to call this helper if strict data checking is
    enabled.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

03 Oct, 2018

1 commit

  • The code to obtain the correct table for the incoming interface was
    missing for IPv6. This has been added along with the table creation
    notification to fib rules for the RTNL_FAMILY_IP6MR address family.

    Signed-off-by: Patrick Ruddy
    Signed-off-by: Mike Manning
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Patrick Ruddy
     

22 Jun, 2018

1 commit

  • Due to the use of rhashtables in net namespaces,
    rhashtable.h is included in lots of the kernel,
    so a small change can require a large recompilation.
    This makes development painful.

    This patch splits out rhashtable-types.h which just includes
    the major type declarations, and does not include (non-trivial)
    inline code. rhashtable.h is no longer included by anything
    in the include/ directory.
    Common include files only include rhashtable-types.h so a large
    recompilation is only triggered when that changes.

    Acked-by: Herbert Xu
    Signed-off-by: NeilBrown
    Signed-off-by: David S. Miller

    NeilBrown
     

07 Jun, 2018

1 commit

  • Pull networking updates from David Miller:

    1) Add Maglev hashing scheduler to IPVS, from Inju Song.

    2) Lots of new TC subsystem tests from Roman Mashak.

    3) Add TCP zero copy receive and fix delayed acks and autotuning with
    SO_RCVLOWAT, from Eric Dumazet.

    4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard
    Brouer.

    5) Add ttl inherit support to vxlan, from Hangbin Liu.

    6) Properly separate ipv6 routes into their logically independent
    components. fib6_info for the routing table, and fib6_nh for sets of
    nexthops, which thus can be shared. From David Ahern.

    7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP
    messages from XDP programs. From Nikita V. Shirokov.

    8) Lots of long overdue cleanups to the r8169 driver, from Heiner
    Kallweit.

    9) Add BTF ("BPF Type Format"), from Martin KaFai Lau.

    10) Add traffic condition monitoring to iwlwifi, from Luca Coelho.

    11) Plumb extack down into fib_rules, from Roopa Prabhu.

    12) Add Flower classifier offload support to igb, from Vinicius Costa
    Gomes.

    13) Add UDP GSO support, from Willem de Bruijn.

    14) Add documentation for eBPF helpers, from Quentin Monnet.

    15) Add TLS tx offload to mlx5, from Ilya Lesokhin.

    16) Allow applications to be given the number of bytes available to read
    on a socket via a control message returned from recvmsg(), from
    Soheil Hassas Yeganeh.

    17) Add x86_32 eBPF JIT compiler, from Wang YanQing.

    18) Add AF_XDP sockets, with zerocopy support infrastructure as well.
    From Björn Töpel.

    19) Remove indirect load support from all of the BPF JITs and handle
    these operations in the verifier by translating them into native BPF
    instead. From Daniel Borkmann.

    20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha.

    21) Allow XDP programs to do lookups in the main kernel routing tables
    for forwarding. From David Ahern.

    22) Allow drivers to store hardware state into an ELF section of kernel
    dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy.

    23) Various RACK and loss detection improvements in TCP, from Yuchung
    Cheng.

    24) Add TCP SACK compression, from Eric Dumazet.

    25) Add User Mode Helper support and basic bpfilter infrastructure, from
    Alexei Starovoitov.

    26) Support ports and protocol values in RTM_GETROUTE, from Roopa
    Prabhu.

    27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard
    Brouer.

    28) Add lots of forwarding selftests, from Petr Machata.

    29) Add generic network device failover driver, from Sridhar Samudrala.

    * ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits)
    strparser: Add __strp_unpause and use it in ktls.
    rxrpc: Fix terminal retransmission connection ID to include the channel
    net: hns3: Optimize PF CMDQ interrupt switching process
    net: hns3: Fix for VF mailbox receiving unknown message
    net: hns3: Fix for VF mailbox cannot receiving PF response
    bnx2x: use the right constant
    Revert "net: sched: cls: Fix offloading when ingress dev is vxlan"
    net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
    enic: fix UDP rss bits
    netdev-FAQ: clarify DaveM's position for stable backports
    rtnetlink: validate attributes in do_setlink()
    mlxsw: Add extack messages for port_{un, }split failures
    netdevsim: Add extack error message for devlink reload
    devlink: Add extack to reload and port_{un, }split operations
    net: metrics: add proper netlink validation
    ipmr: fix error path when ipmr_new_table fails
    ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
    net: hns3: remove unused hclgevf_cfg_func_mta_filter
    netfilter: provide udp*_lib_lookup for nf_tproxy
    qed*: Utilize FW 8.37.2.0
    ...

    Linus Torvalds
     

06 Jun, 2018

2 commits

  • commit 0bbbf0e7d0e7 ("ipmr, ip6mr: Unite creation of new mr_table")
    refactored ipmr_new_table, so that it now returns NULL when
    mr_table_alloc fails. Unfortunately, all callers of ipmr_new_table
    expect an ERR_PTR.

    This can result in NULL deref, for example when ipmr_rules_exit calls
    ipmr_free_table with NULL net->ipv4.mrt in the
    !CONFIG_IP_MROUTE_MULTIPLE_TABLES version.

    This patch makes mr_table_alloc return errors, and changes
    ip6mr_new_table and its callers to return/expect error pointers as
    well. It also removes the version of mr_table_alloc defined under
    !CONFIG_IP_MROUTE_COMMON, since it is never used.

    Fixes: 0bbbf0e7d0e7 ("ipmr, ip6mr: Unite creation of new mr_table")
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     
  • Currently, raw6_sk(sk)->ip6mr_table is set unconditionally during
    ip6_mroute_setsockopt(MRT6_TABLE). A subsequent attempt at the same
    setsockopt will fail with -ENOENT, since we haven't actually created
    that table.

    A similar fix for ipv4 was included in commit 5e1859fbcc3c ("ipv4: ipmr:
    various fixes and cleanups").

    Fixes: d1db275dd3f6 ("ipv6: ip6mr: support multiple tables")
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

16 May, 2018

1 commit

  • Variants of proc_create{,_data} that directly take a struct seq_operations
    and deal with network namespaces in ->open and ->release. All callers of
    proc_create + seq_open_net converted over, and seq_{open,release}_net are
    removed entirely.

    Signed-off-by: Christoph Hellwig

    Christoph Hellwig
     

23 Apr, 2018

1 commit


28 Mar, 2018

1 commit


27 Mar, 2018

3 commits

  • Since ipmr and ip6mr are using the same mr_mfc struct at their core, we
    can now refactor the ipmr_cache_{hold,put} logic and apply refcounting
    to both ipmr and ip6mr.

    Signed-off-by: Yuval Mintz
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Yuval Mintz
     
  • Add the ability to discern whether a given FIB rule notification relates
    to the default rule inserted when registering ip6mr or a different one.

    This will later be used by drivers wishing to offload ipv6 multicast routes
    but unable to offload rules other than the default one.

    Signed-off-by: Yuval Mintz
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Yuval Mintz
     
  • In similar fashion to ipmr, support fib notifications for ip6mr mfc and
    vif related events. This would later allow drivers to react to said
    notifications and offload the IPv6 mroutes.

    Signed-off-by: Yuval Mintz
    Signed-off-by: Ido Schimmel
    Signed-off-by: David S. Miller

    Yuval Mintz
     

08 Mar, 2018

1 commit

  • Kirill found that the recently added synchronize_rcu() call in
    ip6mr_sk_done() was slowing down netns dismantle, and posted a patch
    to call it only if the socket was found.

    I instead suggested getting rid of this call and using SOCK_RCU_FREE.

    We might later change IPv4 side to use the same technique and unify
    both stacks. IPv4 does not use synchronize_rcu() but has a call_rcu()
    that could be replaced by SOCK_RCU_FREE.

    Tested:
    time for i in {1..1000}; do unshare -n /bin/false;done

    Before : real 7m18.911s
    After : real 10.187s

    Fixes: 8571ab479a6e ("ip6mr: Make mroute_sk rcu-based")
    Signed-off-by: Eric Dumazet
    Reported-by: Kirill Tkhai
    Cc: Yuval Mintz
    Reviewed-by: Kirill Tkhai
    Signed-off-by: David S. Miller

    Eric Dumazet
     

02 Mar, 2018

6 commits

  • The various MFC entries are being held in the same kind of mr_tables
    for both ipmr and ip6mr, and their traversal logic is identical.
    Also, with the exception of the addresses [and other small tidbits]
    the major bulk of the nla setting is identical.

    Unite as much of the dumping as possible between the two.
    Notice this requires creating an mr_table iterator for each, as the
    for-each preprocessor macro can't be used by the common logic.

    Signed-off-by: Yuval Mintz
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Yuval Mintz
     
  • MFC_NOTIFY exists in ip6mr, probably as some legacy code (it was
    already removed for ipmr in commit 06bd6c0370bb ("net: ipmr: remove
    unused MFC_NOTIFY flag and make the flags enum")).
    Remove it from ip6mr as well, and move the enum into a common file.
    Notice MFC_OFFLOAD is currently only used by ipmr.

    Signed-off-by: Yuval Mintz
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Yuval Mintz
     
  • Same as previously done with the mfc seq, the logic for the vif seq is
    refactored to be shared between ipmr and ip6mr.

    Signed-off-by: Yuval Mintz
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Yuval Mintz
     
  • With the exception of the final dump, ipmr and ip6mr have the exact same
    seq logic for traversing a given mr_table. Refactor that code and make
    it common.

    Signed-off-by: Yuval Mintz
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Yuval Mintz
     
  • ipmr and ip6mr utilize the exact same methods for searching the
    hashed resolved connections, difference being only in the construction
    of the hash comparison key.

    In order to unite the flow, introduce an mr_table operation set that
    would contain the protocol specific information required for common
    flows, in this case - the hash parameters and a comparison key
    representing a (*,*) route.
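    A minimal userspace sketch of the idea of a per-family operation set
    consumed by common code. struct mr_table_ops_sketch, the key structs
    and mr_lookup_any() are hypothetical. In particular, a plain key length
    replaces the rhashtable parameters, and a linear scan replaces the hash
    lookup. Common code never knows whether it handles IPv4 or IPv6 keys; it
    only uses the key size and the (*,*) key supplied by each family.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-family operation set. */
struct mr_table_ops_sketch {
	size_t key_len;          /* size of the (origin, group) key */
	const void *cmparg_any;  /* key representing the (*,*) route */
};

struct cmp_key4 { unsigned int origin, group; };
struct cmp_key6 { unsigned char origin[16], group[16]; };

static const struct cmp_key4 any4 = { 0, 0 };
static const struct cmp_key6 any6;  /* all-zero addresses */

static const struct mr_table_ops_sketch ops4 = { sizeof(struct cmp_key4), &any4 };
static const struct mr_table_ops_sketch ops6 = { sizeof(struct cmp_key6), &any6 };

/* Family-independent search for the (*,*) entry in an array of keys. */
static const void *mr_lookup_any(const struct mr_table_ops_sketch *ops,
				 const void *keys, size_t n)
{
	const unsigned char *k = keys;

	for (size_t i = 0; i < n; i++, k += ops->key_len)
		if (memcmp(k, ops->cmparg_any, ops->key_len) == 0)
			return k;
	return NULL;
}
```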

    Signed-off-by: Yuval Mintz
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Yuval Mintz
     
  • mfc_cache and mfc6_cache are almost identical - the main difference is
    in the origin/group addresses and comparison-key. Make a common
    structure encapsulating most of the multicast routing logic - mr_mfc
    and convert both ipmr and ip6mr into using it.

    For easy conversion [casting, in this case] mr_mfc has to be the first
    field inside every multicast routing abstraction utilizing it.
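    A minimal userspace sketch of the "common part as first member" idiom
    the text relies on. The struct and function names are hypothetical and
    much smaller than the real mr_mfc, mfc_cache and mfc6_cache. C
    guarantees that a pointer to a struct, suitably converted, points to its
    first member and vice versa. Common code can therefore take the shared
    part, and family code can cast back to its own type. The static
    assertions enforce the "must be first" rule at compile time.

```c
#include <stddef.h>

/* Hypothetical family-independent part of a multicast route entry. */
struct mr_common_sketch {
	unsigned int refcount;
	int parent;          /* incoming interface index */
	unsigned int flags;
};

struct mfc4_sketch {
	struct mr_common_sketch _c;   /* must be the first member */
	unsigned int origin, group;
};

struct mfc6_sketch {
	struct mr_common_sketch _c;   /* must be the first member */
	unsigned char origin[16], group[16];
};

_Static_assert(offsetof(struct mfc4_sketch, _c) == 0, "common part must come first");
_Static_assert(offsetof(struct mfc6_sketch, _c) == 0, "common part must come first");

/* Common code only ever sees the shared part. */
static void mr_common_hold(struct mr_common_sketch *c)
{
	c->refcount++;
}

/* Family code converts back with a plain cast. */
static struct mfc6_sketch *to_mfc6(struct mr_common_sketch *c)
{
	return (struct mfc6_sketch *)c;
}
```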

    Signed-off-by: Yuval Mintz
    Acked-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Yuval Mintz