11 Feb, 2020

1 commit

  • [ Upstream commit db3fa271022dacb9f741b96ea4714461a8911bb9 ]

    __in6_dev_get(dev) called from inet6_set_link_af() can return NULL.

    The needed check has been recently removed, let's add it back.

    While do_setlink() does call validate_linkmsg() :
    ...
    err = validate_linkmsg(dev, tb); /* OK at this point */
    ...

    It is possible that the following call, which happens before
    ->set_link_af(), removes IPv6 if the MTU is less than 1280:

    if (tb[IFLA_MTU]) {
        err = dev_set_mtu_ext(dev, nla_get_u32(tb[IFLA_MTU]), extack);
        if (err < 0)
            goto errout;
        status |= DO_SETLINK_MODIFIED;
    }
    ...

    if (tb[IFLA_AF_SPEC]) {
        ...
        err = af_ops->set_link_af(dev, af);
            ->inet6_set_link_af() // CRASH because idev is NULL

    Please note that IPv4 is immune to the bug since inet_set_link_af() does :

    struct in_device *in_dev = __in_dev_get_rcu(dev);
    if (!in_dev)
        return -EAFNOSUPPORT;

    This problem has been mentioned in commit cf7afbfeb8ce ("rtnl: make
    link af-specific updates atomic") changelog :

    This method is not fail proof, while it is currently sufficient
    to make set_link_af() inerrable and thus 100% atomic, the
    validation function method will not be able to detect all error
    scenarios in the future, there will likely always be errors
    depending on states which are f.e. not protected by rtnl_mutex
    and thus may change between validation and setting.
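
    The sequence described above can be modelled in user space. This is a
    minimal sketch, not the kernel code: the model_* types and functions are
    hypothetical stand-ins for struct net_device, struct inet6_dev,
    dev_set_mtu_ext() and inet6_set_link_af(), and the -EAFNOSUPPORT return
    value is assumed by analogy with the IPv4 snippet quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_IPV6_MIN_MTU 1280

/* Hypothetical stand-ins for struct inet6_dev and struct net_device. */
struct model_idev {
	int addr_gen_mode;
};

struct model_dev {
	unsigned int mtu;
	struct model_idev *idev;	/* NULL once IPv6 is removed from the device */
};

/* Models the MTU step of do_setlink(): going below 1280 detaches IPv6. */
int model_set_mtu(struct model_dev *dev, unsigned int mtu)
{
	assert(dev != NULL);
	dev->mtu = mtu;
	if (mtu < MODEL_IPV6_MIN_MTU)
		dev->idev = NULL;
	return 0;
}

/* Models ->set_link_af() with the check restored: re-test idev at set time
 * instead of trusting the earlier validation pass. */
int model_set_link_af(struct model_dev *dev, int mode)
{
	struct model_idev *idev;

	assert(dev != NULL);
	idev = dev->idev;
	if (!idev)
		return -EAFNOSUPPORT;
	idev->addr_gen_mode = mode;
	return 0;
}
```

    In this model, a request that first lowers the MTU to 1000 and then calls
    model_set_link_af() gets an error back instead of dereferencing NULL.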

    IPv6: ADDRCONF(NETDEV_CHANGE): lo: link becomes ready
    general protection fault, probably for non-canonical address 0xdffffc0000000056: 0000 [#1] PREEMPT SMP KASAN
    KASAN: null-ptr-deref in range [0x00000000000002b0-0x00000000000002b7]
    CPU: 0 PID: 9698 Comm: syz-executor712 Not tainted 5.5.0-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:inet6_set_link_af+0x66e/0xae0 net/ipv6/addrconf.c:5733
    Code: 38 d0 7f 08 84 c0 0f 85 20 03 00 00 48 8d bb b0 02 00 00 45 0f b6 64 24 04 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 b6 04 02 84 c0 74 08 3c 03 0f 8e 1a 03 00 00 44 89 a3 b0 02 00
    RSP: 0018:ffffc90005b06d40 EFLAGS: 00010206
    RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff86df39a6
    RDX: 0000000000000056 RSI: ffffffff86df3e74 RDI: 00000000000002b0
    RBP: ffffc90005b06e70 R08: ffff8880a2ac0380 R09: ffffc90005b06db0
    R10: fffff52000b60dbe R11: ffffc90005b06df7 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff8880a1fcc424 R15: dffffc0000000000
    FS: 0000000000c46880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000055f0494ca0d0 CR3: 000000009e4ac000 CR4: 00000000001406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    do_setlink+0x2a9f/0x3720 net/core/rtnetlink.c:2754
    rtnl_group_changelink net/core/rtnetlink.c:3103 [inline]
    __rtnl_newlink+0xdd1/0x1790 net/core/rtnetlink.c:3257
    rtnl_newlink+0x69/0xa0 net/core/rtnetlink.c:3377
    rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5438
    netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
    rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5456
    netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
    netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1328
    netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
    sock_sendmsg_nosec net/socket.c:652 [inline]
    sock_sendmsg+0xd7/0x130 net/socket.c:672
    ____sys_sendmsg+0x753/0x880 net/socket.c:2343
    ___sys_sendmsg+0x100/0x170 net/socket.c:2397
    __sys_sendmsg+0x105/0x1d0 net/socket.c:2430
    __do_sys_sendmsg net/socket.c:2439 [inline]
    __se_sys_sendmsg net/socket.c:2437 [inline]
    __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437
    do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x4402e9
    Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fffd62fbcf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004402e9
    RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
    RBP: 00000000006ca018 R08: 0000000000000008 R09: 00000000004002c8
    R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000401b70
    R13: 0000000000401c00 R14: 0000000000000000 R15: 0000000000000000
    Modules linked in:
    ---[ end trace cfa7664b8fdcdff3 ]---
    RIP: 0010:inet6_set_link_af+0x66e/0xae0 net/ipv6/addrconf.c:5733
    Code: 38 d0 7f 08 84 c0 0f 85 20 03 00 00 48 8d bb b0 02 00 00 45 0f b6 64 24 04 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 b6 04 02 84 c0 74 08 3c 03 0f 8e 1a 03 00 00 44 89 a3 b0 02 00
    RSP: 0018:ffffc90005b06d40 EFLAGS: 00010206
    RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff86df39a6
    RDX: 0000000000000056 RSI: ffffffff86df3e74 RDI: 00000000000002b0
    RBP: ffffc90005b06e70 R08: ffff8880a2ac0380 R09: ffffc90005b06db0
    R10: fffff52000b60dbe R11: ffffc90005b06df7 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff8880a1fcc424 R15: dffffc0000000000
    FS: 0000000000c46880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000020000004 CR3: 000000009e4ac000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

    Fixes: 7dc2bccab0ee ("Validate required parameters in inet6_validate_link_af")
    Signed-off-by: Eric Dumazet
    Bisected-and-reported-by: syzbot
    Cc: Maxim Mikityanskiy
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     

05 Jan, 2020

1 commit

  • [ Upstream commit 2beb6d2901a3f73106485d560c49981144aeacb1 ]

    In commit 4b1373de73a3 ("net: ipv6: addr: perform strict checks also for
    doit handlers") we added strict checks for inet6_rtm_getaddr(). But we did
    the invalid header values check before checking if NETLINK_F_STRICT_CHK
    is set. This may break backwards compatibility if users already set
    ifm->ifa_prefixlen, ifm->ifa_flags, or ifm->ifa_scope in their netlink code.

    I didn't move the nlmsg_len check because I thought it's a valid check.
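
    The intended ordering can be sketched as follows. This is an illustrative
    user-space model under assumptions: struct model_ifaddrmsg is a trimmed
    imitation of struct ifaddrmsg, the strict_chk parameter stands for
    "NETLINK_F_STRICT_CHK is set on the socket", and -EINVAL is an assumed
    error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down model of struct ifaddrmsg. */
struct model_ifaddrmsg {
	unsigned char ifa_prefixlen;
	unsigned char ifa_flags;
	unsigned char ifa_scope;
};

int model_valid_getaddr_req(const struct model_ifaddrmsg *ifm,
			    size_t payload_len, bool strict_chk)
{
	assert(ifm != NULL);

	/* The length check stays unconditional. */
	if (payload_len < sizeof(*ifm))
		return -EINVAL;

	/* Legacy callers: skip the header value checks entirely. */
	if (!strict_chk)
		return 0;

	/* Strict callers must leave these header fields zeroed. */
	if (ifm->ifa_prefixlen || ifm->ifa_flags || ifm->ifa_scope)
		return -EINVAL;
	return 0;
}
```

    A legacy request with ifa_prefixlen set is accepted again, while the same
    request from a socket that opted into strict checking is still rejected.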

    Reported-by: Jianlin Shi
    Fixes: 4b1373de73a3 ("net: ipv6: addr: perform strict checks also for doit handlers")
    Signed-off-by: Hangbin Liu
    Reviewed-by: David Ahern
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

    Hangbin Liu
     

05 Oct, 2019

2 commits

  • Rajendra reported a kernel panic when a link was taken down:

    [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
    [ 6870.271856] IP: [] __ipv6_ifa_notify+0x154/0x290

    [ 6870.570501] Call Trace:
    [ 6870.573238] [] ? ipv6_ifa_notify+0x26/0x40
    [ 6870.579665] [] ? addrconf_dad_completed+0x4c/0x2c0
    [ 6870.586869] [] ? ipv6_dev_mc_inc+0x196/0x260
    [ 6870.593491] [] ? addrconf_dad_work+0x10a/0x430
    [ 6870.600305] [] ? __switch_to_asm+0x34/0x70
    [ 6870.606732] [] ? process_one_work+0x18a/0x430
    [ 6870.613449] [] ? worker_thread+0x4d/0x490
    [ 6870.619778] [] ? process_one_work+0x430/0x430
    [ 6870.626495] [] ? kthread+0xd9/0xf0
    [ 6870.632145] [] ? __switch_to_asm+0x34/0x70
    [ 6870.638573] [] ? kthread_park+0x60/0x60
    [ 6870.644707] [] ? ret_from_fork+0x57/0x70
    [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0

    addrconf_dad_work is kicked to be scheduled when a device is brought
    up. There is a race between addrconf_dad_work getting scheduled and
    taking the rtnl lock and a process taking the link down (under rtnl).
    The latter removes the host route from the inet6_addr as part of
    addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
    to use the host route in __ipv6_ifa_notify. If the down event removes
    the host route due to the race to the rtnl, then the BUG listed above
    occurs.

    Since the DAD sequence can not be aborted, add a check for the missing
    host route in __ipv6_ifa_notify. The only way this should happen is due
    to the previously mentioned race. The host route is created when the
    address is added to an interface; it is only removed on a down event
    where the address is kept. Add a warning if the host route is missing
    AND the device is up; this is a situation that should never happen.
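
    A minimal sketch of the check described above, assuming hypothetical
    model_* types in place of struct inet6_ifaddr and its host route. The
    warned output parameter stands in for the kernel warning; it is not a
    real kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the host route attached to an address. */
struct model_route {
	int refcnt;
};

/* Hypothetical stand-in for an interface address. */
struct model_ifaddr {
	struct model_route *rt;	/* NULL after a down event that kept the address */
	bool dev_up;
};

/* Returns true if the host route was used. *warned is set only when the
 * route is missing while the device is up, which should never happen. */
bool model_ifa_notify_newaddr(struct model_ifaddr *ifp, bool *warned)
{
	assert(ifp != NULL && warned != NULL);
	*warned = false;
	if (!ifp->rt) {
		*warned = ifp->dev_up;
		return false;
	}
	ifp->rt->refcnt++;
	return true;
}
```

    The raced case (route gone, device down) is skipped silently; only the
    "route gone, device up" case is flagged.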

    Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
    Reported-by: Rajendra Dendukuri
    Signed-off-by: David Ahern
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    David Ahern
     
  • This reverts commit a3ce2a21bb8969ae27917281244fa91bf5f286d7.

    Eric reported test failures with that commit. After digging into it,
    the bottom line is that the DAD sequence is not to be messed with.
    There are too many cases that are expected to proceed regardless
    of whether a device is up.

    Revert the patch and I will send a different solution for the
    problem Rajendra reported.

    Signed-off-by: David Ahern
    Cc: Eric Dumazet
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    David Ahern
     

02 Oct, 2019

1 commit

  • Rajendra reported a kernel panic when a link was taken down:

    [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
    [ 6870.271856] IP: [] __ipv6_ifa_notify+0x154/0x290

    [ 6870.570501] Call Trace:
    [ 6870.573238] [] ? ipv6_ifa_notify+0x26/0x40
    [ 6870.579665] [] ? addrconf_dad_completed+0x4c/0x2c0
    [ 6870.586869] [] ? ipv6_dev_mc_inc+0x196/0x260
    [ 6870.593491] [] ? addrconf_dad_work+0x10a/0x430
    [ 6870.600305] [] ? __switch_to_asm+0x34/0x70
    [ 6870.606732] [] ? process_one_work+0x18a/0x430
    [ 6870.613449] [] ? worker_thread+0x4d/0x490
    [ 6870.619778] [] ? process_one_work+0x430/0x430
    [ 6870.626495] [] ? kthread+0xd9/0xf0
    [ 6870.632145] [] ? __switch_to_asm+0x34/0x70
    [ 6870.638573] [] ? kthread_park+0x60/0x60
    [ 6870.644707] [] ? ret_from_fork+0x57/0x70
    [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0

    addrconf_dad_work is kicked to be scheduled when a device is brought
    up. There is a race between addrconf_dad_work getting scheduled and
    taking the rtnl lock and a process taking the link down (under rtnl).
    The latter removes the host route from the inet6_addr as part of
    addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
    to use the host route in ipv6_ifa_notify. If the down event removes
    the host route due to the race to the rtnl, then the BUG listed above
    occurs.

    This scenario does not occur when the ipv6 address is not kept
    (net.ipv6.conf.all.keep_addr_on_down = 0) as addrconf_ifdown sets the
    state of the ifp to DEAD. Handle when the addresses are kept by checking
    IF_READY which is reset by addrconf_ifdown.

    The 'dead' flag for an inet6_addr is set only under rtnl, in
    addrconf_ifdown and it means the device is getting removed (or IPv6 is
    disabled). The interesting cases for changing the idev flag are
    addrconf_notify (NETDEV_UP and NETDEV_CHANGE) and addrconf_ifdown
    (reset the flag). The former does not have the idev lock - only rtnl;
    the latter has both. Based on that the existing dead + IF_READY check
    can be moved to right after the rtnl_lock in addrconf_dad_work.

    Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
    Reported-by: Rajendra Dendukuri
    Signed-off-by: David Ahern
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    David Ahern
     

24 Aug, 2019

1 commit

  • Currently, ipv6_find_idev returns NULL when ipv6_add_dev fails,
    ignoring the specific error value. This results in addrconf_add_dev
    returning ENOBUFS in all cases, which is unfortunate in cases such as:

    # ip link add dummyX type dummy
    # ip link set dummyX mtu 1200 up
    # ip addr add 2000::/64 dev dummyX
    RTNETLINK answers: No buffer space available

    Commit a317a2f19da7 ("ipv6: fail early when creating netdev named all
    or default") introduced error returns in ipv6_add_dev. Before that,
    that function would simply return NULL for all failures.
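
    The idea of propagating the specific error through a pointer return can
    be sketched in user space. The helpers below imitate the kernel's
    ERR_PTR()/IS_ERR()/PTR_ERR() idiom; all model_* names are hypothetical,
    and the choice of -EINVAL for a too-small MTU is an assumption made for
    illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095
#define MODEL_IPV6_MIN_MTU 1280

/* User-space imitations of ERR_PTR(), IS_ERR() and PTR_ERR(). */
static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-MODEL_MAX_ERRNO);
}

static inline long model_ptr_err(const void *p)
{
	assert(model_is_err(p));
	return (long)(intptr_t)p;
}

struct model_idev {
	unsigned int mtu;
};

/* Stand-in for ipv6_add_dev(): encodes why it failed in the pointer. */
struct model_idev *model_add_dev(struct model_idev *slot, unsigned int mtu)
{
	if (mtu < MODEL_IPV6_MIN_MTU)
		return model_err_ptr(-EINVAL);
	if (!slot)
		return model_err_ptr(-ENOBUFS);
	slot->mtu = mtu;
	return slot;
}

/* Stand-in for addrconf_add_dev(): forwards the specific error code
 * instead of collapsing every failure into -ENOBUFS. */
int model_addrconf_add_dev(struct model_idev *slot, unsigned int mtu)
{
	struct model_idev *idev = model_add_dev(slot, mtu);

	if (model_is_err(idev))
		return (int)model_ptr_err(idev);
	return 0;
}
```

    With this shape, the MTU 1200 case from the example reports the MTU
    problem rather than "No buffer space available".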

    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller

    Sabrina Dubroca
     

22 Aug, 2019

1 commit

  • In commit 93a714d6b53d ("multicast: Extend ip address command to enable
    multicast group join/leave on") we added a new flag IFA_F_MCAUTOJOIN
    to let users add a multicast address on an Ethernet interface.

    This works for IPv4, but not for IPv6. See the inet6_addr_add code.

    static int inet6_addr_add()
    {
        ...
        if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) {
            ipv6_mc_config(net->ipv6.mc_autojoin_sk, true...)
        }

        ifp = ipv6_add_addr(idev, cfg, true, extack); /* always fails for a multicast address */
        if (!IS_ERR(ifp)) {
            ...
        } else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) {
            ipv6_mc_config(net->ipv6.mc_autojoin_sk, false...)
        }
    }

    But ipv6_add_addr() checks the address type and rejects multicast
    addresses outright, so this feature has never worked for IPv6.

    We should not remove the multicast address check in ipv6_add_addr()
    entirely, but we can accept a multicast address when, and only when,
    the IFA_F_MCAUTOJOIN flag is supplied.
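
    The relaxed check can be sketched like this. It is a user-space model
    under assumptions: model_check_addr() is a hypothetical stand-in for the
    address-type test inside ipv6_add_addr(), the flag value copies
    IFA_F_MCAUTOJOIN from the uapi header, and -EADDRNOTAVAIL is an assumed
    error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_IFA_F_MCAUTOJOIN 0x400u	/* mirrors IFA_F_MCAUTOJOIN */

/* An IPv6 address is multicast when it falls in ff00::/8. */
bool model_is_multicast(const uint8_t *addr)
{
	assert(addr != NULL);
	return addr[0] == 0xff;
}

/* Reject multicast addresses unless the caller asked for auto-join. */
int model_check_addr(const uint8_t *addr, uint32_t ifa_flags)
{
	if (model_is_multicast(addr) && !(ifa_flags & MODEL_IFA_F_MCAUTOJOIN))
		return -EADDRNOTAVAIL;
	return 0;
}
```

    Unicast addresses are unaffected; a multicast address passes only when
    the flag is present.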

    v2: update commit description

    Fixes: 93a714d6b53d ("multicast: Extend ip address command to enable multicast group join/leave on")
    Reported-by: Jianlin Shi
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
     

19 Jul, 2019

1 commit

  • In the sysctl code the proc_dointvec_minmax() function is often used to
    validate the user supplied value between an allowed range. This
    function uses the extra1 and extra2 members from struct ctl_table as
    minimum and maximum allowed value.

    On sysctl handler declaration, in every source file there are some
    readonly variables containing just an integer whose address is assigned
    to the extra1 and extra2 members, so the sysctl range is enforced.

    The special values 0, 1 and INT_MAX are very often used as range
    boundaries, leading to duplication of variables like zero=0, one=1,
    int_max=INT_MAX in different source files:

    $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
    248

    Add a const int array containing the most commonly used values, some
    macros to refer more easily to the correct array member, and use them
    instead of creating a local one for every object file.
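
    A minimal user-space sketch of the shared array and accessor macros. The
    model_ prefix marks everything here as illustrative: struct
    model_ctl_table keeps only the fields discussed above, and
    model_store_minmax() is a simplified range check in the spirit of
    proc_dointvec_minmax(), not its real signature.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* One shared array replaces per-file zero/one/int_max variables. */
const int model_sysctl_vals[] = { 0, 1, INT_MAX };

#define MODEL_SYSCTL_ZERO	((void *)&model_sysctl_vals[0])
#define MODEL_SYSCTL_ONE	((void *)&model_sysctl_vals[1])
#define MODEL_SYSCTL_INT_MAX	((void *)&model_sysctl_vals[2])

/* Hypothetical cut-down struct ctl_table. */
struct model_ctl_table {
	int *data;
	void *extra1;	/* pointer to the minimum allowed value */
	void *extra2;	/* pointer to the maximum allowed value */
};

/* Store val only if it lies within [*extra1, *extra2]. */
int model_store_minmax(const struct model_ctl_table *t, int val)
{
	assert(t != NULL && t->data != NULL);
	if (t->extra1 && val < *(const int *)t->extra1)
		return -EINVAL;
	if (t->extra2 && val > *(const int *)t->extra2)
		return -EINVAL;
	*t->data = val;
	return 0;
}
```

    A table entry then writes .extra1 = MODEL_SYSCTL_ZERO and
    .extra2 = MODEL_SYSCTL_ONE instead of pointing at file-local variables.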

    This is the bloat-o-meter output comparing the old and new binary
    compiled with the default Fedora config:

    # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
    add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
    Data old new delta
    sysctl_vals - 12 +12
    __kstrtab_sysctl_vals - 12 +12
    max 14 10 -4
    int_max 16 - -16
    one 68 - -68
    zero 128 28 -100
    Total: Before=20583249, After=20583085, chg -0.00%

    [mcroce@redhat.com: tipc: remove two unused variables]
    Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
    [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
    [arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
    Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
    [akpm@linux-foundation.org: fix fs/eventpoll.c]
    Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com
    Signed-off-by: Matteo Croce
    Signed-off-by: Arnd Bergmann
    Acked-by: Kees Cook
    Reviewed-by: Aaron Tomlin
    Cc: Matthew Wilcox
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matteo Croce
     

08 Jun, 2019

1 commit


05 Jun, 2019

1 commit

  • Add struct nexthop and nh_list list_head to fib6_info. nh_list is the
    fib6_info side of the nexthop fib_info relationship. Since a fib6_info
    referencing a nexthop object can not have 'sibling' entries (the old way
    of doing multipath routes), the nh_list is a union with fib6_siblings.

    Add f6i_list list_head to 'struct nexthop' to track fib6_info entries
    using a nexthop instance. Update __remove_nexthop_fib to walk f6i_list
    and delete fib entries using the nexthop.

    Add a few nexthop helpers for use when a nexthop is added to fib6_info:
    - nexthop_fib6_nh - return first fib6_nh in a nexthop object
    - fib6_info_nh_dev moved to nexthop.h and updated to use nexthop_fib6_nh
    if the fib6_info references a nexthop object
    - nexthop_path_fib6_result - similar to ipv4, select a path within a
    multipath nexthop object. If the nexthop is a blackhole, set
    fib6_result type to RTN_BLACKHOLE, and set the REJECT flag

    Update the fib6_info references to check for nh and take a different path
    as needed:
    - rt6_qualify_for_ecmp - if a fib entry uses a nexthop object it can NOT
    be coalesced with other fib entries into a multipath route
    - rt6_duplicate_nexthop - use nexthop_cmp if either fib6_info references
    a nexthop
    - addrconf (host routes), RA's and info entries (anything configured via
    ndisc) do not use nexthop objects
    - fib6_info_destroy_rcu - put reference to nexthop object
    - fib6_purge_rt - drop fib6_info from f6i_list
    - fib6_select_path - update to use the new nexthop_path_fib6_result when
    fib entry uses a nexthop object
    - rt6_device_match - update to catch use of nexthop object as a blackhole
    and set fib6_type and flags.
    - ip6_route_info_create - don't add space for fib6_nh if fib entry is
    going to reference a nexthop object, take a reference to nexthop object,
    disallow use of source routing
    - rt6_nlmsg_size - add space for RTA_NH_ID
    - add rt6_fill_node_nexthop to add nexthop data on a dump

    As with ipv4, most of the changes push existing code into the else branch
    of whether the fib entry uses a nexthop object.

    Update the nexthop code to walk f6i_list on a nexthop deleted to remove
    fib entries referencing it.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

03 Jun, 2019

1 commit


01 Jun, 2019

1 commit

  • The phylink conflict was between a bug fix by Russell King
    to make sure we have a consistent PHY interface mode, and
    a change in net-next to pull some code in phylink_resolve()
    into the helper functions phylink_mac_link_{up,down}()

    On the dp83867 side it's mostly overlapping changes, with
    the 'net' side removing a condition that was supposed to
    trigger for RGMII but because of how it was coded never
    actually could trigger.

    Signed-off-by: David S. Miller

    David S. Miller
     

31 May, 2019

2 commits

  • Pull yet more SPDX updates from Greg KH:
    "Here is another set of reviewed patches that adds SPDX tags to
    different kernel files, based on a set of rules that are being used to
    parse the comments to try to determine that the license of the file is
    "GPL-2.0-or-later" or "GPL-2.0-only". Only the "obvious" versions of
    these matches are included here, a number of "non-obvious" variants of
    text have been found but those have been postponed for later review
    and analysis.

    There is also a patch in here to add the proper SPDX header to a bunch
    of Kbuild files that we have missed in the past due to new files being
    added and forgetting that Kbuild uses two different file names for
    Makefiles. This issue was reported by the Kbuild maintainer.

    These patches have been out for review on the linux-spdx@vger mailing
    list, and while they were created by automatic tools, they were
    hand-verified by a bunch of different people, all of whose names are on
    the patches as reviewers"

    * tag 'spdx-5.2-rc3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (82 commits)
    treewide: Add SPDX license identifier - Kbuild
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 225
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 224
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 223
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 222
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 221
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 220
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 218
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 217
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 216
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 215
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 214
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 213
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 211
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 210
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 209
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 207
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 203
    treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 201
    ...

    Linus Torvalds
     
  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 3029 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

25 May, 2019

2 commits

  • Move fib6_nh to the end of fib6_info and make it an array of
    size 0. Pass a flag to fib6_info_alloc indicating if the
    allocation needs to add space for a fib6_nh.

    The current code path always has a fib6_nh allocated with a
    fib6_info; with nexthop objects they will be separate.
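
    The layout change can be illustrated with a trailing array and a sizing
    flag. This is a user-space sketch with hypothetical model_* types; it
    uses a C99 flexible array member where the commit describes an array of
    size 0, and calloc() in place of the kernel allocator. The allocated
    output parameter exists only so the size can be inspected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct fib6_nh. */
struct model_nh {
	int oif;
	int weight;
};

/* Hypothetical stand-in for struct fib6_info with the nexthop at the end. */
struct model_fib6_info {
	int metric;
	struct model_nh nh[];	/* present only if space was requested */
};

/* Allocate a zeroed entry, optionally with room for one embedded nexthop. */
struct model_fib6_info *model_info_alloc(bool with_nh, size_t *allocated)
{
	size_t sz = sizeof(struct model_fib6_info);

	assert(allocated != NULL);
	if (with_nh)
		sz += sizeof(struct model_nh);
	*allocated = sz;
	return calloc(1, sz);
}
```

    Entries that will reference a separate nexthop object pass false and
    must not touch nh[0].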

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • rt6_info are specific instances of a fib entry and are tied to a
    device and gateway - ie., a nexthop. Before nexthop objects, IPv6 fib
    entries have separate fib6_info for each nexthop in a multipath route,
    so the location of the pcpu cache in the fib6_info struct worked.
    However, with nexthop objects a fib6_info can point to a set of nexthops
    (yet another alignment of ipv6 with ipv4). Accordingly, the pcpu
    cache needs to be moved to the fib6_nh struct so the cached entries
    are local to the nexthop specification used to create the rt6_info.

    Initialization and free of the pcpu entries moved to fib6_nh_init and
    fib6_nh_release.

    Change in location only, from fib6_info down to fib6_nh; no other
    functional change intended.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     

23 May, 2019

1 commit

  • inet6_set_link_af requires that at least one of IFLA_INET6_TOKEN or
    IFLA_INET6_ADDR_GEN_MODE is passed. If neither is passed, it
    returns -EINVAL, which may cause do_setlink() to fail in the middle of
    processing other commands and give the following warning message:

    A link change request failed with some changes committed already.
    Interface eth0 may have been left with an inconsistent configuration,
    please check.

    Check the presence of at least one of them in inet6_validate_link_af to
    detect invalid parameters at an early stage, before do_setlink does
    anything. Also validate the address generation mode at an early stage.
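
    A sketch of the early validation, assuming a hypothetical parsed view of
    the attributes in which a NULL pointer means "not present". The upper
    bound on the address generation mode is an assumed value for
    illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical parsed-attribute view; NULL means the attribute is absent. */
struct model_af_attrs {
	const unsigned char *token;		/* models IFLA_INET6_TOKEN */
	const unsigned char *addr_gen_mode;	/* models IFLA_INET6_ADDR_GEN_MODE */
};

enum { MODEL_GEN_MODE_MAX = 3 };	/* assumed highest valid mode value */

/* Runs in the validation phase, before any setting is committed. */
int model_validate_link_af(const struct model_af_attrs *a)
{
	assert(a != NULL);
	/* At least one of the two attributes must be supplied. */
	if (!a->token && !a->addr_gen_mode)
		return -EINVAL;
	/* Reject unknown generation modes up front as well. */
	if (a->addr_gen_mode && *a->addr_gen_mode > MODEL_GEN_MODE_MAX)
		return -EINVAL;
	return 0;
}
```

    Because both failures are caught here, the later set step has nothing
    left to reject for these two reasons.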

    Signed-off-by: Maxim Mikityanskiy
    Signed-off-by: David S. Miller

    Maxim Mikityanskiy
     

28 Apr, 2019

2 commits

  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().
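
    The split into independent options can be pictured as a bitmask. The
    sketch below is a user-space model: the MODEL_VALIDATE_* names, struct
    model_attr and the error codes are illustrative assumptions, and the
    TRAILING option is only defined, not exercised, because it applies to a
    whole message rather than to one attribute.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum model_validation {
	MODEL_VALIDATE_TRAILING     = 1 << 0,
	MODEL_VALIDATE_MAXTYPE      = 1 << 1,
	MODEL_VALIDATE_UNSPEC       = 1 << 2,
	MODEL_VALIDATE_STRICT_ATTRS = 1 << 3,
};

/* The old "regular" parsing: none of the options. */
#define MODEL_VALIDATE_LIBERAL 0
/* The old *_strict() parsing: TRAILING and MAXTYPE. */
#define MODEL_VALIDATE_DEPRECATED_STRICT \
	(MODEL_VALIDATE_TRAILING | MODEL_VALIDATE_MAXTYPE)
/* The default for new users: everything. */
#define MODEL_VALIDATE_STRICT \
	(MODEL_VALIDATE_TRAILING | MODEL_VALIDATE_MAXTYPE | \
	 MODEL_VALIDATE_UNSPEC | MODEL_VALIDATE_STRICT_ATTRS)

/* Hypothetical summary of one attribute and its policy entry. */
struct model_attr {
	int type;
	size_t len;
	size_t expected_len;	/* 0 models an NLA_UNSPEC policy entry */
};

int model_validate_attr(const struct model_attr *a, int maxtype,
			unsigned int validate)
{
	assert(a != NULL);
	if (a->type > maxtype)
		return (validate & MODEL_VALIDATE_MAXTYPE) ? -EINVAL : 0;
	if (a->expected_len == 0)
		return (validate & MODEL_VALIDATE_UNSPEC) ? -EINVAL : 0;
	if (a->len < a->expected_len)
		return -ERANGE;
	if (a->len > a->expected_len &&
	    (validate & MODEL_VALIDATE_STRICT_ATTRS))
		return -ERANGE;
	return 0;
}
```

    The same attribute can therefore pass under the deprecated modes and
    fail under the full set, which is the behavior the renames preserve for
    existing callers.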

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then going
    forward prevent forgetting attribute entries. Similar can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().
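
    The wrapper relationship, and the reason raw comparisons of nla_type
    break, can be sketched as below. The model_* functions are hypothetical
    and only compute the type field; the real helpers also take an skb and
    reserve space in the message. The bit positions are assumed to match
    NLA_F_NESTED and NLA_F_NET_BYTEORDER.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NLA_F_NESTED        (1u << 15)
#define MODEL_NLA_F_NET_BYTEORDER (1u << 14)
#define MODEL_NLA_TYPE_MASK \
	(~(MODEL_NLA_F_NESTED | MODEL_NLA_F_NET_BYTEORDER))

/* Old behavior, kept under a new name: the type is written as given. */
uint16_t model_nest_start_noflag(uint16_t attrtype)
{
	return attrtype;
}

/* New default: same call, with the nested flag added. */
uint16_t model_nest_start(uint16_t attrtype)
{
	/* Callers pass a plain type; the wrapper owns the flag bits. */
	assert((attrtype & ~MODEL_NLA_TYPE_MASK) == 0);
	return model_nest_start_noflag((uint16_t)(attrtype | MODEL_NLA_F_NESTED));
}

/* What a well-behaved parser does: mask the flags before comparing. */
uint16_t model_nla_type(uint16_t nla_type)
{
	return (uint16_t)(nla_type & MODEL_NLA_TYPE_MASK);
}
```

    A parser that compares the raw field against 26 stops matching once the
    flag is set, while one that masks first keeps working.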

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
     

09 Apr, 2019

1 commit

  • Allow the gateway in a fib_nh_common to be from a different address
    family than the outer fib{6}_nh. To that end, replace nhc_has_gw with
    nhc_gw_family and update users of nhc_has_gw to check nhc_gw_family.
    Now nhc_family is used to know if the nh_common is part of a fib_nh
    or fib6_nh (used for container_of to get to route family specific data),
    and nhc_gw_family represents the address family for the gateway.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    David Ahern
     

30 Mar, 2019

2 commits

  • Rename fib6_nh entries that will be moved to a fib_nh_common struct.
    Specifically, the device, gateway, flags, and lwtstate are common
    with all nexthop definitions. In some places new temporary variables
    are declared or local variables renamed to maintain line lengths.

    Rename only; no functional change intended.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • The gateway setting is not per fib6_info entry but per-fib6_nh. Add a new
    fib_nh_has_gw flag to fib6_nh and convert references to RTF_GATEWAY to
    the new flag. For IPv6 address the flag is cheaper than checking that
    nh_gw is non-0 like IPv4 does.

    While this increases fib6_nh by 8-bytes, the effective allocation size of
    a fib6_info is unchanged. The 8 bytes are recovered later with a
    fib_nh_common change.

    Signed-off-by: David Ahern
    Reviewed-by: Ido Schimmel
    Signed-off-by: David S. Miller

    David Ahern
     

05 Mar, 2019

1 commit

  • When CONFIG_SYSCTL is turned off, we get a link failure for
    the newly introduced tuning knob.

    net/ipv6/addrconf.o: In function `addrconf_init_net':
    addrconf.c:(.text+0x31dc): undefined reference to `sysctl_devconf_inherit_init_net'

    Add an IS_ENABLED() check to fall back to the default behavior
    (sysctl_devconf_inherit_init_net=0) here.

    Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config")
    Signed-off-by: Arnd Bergmann
    Acked-by: Christian Brauner
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

16 Feb, 2019

1 commit

  • The netfilter conflicts were rather simple overlapping
    changes.

    However, the cls_tcindex.c stuff was a bit more complex.

    On the 'net' side, Cong is fixing several races and memory
    leaks. Whilst on the 'net-next' side we have Vlad adding
    the rtnl-ness support.

    What I've decided to do, in order to resolve this, is revert the
    conversion over to using a workqueue that Cong did, bringing us back
    to pure RCU. I did it this way because I believe that either Cong's
    races don't apply with how Vlad did things, or Cong will have to
    implement the race fix slightly differently.

    Signed-off-by: David S. Miller

    David S. Miller
     

12 Feb, 2019

1 commit

  • Follow those steps:
    # ip addr add 2001:123::1/32 dev eth0
    # ip addr add 2001:123:456::2/64 dev eth0
    # ip addr del 2001:123::1/32 dev eth0
    # ip addr del 2001:123:456::2/64 dev eth0
    and then the prefix route of 2001:123::1/32 will still exist.

    This is because ipv6_prefix_equal(), as called from the
    check_cleanup_prefix_route() function, does not check whether the two
    IPv6 addresses have the same prefix length. If the prefix of one address
    starts with the shorter prefix of another address, ipv6_prefix_equal()
    returns true even though their prefix lengths are different.

    Here I add a check of whether the two addresses have the same prefix
    length before deciding whether their prefixes are equal.

    Fixes: 5b84efecb7d9 ("ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE")
    Signed-off-by: Zhiqiang Liu
    Reported-by: Wenhao Zhang
    Signed-off-by: David S. Miller

    Zhiqiang Liu
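
The failure mode in the commit above can be reproduced in userspace with the two addresses from the reproduction steps. This is a minimal sketch, not the kernel code: `prefix_bits_equal()` mimics a bit-prefix comparison in the style of `ipv6_prefix_equal()`, and `same_prefix_route()` is a hypothetical helper that adds the prefix-length check the fix describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <arpa/inet.h>

/* Compare the first `plen` bits of two IPv6 addresses.
 * Userspace sketch only; not the kernel's ipv6_prefix_equal(). */
static bool prefix_bits_equal(const struct in6_addr *a,
                              const struct in6_addr *b,
                              unsigned int plen)
{
    assert(plen <= 128);
    unsigned int full = plen / 8;  /* whole bytes covered by the prefix */
    unsigned int rem = plen % 8;   /* leftover bits in the next byte */

    if (memcmp(a->s6_addr, b->s6_addr, full) != 0)
        return false;
    if (rem) {
        unsigned char mask = (unsigned char)(0xff << (8 - rem));
        if ((a->s6_addr[full] ^ b->s6_addr[full]) & mask)
            return false;
    }
    return true;
}

/* Hypothetical helper: two addresses share a prefix route only if their
 * prefix lengths match AND the prefixes are bit-for-bit equal. */
static bool same_prefix_route(const struct in6_addr *a, unsigned int a_plen,
                              const struct in6_addr *b, unsigned int b_plen)
{
    return a_plen == b_plen && prefix_bits_equal(a, b, a_plen);
}
```

Comparing 2001:123::1 and 2001:123:456::2 over 32 bits alone reports a match, which is why the /32 prefix route was wrongly treated as still in use; comparing the lengths first (32 versus 64) rejects the pair.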
     

28 Jan, 2019

1 commit


23 Jan, 2019

3 commits

  • This message gets logged far too often for how interesting it is.

    Most distributions nowadays configure NetworkManager to use randomly
    generated MAC addresses for Wi-Fi network scans. The interfaces end up
    being periodically brought down for the address change. When they're
    subsequently brought back up, the message is logged, eventually flooding
    the log.

    Perhaps the message is not all that helpful: it seems to be more
    interesting to hear when addrconf actually starts, not when it does
    not. Let's lower its level.

    Signed-off-by: Lubomir Rintel
    Acked-By: Thomas Haller
    Signed-off-by: David S. Miller

    Lubomir Rintel
     
  • in6_dump_addrs() returns a positive 1 if there was nothing to dump.
    This return value cannot be passed as the return from inet6_dump_addr()
    as is, because it will confuse rtnetlink, resulting in NLMSG_DONE
    never getting set:

    $ ip addr list dev lo
    EOF on netlink
    Dump terminated

    v2: flip condition to avoid a new goto (DaveA)

    Fixes: 7c1e8a3817c5 ("netlink: fixup regression in RTM_GETADDR")
    Reported-by: Brendan Galloway
    Signed-off-by: Jakub Kicinski
    Reviewed-by: David Ahern
    Tested-by: David Ahern
    Signed-off-by: David S. Miller

    Jakub Kicinski
     
  • There have been many people complaining about the inconsistent
    behaviors of IPv4 and IPv6 devconf when creating new network
    namespaces. Currently, for IPv4, we inherit all current settings
    from init_net, but for IPv6 we reset all settings to default.

    This patch introduces a new /proc file
    /proc/sys/net/core/devconf_inherit_init_net to control
    whether to inherit the current sysctl settings from init_net.
    This file itself is only available in init_net.

    As demonstrated below:

    Initial setup in init_net:
    # cat /proc/sys/net/ipv4/conf/all/rp_filter
    2
    # cat /proc/sys/net/ipv6/conf/all/accept_dad
    1

    Default value 0 (current behavior):
    # ip netns del test
    # ip netns add test
    # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter
    2
    # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad
    0

    Set to 1 (inherit from init_net):
    # echo 1 > /proc/sys/net/core/devconf_inherit_init_net
    # ip netns del test
    # ip netns add test
    # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter
    2
    # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad
    1

    Set to 2 (reset to default):
    # echo 2 > /proc/sys/net/core/devconf_inherit_init_net
    # ip netns del test
    # ip netns add test
    # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter
    0
    # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad
    0

    Set to a value out of range (invalid):
    # echo 3 > /proc/sys/net/core/devconf_inherit_init_net
    -bash: echo: write error: Invalid argument
    # echo -1 > /proc/sys/net/core/devconf_inherit_init_net
    -bash: echo: write error: Invalid argument

    Reported-by: Zhu Yanjun
    Reported-by: Tonghao Zhang
    Cc: Nicolas Dichtel
    Signed-off-by: Cong Wang
    Acked-by: Nicolas Dichtel
    Acked-by: Tonghao Zhang
    Signed-off-by: David S. Miller

    Cong Wang
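
The three modes demonstrated in the commit above boil down to a small decision table. The sketch below restates that table in userspace C under stated assumptions: the enum names and both functions are hypothetical and do not appear in the kernel; only the accepted range (0 to 2) and the per-mode behavior are taken from the commit message.

```c
#include <assert.h>
#include <errno.h>

/* Where a new namespace takes its devconf values from (hypothetical names). */
enum devconf_source {
    DEVCONF_FROM_INIT_NET,  /* copy the current values from init_net */
    DEVCONF_FROM_DEFAULTS   /* reset to the built-in defaults */
};

enum devconf_family { DEVCONF_IPV4, DEVCONF_IPV6 };

/* Range check matching the demonstrated behavior: only 0, 1 and 2 are valid. */
static int devconf_mode_validate(int mode)
{
    return (mode >= 0 && mode <= 2) ? 0 : -EINVAL;
}

/* Decision table for devconf_inherit_init_net as described in the commit. */
static enum devconf_source devconf_pick_source(int mode,
                                               enum devconf_family family)
{
    assert(devconf_mode_validate(mode) == 0);
    switch (mode) {
    case 1:   /* both families inherit from init_net */
        return DEVCONF_FROM_INIT_NET;
    case 2:   /* both families reset to defaults */
        return DEVCONF_FROM_DEFAULTS;
    default:  /* 0: historical behavior, which differs per family */
        return family == DEVCONF_IPV4 ? DEVCONF_FROM_INIT_NET
                                      : DEVCONF_FROM_DEFAULTS;
    }
}
```

Mode 0 is the only one where the address family matters, which is exactly the inconsistency the knob was introduced to make configurable.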
     

20 Jan, 2019

2 commits


05 Jan, 2019

1 commit

  • This commit fixes a regression in AF_INET/RTM_GETADDR and
    AF_INET6/RTM_GETADDR.

    Before this commit, the kernel would stop dumping addresses once the first
    skb was full and end the stream with NLMSG_DONE(-EMSGSIZE). The error
    shouldn't be sent back to netlink_dump, so that the callback is kept alive.
    Userspace is expected to call back with a new empty skb.

    Changes from V1:
    - The error is not handled in netlink_dump anymore but rather in
    inet_dump_ifaddr and inet6_dump_addr directly as suggested by
    David Ahern.

    Fixes: d7e38611b81e ("net/ipv4: Put target net when address dump fails due to bad attributes")
    Fixes: 242afaa6968c ("net/ipv6: Put target net when address dump fails due to bad attributes")

    Cc: David Ahern
    Cc: "David S . Miller"
    Cc: netdev@vger.kernel.org
    Signed-off-by: Arthur Gautier
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Arthur Gautier
     

31 Dec, 2018

1 commit


07 Dec, 2018

1 commit

  • In order to pass extack together with NETDEV_PRE_UP notifications, it's
    necessary to route the extack to __dev_open() from diverse (possibly
    indirect) callers. One prominent API through which the notification is
    invoked is dev_open().

    Therefore extend dev_open() with an extra extack argument and update
    all users. Most of the calls end up just encoding NULL, but bond and
    team drivers have the extack readily available.

    Signed-off-by: Petr Machata
    Acked-by: Jiri Pirko
    Reviewed-by: Ido Schimmel
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Petr Machata
     

24 Nov, 2018

1 commit

  • When we add a new IPv6 address, we should also join the corresponding
    solicited-node multicast address unless the interface has the IFF_NOARP
    flag set, as addrconf_join_solict() does. But if the IFF_NOARP flag is
    removed later, we do not redo DAD or join the mcast address, so we drop
    the corresponding neighbour discovery messages that come from other nodes.

    A typical example is creating an ipvlan in l3 mode, setting up an IPv6
    address, and then changing the mode to l2. We will then not be able to ping
    this address, as the interface has not joined the related solicited-node
    mcast address.

    Fix it by redoing DAD when the interface's IFF_NOARP flag changes. We then
    join the corresponding mcast group and check whether there is a duplicate
    address on the network.

    Reported-by: Jianlin Shi
    Reviewed-by: Stefano Brivio
    Signed-off-by: Hangbin Liu
    Signed-off-by: David S. Miller

    Hangbin Liu
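
For readers unfamiliar with the group the commit above refers to: the solicited-node multicast address is formed by appending the low 24 bits of the unicast address to the prefix ff02::1:ff00:0/104. The function below is a userspace sketch of that derivation; the name `solicited_node_mcast()` is made up for illustration and this is not the kernel helper.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>

/* Derive the solicited-node multicast address ff02::1:ffXX:XXXX, where
 * XX:XXXX are the low 24 bits of `addr`. Illustrative userspace sketch. */
static void solicited_node_mcast(const struct in6_addr *addr,
                                 struct in6_addr *out)
{
    unsigned char low[3];

    assert(addr != NULL && out != NULL);
    /* Save the low 24 bits first so that addr and out may alias. */
    memcpy(low, &addr->s6_addr[13], sizeof(low));

    memset(out, 0, sizeof(*out));
    out->s6_addr[0] = 0xff;   /* ff02::/16, link-local multicast scope */
    out->s6_addr[1] = 0x02;
    out->s6_addr[11] = 0x01;  /* ...:0001:... */
    out->s6_addr[12] = 0xff;  /* ...:ffXX:XXXX */
    memcpy(&out->s6_addr[13], low, sizeof(low));
}
```

An interface that has not joined this group will not receive neighbour solicitations sent to it, which matches the unreachable-address symptom the commit describes.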
     

26 Oct, 2018

1 commit

  • The cleanup path will put the target net when netnsid is set. So we must
    reset netnsid if the input is invalid.

    Fixes: d7e38611b81e ("net/ipv4: Put target net when address dump fails due to bad attributes")
    Fixes: 242afaa6968c ("net/ipv6: Put target net when address dump fails due to bad attributes")
    Cc: David Ahern
    Signed-off-by: Bjørn Mork
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Bjørn Mork
     

25 Oct, 2018

1 commit


23 Oct, 2018

2 commits


22 Oct, 2018

1 commit