08 May, 2019

2 commits


06 May, 2019

2 commits


03 May, 2019

1 commit


02 May, 2019

1 commit

  • It is a followup after the fix in
    commit 9c69a1320515 ("route: Avoid crash from dereferencing NULL rt->from")

    rt6_do_redirect():
    1. NULL checking is needed on rt->from because a parallel
    fib6_info delete could happen that sets rt->from to NULL.
    (e.g. rt6_remove_exception() and fib6_drop_pcpu_from()).

    2. fib6_info_hold() is not enough. Same reason as (1).
    Meaning, holding dst->__refcnt cannot ensure
    rt->from is not NULL or rt->from->fib6_ref is not 0.

    Instead of using fib6_info_hold_safe() which ip6_rt_cache_alloc()
    is already doing, this patch chooses to extend the rcu section
    to keep "from" dereference-able after checking for NULL.

    inet6_rtm_getroute():
    1. NULL checking is also needed on rt->from for a similar reason.
    Note that inet6_rtm_getroute() is using RTNL_FLAG_DOIT_UNLOCKED.

    Fixes: a68886a69180 ("net/ipv6: Make from in rt6_info rcu protected")
    Signed-off-by: Martin KaFai Lau
    Acked-by: Wei Wang
    Reviewed-by: David Ahern
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     

01 May, 2019

1 commit

  • We had many syzbot reports that seem to be caused by use-after-free
    of struct fib6_info.

    ip6_dst_destroy(), fib6_drop_pcpu_from() and rt6_remove_exception()
    are writers vs rt->from, and use inconsistent synchronization among
    themselves.

    Switching to xchg() will solve the issues with no possible
    lockdep issues.
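
The exchange-based approach can be sketched in userspace with C11 atomics. This is only an illustration of the idea, not the kernel patch: `demo_info`, `demo_route` and both functions are hypothetical stand-ins, and `atomic_exchange()` plays the role of the kernel's xchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for fib6_info and rt6_info; not the kernel types. */
struct demo_info {
	atomic_int refcnt;
};

struct demo_route {
	_Atomic(struct demo_info *) from;
};

/* Drop one reference and return the remaining count. */
int demo_info_release(struct demo_info *info)
{
	return atomic_fetch_sub(&info->refcnt, 1) - 1;
}

/*
 * Detach "from" with an atomic exchange. Whichever caller swaps the
 * pointer out receives the non-NULL value and releases it; any later
 * or concurrent caller receives NULL and does nothing, so the
 * reference is dropped exactly once without taking a lock.
 * Returns 1 if this caller performed the release, 0 otherwise.
 */
int demo_route_drop_from(struct demo_route *rt)
{
	struct demo_info *from =
		atomic_exchange(&rt->from, (struct demo_info *)NULL);

	if (!from)
		return 0;
	demo_info_release(from);
	return 1;
}
```

Because every writer goes through the same exchange, no writer can observe a pointer that another writer has already claimed.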

    BUG: KASAN: user-memory-access in atomic_dec_and_test include/asm-generic/atomic-instrumented.h:747 [inline]
    BUG: KASAN: user-memory-access in fib6_info_release include/net/ip6_fib.h:294 [inline]
    BUG: KASAN: user-memory-access in fib6_info_release include/net/ip6_fib.h:292 [inline]
    BUG: KASAN: user-memory-access in fib6_drop_pcpu_from net/ipv6/ip6_fib.c:927 [inline]
    BUG: KASAN: user-memory-access in fib6_purge_rt+0x4f6/0x670 net/ipv6/ip6_fib.c:960
    Write of size 4 at addr 0000000000ffffb4 by task syz-executor.1/7649

    CPU: 0 PID: 7649 Comm: syz-executor.1 Not tainted 5.1.0-rc6+ #183
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x172/0x1f0 lib/dump_stack.c:113
    kasan_report.cold+0x5/0x40 mm/kasan/report.c:321
    check_memory_region_inline mm/kasan/generic.c:185 [inline]
    check_memory_region+0x123/0x190 mm/kasan/generic.c:191
    kasan_check_write+0x14/0x20 mm/kasan/common.c:108
    atomic_dec_and_test include/asm-generic/atomic-instrumented.h:747 [inline]
    fib6_info_release include/net/ip6_fib.h:294 [inline]
    fib6_info_release include/net/ip6_fib.h:292 [inline]
    fib6_drop_pcpu_from net/ipv6/ip6_fib.c:927 [inline]
    fib6_purge_rt+0x4f6/0x670 net/ipv6/ip6_fib.c:960
    fib6_del_route net/ipv6/ip6_fib.c:1813 [inline]
    fib6_del+0xac2/0x10a0 net/ipv6/ip6_fib.c:1844
    fib6_clean_node+0x3a8/0x590 net/ipv6/ip6_fib.c:2006
    fib6_walk_continue+0x495/0x900 net/ipv6/ip6_fib.c:1928
    fib6_walk+0x9d/0x100 net/ipv6/ip6_fib.c:1976
    fib6_clean_tree+0xe0/0x120 net/ipv6/ip6_fib.c:2055
    __fib6_clean_all+0x118/0x2a0 net/ipv6/ip6_fib.c:2071
    fib6_clean_all+0x2b/0x40 net/ipv6/ip6_fib.c:2082
    rt6_sync_down_dev+0x134/0x150 net/ipv6/route.c:4057
    rt6_disable_ip+0x27/0x5f0 net/ipv6/route.c:4062
    addrconf_ifdown+0xa2/0x1220 net/ipv6/addrconf.c:3705
    addrconf_notify+0x19a/0x2260 net/ipv6/addrconf.c:3630
    notifier_call_chain+0xc7/0x240 kernel/notifier.c:93
    __raw_notifier_call_chain kernel/notifier.c:394 [inline]
    raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:401
    call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1753
    call_netdevice_notifiers_extack net/core/dev.c:1765 [inline]
    call_netdevice_notifiers net/core/dev.c:1779 [inline]
    dev_close_many+0x33f/0x6f0 net/core/dev.c:1522
    rollback_registered_many+0x43b/0xfd0 net/core/dev.c:8177
    rollback_registered+0x109/0x1d0 net/core/dev.c:8242
    unregister_netdevice_queue net/core/dev.c:9289 [inline]
    unregister_netdevice_queue+0x1ee/0x2c0 net/core/dev.c:9282
    unregister_netdevice include/linux/netdevice.h:2658 [inline]
    __tun_detach+0xd5b/0x1000 drivers/net/tun.c:727
    tun_detach drivers/net/tun.c:744 [inline]
    tun_chr_close+0xe0/0x180 drivers/net/tun.c:3443
    __fput+0x2e5/0x8d0 fs/file_table.c:278
    ____fput+0x16/0x20 fs/file_table.c:309
    task_work_run+0x14a/0x1c0 kernel/task_work.c:113
    exit_task_work include/linux/task_work.h:22 [inline]
    do_exit+0x90a/0x2fa0 kernel/exit.c:876
    do_group_exit+0x135/0x370 kernel/exit.c:980
    __do_sys_exit_group kernel/exit.c:991 [inline]
    __se_sys_exit_group kernel/exit.c:989 [inline]
    __x64_sys_exit_group+0x44/0x50 kernel/exit.c:989
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x458da9
    Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007ffeafc2a6a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
    RAX: ffffffffffffffda RBX: 000000000000001c RCX: 0000000000458da9
    RDX: 0000000000412a80 RSI: 0000000000a54ef0 RDI: 0000000000000043
    RBP: 00000000004be552 R08: 000000000000000c R09: 000000000004c0d1
    R10: 0000000002341940 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 00007ffeafc2a7f0 R14: 000000000004c065 R15: 00007ffeafc2a800

    Fixes: a68886a69180 ("net/ipv6: Make from in rt6_info rcu protected")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: David Ahern
    Reviewed-by: David Ahern
    Acked-by: Martin KaFai Lau
    Acked-by: Wei Wang
    Signed-off-by: David S. Miller

    Eric Dumazet
     

30 Apr, 2019

5 commits

  • Steffen Klassert says:

    ====================
    pull request (net-next): ipsec-next 2019-04-30

    1) A lot of work to remove indirections from the xfrm code.
    From Florian Westphal.

    2) Support ESP offload in combination with gso partial.
    From Boris Pismenny.

    3) Remove some duplicated code from vti4.
    From Jeremy Sowden.

    Please note that there is a merge conflict

    between commit:

    8742dc86d0c7 ("xfrm4: Fix uninitialized memory read in _decode_session4")

    from the ipsec tree and commit:

    c53ac41e3720 ("xfrm: remove decode_session indirection from afinfo_policy")

    from the ipsec-next tree. The merge conflict will appear
    when those trees get merged during the merge window.
    The conflict can be solved as it is done in linux-next:

    https://lkml.org/lkml/2019/4/25/1207

    Please pull or let me know if there are problems.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Steffen Klassert says:

    ====================
    pull request (net): ipsec 2019-04-30

    1) Fix out-of-bounds array accesses in __xfrm_policy_unlink.
    From YueHaibing.

    2) Reset the secpath on failure in the ESP GRO handlers
    to avoid dereferencing an invalid pointer on error.
    From Myungho Jung.

    3) Add and revert a patch that tried to add rcu annotations
    to netns_xfrm. From Su Yanjun.

    4) Wait for rcu callbacks before freeing xfrm6_tunnel_spi_kmem.
    From Su Yanjun.

    5) Fix forgotten vti4 ipip tunnel deregistration.
    From Jeremy Sowden.

    6) Remove some duplicated log messages in vti4.
    From Jeremy Sowden.

    7) Don't use IPSEC_PROTO_ANY when flushing states because
    this will flush only IPsec protocol-specific states.
    IPPROTO_ROUTING states may remain in the lists when
    doing net exit. Fix this by replacing IPSEC_PROTO_ANY
    with zero. From Cong Wang.

    8) Add length check for UDP encapsulation to fix "Oversized IP packet"
    warnings on receive side. From Sabrina Dubroca.

    9) Fix xfrm interface lookup when the interface is associated to
    a vrf layer 3 master device. From Martin Willi.

    10) Reload header pointers after pskb_may_pull() in _decode_session4(),
    otherwise we may read from uninitialized memory.

    11) Update the documentation about xfrm[46]_gc_thresh, it
    is not used anymore after the flowcache removal.
    From Nicolas Dichtel.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • syzbot was able to catch a use-after-free read in pid_nr_ns() [1]

    ip6fl_seq_show() seems to use RCU protection, dereferencing fl->owner.pid
    but fl_free() releases fl->owner.pid before rcu grace period is started.

    [1]

    BUG: KASAN: use-after-free in pid_nr_ns+0x128/0x140 kernel/pid.c:407
    Read of size 4 at addr ffff888094012a04 by task syz-executor.0/18087

    CPU: 0 PID: 18087 Comm: syz-executor.0 Not tainted 5.1.0-rc6+ #89
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x172/0x1f0 lib/dump_stack.c:113
    print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
    kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
    __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131
    pid_nr_ns+0x128/0x140 kernel/pid.c:407
    ip6fl_seq_show+0x2f8/0x4f0 net/ipv6/ip6_flowlabel.c:794
    seq_read+0xad3/0x1130 fs/seq_file.c:268
    proc_reg_read+0x1fe/0x2c0 fs/proc/inode.c:227
    do_loop_readv_writev fs/read_write.c:701 [inline]
    do_loop_readv_writev fs/read_write.c:688 [inline]
    do_iter_read+0x4a9/0x660 fs/read_write.c:922
    vfs_readv+0xf0/0x160 fs/read_write.c:984
    kernel_readv fs/splice.c:358 [inline]
    default_file_splice_read+0x475/0x890 fs/splice.c:413
    do_splice_to+0x12a/0x190 fs/splice.c:876
    splice_direct_to_actor+0x2d2/0x970 fs/splice.c:953
    do_splice_direct+0x1da/0x2a0 fs/splice.c:1062
    do_sendfile+0x597/0xd00 fs/read_write.c:1443
    __do_sys_sendfile64 fs/read_write.c:1498 [inline]
    __se_sys_sendfile64 fs/read_write.c:1490 [inline]
    __x64_sys_sendfile64+0x15a/0x220 fs/read_write.c:1490
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x458da9
    Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f300d24bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
    RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000458da9
    RDX: 00000000200000c0 RSI: 0000000000000008 RDI: 0000000000000007
    RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
    R10: 000000000000005a R11: 0000000000000246 R12: 00007f300d24c6d4
    R13: 00000000004c5fa3 R14: 00000000004da748 R15: 00000000ffffffff

    Allocated by task 17543:
    save_stack+0x45/0xd0 mm/kasan/common.c:75
    set_track mm/kasan/common.c:87 [inline]
    __kasan_kmalloc mm/kasan/common.c:497 [inline]
    __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
    kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:505
    slab_post_alloc_hook mm/slab.h:437 [inline]
    slab_alloc mm/slab.c:3393 [inline]
    kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3555
    alloc_pid+0x55/0x8f0 kernel/pid.c:168
    copy_process.part.0+0x3b08/0x7980 kernel/fork.c:1932
    copy_process kernel/fork.c:1709 [inline]
    _do_fork+0x257/0xfd0 kernel/fork.c:2226
    __do_sys_clone kernel/fork.c:2333 [inline]
    __se_sys_clone kernel/fork.c:2327 [inline]
    __x64_sys_clone+0xbf/0x150 kernel/fork.c:2327
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 7789:
    save_stack+0x45/0xd0 mm/kasan/common.c:75
    set_track mm/kasan/common.c:87 [inline]
    __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
    kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
    __cache_free mm/slab.c:3499 [inline]
    kmem_cache_free+0x86/0x260 mm/slab.c:3765
    put_pid.part.0+0x111/0x150 kernel/pid.c:111
    put_pid+0x20/0x30 kernel/pid.c:105
    fl_free+0xbe/0xe0 net/ipv6/ip6_flowlabel.c:102
    ip6_fl_gc+0x295/0x3e0 net/ipv6/ip6_flowlabel.c:152
    call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
    expire_timers kernel/time/timer.c:1362 [inline]
    __run_timers kernel/time/timer.c:1681 [inline]
    __run_timers kernel/time/timer.c:1649 [inline]
    run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
    __do_softirq+0x266/0x95a kernel/softirq.c:293

    The buggy address belongs to the object at ffff888094012a00
    which belongs to the cache pid_2 of size 88
    The buggy address is located 4 bytes inside of
    88-byte region [ffff888094012a00, ffff888094012a58)
    The buggy address belongs to the page:
    page:ffffea0002500480 count:1 mapcount:0 mapping:ffff88809a483080 index:0xffff888094012980
    flags: 0x1fffc0000000200(slab)
    raw: 01fffc0000000200 ffffea00018a3508 ffffea0002524a88 ffff88809a483080
    raw: ffff888094012980 ffff888094012000 000000010000001b 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff888094012900: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
    ffff888094012980: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
    >ffff888094012a00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
    ^
    ffff888094012a80: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
    ffff888094012b00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc

    Fixes: 4f82f45730c6 ("net ip6 flowlabel: Make owner a union of struct pid * and kuid_t")
    Signed-off-by: Eric Dumazet
    Cc: Eric W. Biederman
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • When there is no route to an IPv6 dest addr, skb_dst(skb) points
    to the loopback dev if IP6CB(skb)->iif is enslaved to a vrf. This
    causes Ip6InNoRoutes to be incremented on the loopback dev. It also
    causes the lookup in icmpv6_send() to fail, so the dest unreachable
    is not sent and Ip6OutNoRoutes gets incremented on the loopback dev.

    To reproduce:
    * Gateway configuration:
    ip link add dev vrf_258 type vrf table 258
    ip link set dev enp0s9 master vrf_258
    ip addr add 66::1/64 dev enp0s9
    ip -6 route add unreachable default metric 8192 table 258
    sysctl -w net.ipv6.conf.all.forwarding=1
    sysctl -w net.ipv6.conf.enp0s9.forwarding=1
    * Sender configuration:
    ip addr add 66::2/64 dev enp0s9
    ip -6 route add default via 66::1
    and ping 67::1 for example from the sender.

    Fix this by counting on the original netdev and resetting the skb dst to
    force a fresh lookup.

    v2: Fix typo of destination address in the repro steps.
    v3: Simplify the loopback check (per David Ahern) and use reverse
    Christmas tree format (per David Miller).

    Signed-off-by: Stephen Suryaputra
    Reviewed-by: David Ahern
    Tested-by: David Ahern
    Signed-off-by: David S. Miller

    Stephen Suryaputra
     
  • A request for a flowlabel in process or user exclusive mode must
    fail if the caller pid or uid does not match. Invert the test.

    Previously, the test was unsafe wrt PID recycling, but indeed tested
    for inequality: fl1->owner != fl->owner
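
The intended direction of the test can be sketched as follows. This is a simplified userspace model: the `demo_` enum, struct and function are hypothetical, owners are plain integers, and share modes other than the two named above are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical share modes; only the two exclusive-by-owner modes matter here. */
enum demo_share {
	DEMO_SHARE_PROCESS,	/* label is exclusive to one process */
	DEMO_SHARE_USER,	/* label is exclusive to one user */
	DEMO_SHARE_OTHER,	/* any other mode, not modelled in this sketch */
};

struct demo_label {
	enum demo_share share;
	int owner_pid;
	unsigned int owner_uid;
};

/*
 * Returns true when a request for an existing label must fail:
 * in process mode the pids must match, in user mode the uids must
 * match. A mismatch (inequality) is the failure condition.
 */
bool demo_request_must_fail(const struct demo_label *existing,
			    int caller_pid, unsigned int caller_uid)
{
	if (existing->share == DEMO_SHARE_PROCESS)
		return existing->owner_pid != caller_pid;
	if (existing->share == DEMO_SHARE_USER)
		return existing->owner_uid != caller_uid;
	return false;
}
```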

    Fixes: 4f82f45730c68 ("net ip6 flowlabel: Make owner a union of struct pid* and kuid_t")
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     

28 Apr, 2019

3 commits

  • Add options to strictly validate messages and dump messages.
    Sometimes validating dump messages non-strictly may be required,
    so add an option for that as well.

    Since none of this can really be applied to existing commands,
    set the options everywhere using the following spatch:

    @@
    identifier ops;
    expression X;
    @@
    struct genl_ops ops[] = {
    ...,
    {
    .cmd = X,
    + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
    ...
    },
    ...
    };

    For new commands one should just not copy the .validate 'opt-out'
    flags and thus get strict validation.
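
How such opt-out flags behave can be sketched in plain C. The `DEMO_` names and the `demo_op` struct below are illustrative assumptions that only mirror the idea of the flags in the spatch; they are not the genetlink definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opt-out bits, modelled on the flag names in the spatch. */
enum {
	DEMO_DONT_VALIDATE_STRICT = 1 << 0,
	DEMO_DONT_VALIDATE_DUMP   = 1 << 1,
};

struct demo_op {
	int cmd;
	unsigned int validate;	/* zero means "validate everything strictly" */
};

/* Strict validation applies unless the op explicitly opted out. */
bool demo_op_validates_strictly(const struct demo_op *op)
{
	return !(op->validate & DEMO_DONT_VALIDATE_STRICT);
}

/* Dump requests are validated unless the op explicitly opted out. */
bool demo_op_validates_dump(const struct demo_op *op)
{
	return !(op->validate & DEMO_DONT_VALIDATE_DUMP);
}
```

Since the flags are opt-outs, a zero-initialized `validate` field gives the strict behaviour, which is why new commands get it simply by not copying the flags.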

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • We currently have two levels of strict validation:

    1) liberal (default)
    - undefined (type >= max) & NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted
    - garbage at end of message accepted
    2) strict (opt-in)
    - NLA_UNSPEC attributes accepted
    - attribute length >= expected accepted

    Split out parsing strictness into four different options:
    * TRAILING - check that there's no trailing data after parsing
    attributes (in message or nested)
    * MAXTYPE - reject attrs > max known type
    * UNSPEC - reject attributes with NLA_UNSPEC policy entries
    * STRICT_ATTRS - strictly validate attribute size

    The default for future things should be *everything*.
    The current *_strict() is a combination of TRAILING and MAXTYPE,
    and is renamed to _deprecated_strict().
    The current regular parsing has none of this, and is renamed to
    *_parse_deprecated().
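
The split into independent options can be pictured as a bitmask. The sketch below is an assumption-laden userspace model: the `DEMO_VALIDATE_` names and the two helper functions are hypothetical and cover only the MAXTYPE and STRICT_ATTRS checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for the four options described above. */
enum {
	DEMO_VALIDATE_LIBERAL      = 0,
	DEMO_VALIDATE_TRAILING     = 1 << 0,
	DEMO_VALIDATE_MAXTYPE      = 1 << 1,
	DEMO_VALIDATE_UNSPEC       = 1 << 2,
	DEMO_VALIDATE_STRICT_ATTRS = 1 << 3,
};

/* The old *_strict() behaviour: TRAILING and MAXTYPE only. */
#define DEMO_VALIDATE_DEPRECATED_STRICT \
	(DEMO_VALIDATE_TRAILING | DEMO_VALIDATE_MAXTYPE)

/* The default for future users: everything. */
#define DEMO_VALIDATE_STRICT \
	(DEMO_VALIDATE_TRAILING | DEMO_VALIDATE_MAXTYPE | \
	 DEMO_VALIDATE_UNSPEC | DEMO_VALIDATE_STRICT_ATTRS)

/* MAXTYPE: reject attribute types above the largest known type. */
bool demo_accepts_type(unsigned int validate, int type, int maxtype)
{
	return !(type > maxtype && (validate & DEMO_VALIDATE_MAXTYPE));
}

/* STRICT_ATTRS: reject attributes longer than the expected size. */
bool demo_accepts_len(unsigned int validate, int len, int expected)
{
	if (len < expected)
		return false;	/* too short is never acceptable */
	return !(len > expected && (validate & DEMO_VALIDATE_STRICT_ATTRS));
}
```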

    Additionally it allows us to selectively set one of the new flags
    even on old policies. Notably, the UNSPEC flag could be useful in
    this case, since it can be arranged (by filling in the policy) to
    not be an incompatible userspace ABI change, but would then, going
    forward, prevent forgetting attribute entries. The same can apply
    to the POLICY flag.

    We end up with the following renames:
    * nla_parse -> nla_parse_deprecated
    * nla_parse_strict -> nla_parse_deprecated_strict
    * nlmsg_parse -> nlmsg_parse_deprecated
    * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
    * nla_parse_nested -> nla_parse_nested_deprecated
    * nla_validate_nested -> nla_validate_nested_deprecated

    Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

    For this patch, don't actually add the strict, non-renamed versions
    yet so that it breaks compile if I get it wrong.

    Also, while at it, make nla_validate and nla_parse go down to a
    common __nla_validate_parse() function to avoid code duplication.

    Ultimately, this allows us to have very strict validation for every
    new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
    next patch, while existing things will continue to work as is.

    In effect then, this adds fully strict validation for any new command.

    Signed-off-by: Johannes Berg
    Signed-off-by: David S. Miller

    Johannes Berg
     
  • Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
    netlink based interfaces (including recently added ones) are still not
    setting it in kernel generated messages. Without the flag, message parsers
    not aware of attribute semantics (e.g. wireshark dissector or libmnl's
    mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
    the structure of their contents.

    Unfortunately we cannot just add the flag everywhere as there may be
    userspace applications which check nlattr::nla_type directly rather than
    through a helper masking out the flags. Therefore the patch renames
    nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
    as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
    are rewritten to use nla_nest_start().
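
Why an unmasked check of nla_type breaks can be shown with a small sketch. The `DEMO_` constants copy the bit positions of NLA_F_NESTED, NLA_F_NET_BYTEORDER and NLA_TYPE_MASK from the uapi netlink header; the three functions are hypothetical helpers that model only the type field, not real attribute construction.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the uapi flag bits, renamed to keep the sketch self-contained. */
#define DEMO_NLA_F_NESTED		(1 << 15)
#define DEMO_NLA_F_NET_BYTEORDER	(1 << 14)
#define DEMO_NLA_TYPE_MASK \
	(~(DEMO_NLA_F_NESTED | DEMO_NLA_F_NET_BYTEORDER))

/* Old behaviour: the nest is started with the bare attribute type. */
uint16_t demo_nest_type_noflag(uint16_t attrtype)
{
	return attrtype;
}

/* New wrapper behaviour: the nested flag is ORed into the type field. */
uint16_t demo_nest_type(uint16_t attrtype)
{
	return attrtype | DEMO_NLA_F_NESTED;
}

/* What a well-behaved parser does: mask the flags out before comparing. */
uint16_t demo_attr_type(uint16_t nla_type)
{
	return nla_type & DEMO_NLA_TYPE_MASK;
}
```

A parser that compares the raw field against the attribute type stops matching once the flag is set, while one that masks first sees the same type in both cases.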

    Except for changes in include/net/netlink.h, the patch was generated using
    this semantic patch:

    @@ expression E1, E2; @@
    -nla_nest_start(E1, E2)
    +nla_nest_start_noflag(E1, E2)

    @@ expression E1, E2; @@
    -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
    +nla_nest_start(E1, E2)

    Signed-off-by: Michal Kubecek
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Michal Kubecek
     

26 Apr, 2019

1 commit


24 Apr, 2019

6 commits

  • arg.result is sometimes used as fib6_result and sometimes used to
    hold the rt6_info. Add rt6_info to fib6_result and make the use
    of arg.result consistent through ipv6 rules.

    The rt6 entry is filled in for lookups returning a dst_entry, but not
    for direct fib_lookups that just want a fib6_info.

    Fixes: effda4dd97e8 ("ipv6: Pass fib6_result to fib lookups")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • fib rule actions should return -EAGAIN for the rules to continue to the
    next one. A recent change overwrote err with the lookup always returning
    0 (future change will make it more like IPv4) which means the rules
    stopped at the first (e.g., local table lookup only). Catch and reset err
    to -EAGAIN.
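
The rule-walking convention can be sketched as a loop over actions. Everything here is a hypothetical userspace model: the `demo_` names are invented, the two toy tables stand in for real lookups, and returning -ESRCH when no rule matches is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* A rule action: returns 0 on a match, -EAGAIN to try the next rule. */
typedef int (*demo_rule_action)(int dest, int *result);

/* Walk the rules in order; only -EAGAIN lets the walk continue. */
int demo_rules_lookup(const demo_rule_action *rules, size_t n,
		      int dest, int *result)
{
	for (size_t i = 0; i < n; i++) {
		int err = rules[i](dest, result);

		if (err != -EAGAIN)
			return err;
	}
	return -ESRCH;	/* no rule produced a result */
}

/* Toy "local table": knows only destination 1. */
int demo_local_table(int dest, int *result)
{
	if (dest != 1)
		return -EAGAIN;
	*result = 100;
	return 0;
}

/* Toy "main table": knows only destination 2. */
int demo_main_table(int dest, int *result)
{
	if (dest != 2)
		return -EAGAIN;
	*result = 200;
	return 0;
}
```

If the first action returned 0 on a miss instead of -EAGAIN, the walk would stop at the local table and destination 2 would never be resolved, which is the failure described above.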

    Fixes: effda4dd97e87 ("ipv6: Pass fib6_result to fib lookups")
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • nhc_flags holds the RTNH_F flags for a given nexthop (fib{6}_nh).
    All of the RTNH_F_ flags fit in an unsigned char, and since the API to
    userspace (rtnh_flags and lower byte of rtm_flags) is 1 byte it can not
    grow. Make nhc_flags in fib_nh_common an unsigned char and shrink the
    size of the struct by 8, from 56 to 48 bytes.
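
The effect of narrowing a field on struct size comes from alignment padding, which a small sketch can show. The two structs below are hypothetical layouts chosen for illustration, not the real fib_nh_common, and the exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout with a 4-byte flags field after three 1-byte fields. */
struct demo_nh_wide {
	void *dev;
	int oif;
	unsigned char scope;
	unsigned char family;
	unsigned char gw_family;
	unsigned int flags;	/* needs 4-byte alignment, forcing padding */
	void *lwtstate;
};

/* Same fields, but flags is one byte and fills the existing gap. */
struct demo_nh_small {
	void *dev;
	int oif;
	unsigned char scope;
	unsigned char family;
	unsigned char gw_family;
	unsigned char flags;	/* packs next to the other byte-sized fields */
	void *lwtstate;
};

/* Bytes saved by narrowing the flags field on the current ABI. */
size_t demo_bytes_saved(void)
{
	return sizeof(struct demo_nh_wide) - sizeof(struct demo_nh_small);
}
```

On common 64-bit ABIs the narrow field slots into padding that already existed, so the pointer that follows moves up by one aligned slot.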

    Update the flags arguments for up netdevice events and fib_nexthop_info
    which determines the RTNH_F flags to return on a dump/event. The RTNH_F
    flags are passed in the lower byte of rtm_flags which is an unsigned int
    so use a temp variable for the flags to fib_nexthop_info and combine
    with rtm_flags in the caller.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • We suspect some issues involving fib6_ref 0 -> 1 transitions might
    cause strange syzbot reports.

    Let's convert fib6_ref to refcount_t to catch them earlier.
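
Why a 0 -> 1 transition is suspicious can be sketched with a toy reference counter. This is not refcount_t: the `demo_ref` type and its functions are hypothetical, and where this sketch simply refuses to resurrect a dead object, the kernel type is understood to report the event instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_ref {
	atomic_uint count;
};

/* A freshly allocated object starts with one reference. */
void demo_ref_init(struct demo_ref *r)
{
	atomic_init(&r->count, 1);
}

/*
 * Take a reference unless the count has already reached zero.
 * A zero count means the object is being freed, so a 0 -> 1
 * transition would hand out a pointer to dying memory.
 */
bool demo_ref_get(struct demo_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));
	return true;
}

/* Drop a reference; returns true when the last one was dropped. */
bool demo_ref_put(struct demo_ref *r)
{
	return atomic_fetch_sub(&r->count, 1) == 1;
}
```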

    Signed-off-by: Eric Dumazet
    Cc: Wei Wang
    Acked-by: Wei Wang
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Instead of using atomic_inc(), prefer fib6_info_hold()
    so that upcoming refcount_t conversion is simpler.

    Only fib6_info_alloc() is using atomic_set() since we
    just allocated a new object.

    Signed-off-by: Eric Dumazet
    Cc: Wei Wang
    Acked-by: Wei Wang
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • We do not need to clear f6i->rt6i_exception_bucket right before
    freeing f6i.

    Note that f6i->rt6i_exception_bucket is properly protected by
    f6i->exception_bucket_flushed being set to one in rt6_flush_exceptions()
    under the protection of rt6_exception_lock.

    Signed-off-by: Eric Dumazet
    Cc: Wei Wang
    Acked-by: Wei Wang
    Reviewed-by: David Ahern
    Signed-off-by: David S. Miller

    Eric Dumazet
     

23 Apr, 2019

5 commits


20 Apr, 2019

1 commit

  • The SIOCGSTAMP/SIOCGSTAMPNS ioctl commands are implemented by many
    socket protocol handlers, and all of those end up calling the same
    sock_get_timestamp()/sock_get_timestampns() helper functions, which
    results in a lot of duplicate code.

    With the introduction of 64-bit time_t on 32-bit architectures, this
    gets worse, as we then need four different ioctl commands in each
    socket protocol implementation.

    To simplify that, let's add a new .gettstamp() operation in
    struct proto_ops, and move ioctl implementation into the common
    sock_ioctl()/compat_sock_ioctl_trans() functions that these all go
    through.

    We can reuse the sock_get_timestamp() implementation, but generalize
    it so it can deal with both native and compat mode, as well as
    timeval and timespec structures.

    Acked-by: Stefan Schmidt
    Acked-by: Neil Horman
    Acked-by: Marc Kleine-Budde
    Link: https://lore.kernel.org/lkml/CAK8P3a038aDQQotzua_QtKGhq8O9n+rdiz2=WDCp82ys8eUT+A@mail.gmail.com/
    Signed-off-by: Arnd Bergmann
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

19 Apr, 2019

2 commits

  • To make ICMPv6 closer to ICMPv4, add a ratemask parameter. Since the
    ICMPv6 message types use larger numeric values, a simple bitmask doesn't
    fit, so use a large bitmap. The input and output are in the form of a
    list of ranges. Set the default to rate limit all error messages but
    Packet Too Big. For Packet Too Big, use the ratemask instead of
    hard-coding it.
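
A bitmap over all 256 ICMPv6 type values can be sketched as below. The `demo_` names are hypothetical, and the default ranges assume that error messages are types 0 to 127 and that Packet Too Big is type 2, as in RFC 4443.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_ICMPV6_TYPES	256
#define DEMO_BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_WORDS \
	((DEMO_ICMPV6_TYPES + DEMO_BITS_PER_WORD - 1) / DEMO_BITS_PER_WORD)

/* One bit per ICMPv6 type; a set bit means "rate limit this type". */
struct demo_ratemask {
	unsigned long bits[DEMO_WORDS];
};

/* Mark every type in [first, last] as rate limited. */
void demo_ratemask_set_range(struct demo_ratemask *m,
			     unsigned int first, unsigned int last)
{
	for (unsigned int t = first; t <= last && t < DEMO_ICMPV6_TYPES; t++)
		m->bits[t / DEMO_BITS_PER_WORD] |=
			1UL << (t % DEMO_BITS_PER_WORD);
}

/* Is this type subject to rate limiting? */
bool demo_ratemask_test(const struct demo_ratemask *m, unsigned int type)
{
	if (type >= DEMO_ICMPV6_TYPES)
		return false;
	return (m->bits[type / DEMO_BITS_PER_WORD] >>
		(type % DEMO_BITS_PER_WORD)) & 1UL;
}

/* Default: all error messages except Packet Too Big, i.e. "0-1,3-127". */
void demo_ratemask_default(struct demo_ratemask *m)
{
	memset(m, 0, sizeof(*m));
	demo_ratemask_set_range(m, 0, 1);
	demo_ratemask_set_range(m, 3, 127);
}
```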

    There are functions where icmpv6_xrlim_allow() and icmpv6_global_allow()
    aren't called. This patch only adds them to icmpv6_echo_reply().

    Rate limiting error messages is mandated by RFC 4443 but RFC 4890 says
    that it is also acceptable to rate limit informational messages. Thus,
    I removed the current hard-coded behavior of icmpv6_mask_allow() that
    doesn't rate limit informational messages.

    v2: Add dummy function proc_do_large_bitmap() if CONFIG_PROC_SYSCTL
    isn't defined, expand the description in ip-sysctl.txt and remove
    unnecessary conditional before kfree().
    v3: Inline the bitmap instead of dynamically allocating it. A pointer
    to it is still needed because of the way proc_do_large_bitmap() works.

    Signed-off-by: Stephen Suryaputra
    Signed-off-by: David S. Miller

    Stephen Suryaputra
     
  • There is a spelling mistake in a NL_SET_ERR_MSG_MOD error message,
    fix it.

    Signed-off-by: Colin Ian King
    Reviewed-by: Mukesh Ojha
    Signed-off-by: David S. Miller

    Colin Ian King
     

18 Apr, 2019

10 commits

  • Disabling IPv6 on an interface removes existing entries but nothing prevents
    new entries from being manually added. To that end, add a new neigh_table
    operation, allow_add, that is called on RTM_NEWNEIGH to see if neighbor
    entries are allowed on a given device. If IPv6 is disabled on the device,
    allow_add returns false and passes a message back to the user via extack.

    $ echo 1 > /proc/sys/net/ipv6/conf/eth1/disable_ipv6
    $ ip -6 neigh add fe80::4c88:bff:fe21:2704 dev eth1 lladdr de:ad:be:ef:01:01
    Error: IPv6 is disabled on this device.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Add the fib6_flags and fib6_type to fib6_result. Update the lookup helpers
    to set them and update post fib lookup users to use the version from the
    result.

    This allows nexthop objects to have blackhole nexthop.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Change fib6_lookup and fib6_table_lookup to take a fib6_result and set
    f6i and nh rather than returning a fib6_info. For now both always
    return 0.

    A later patch set can make these more like the IPv4 counterparts and
    return EINVAL, EACCES, etc based on fib6_type.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Change fib6_table_lookup tracepoint to take the fib6_result and use
    the fib6_info and fib6_nh from it.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Pass fib6_result to rt6_select. Instead of returning the fib entry, it
    will set f6i and nh based on the lookup.

    find_rr_leaf is changed to remove the match option in favor of taking
    fib6_result and having __find_rr_leaf set f6i in the result.

    In the process, update fib6_info references in __find_rr_leaf to f6i names.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Pass fib6_result to rt6_device_match with f6i set. rt6_device_match
    updates f6i in the result if it finds a better match and sets nh.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Change ip6_mtu_from_fib6 and fib6_mtu to take a fib6_result over a
    fib6_info. Update both to use the fib6_nh from fib6_result.

    Since the signature of ip6_mtu_from_fib6 is already changing, add const
    to daddr and saddr.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Update rt6_insert_exception to take a fib6_result over a fib6_info.
    Change ort to f6i from the fib6_result and rename to better reflect
    what it references (a fib6_info).

    Since this function is already getting changed, update the comments
    to reference fib6_info variables rather than the older rt6_info.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Now that all callers are updated to have a fib6_result, pass it down
    to ip6_rt_get_dev_rcu, ip6_rt_copy_init, and ip6_rt_init_dst.

    In the process, change ort to f6i in ip6_rt_copy_init to make it
    clear it is a reference to a fib6_info.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
     
  • Update ip6_rt_pcpu_alloc, rt6_get_pcpu_route and rt6_make_pcpu_route
    to take a fib6_result over a fib6_info.

    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern