18 Mar, 2018

5 commits

  • commit ae0ac0ed6fcf5af3be0f63eb935f483f44a402d2 upstream.

    Instead of allocating each xt_counter individually, allocate 4k chunks
    and then use these for counter allocation requests.

    This should speed up rule evaluation by increasing data locality. It
    also speeds up ruleset loading because we reduce calls to the percpu
    allocator.

    As Eric points out we can't use PAGE_SIZE, page_allocator would fail on
    arches with 64k page size.

    Suggested-by: Eric Dumazet
    Signed-off-by: Florian Westphal
    Acked-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
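The chunked allocation described above can be sketched in userspace C. This is only an illustration: `struct counter`, `struct chunk_state` and `counter_alloc()` are hypothetical names, `calloc()` stands in for the percpu allocator, and freeing the chunks is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096u /* fixed 4k, deliberately not PAGE_SIZE */

/* Stand-in for a packet/byte counter pair (hypothetical layout). */
struct counter {
    uint64_t pcnt;
    uint64_t bcnt;
};

/* Hypothetical allocator state: the current chunk and the bytes used in it. */
struct chunk_state {
    unsigned char *mem;
    size_t off;
    unsigned int chunks_allocated;
};

/*
 * Hand out the next counter slot. A new chunk is requested only when the
 * current one has no room left, so consecutive counters sit next to each
 * other in memory. Returns NULL if the chunk allocation fails.
 */
static struct counter *counter_alloc(struct chunk_state *st)
{
    struct counter *c;

    assert(sizeof(*c) <= CHUNK_SIZE);
    if (st->mem == NULL) {
        st->mem = calloc(1, CHUNK_SIZE);
        if (st->mem == NULL)
            return NULL;
        st->off = 0;
        st->chunks_allocated++;
    }
    c = (struct counter *)(st->mem + st->off);
    st->off += sizeof(*c);
    /* No room for another slot: force a fresh chunk on the next call. */
    if (st->off > CHUNK_SIZE - sizeof(*c))
        st->mem = NULL;
    return c;
}
```

With a 16-byte counter, one chunk serves 256 requests before the allocator is called again, which is where the reduction in allocator calls comes from.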
     
  • commit f28e15bacedd444608e25421c72eb2cf4527c9ca upstream.

    Keeps some noise away from a followup patch.

    Signed-off-by: Florian Westphal
    Acked-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • commit 4d31eef5176df06f218201bc9c0ce40babb41660 upstream.

    On SMP we overload the packet counter (unsigned long) to contain
    percpu offset. Hide this from callers and pass xt_counters address
    instead.

    Preparation patch to allocate the percpu counters in page-sized batch
    chunks.

    Signed-off-by: Florian Westphal
    Acked-by: Eric Dumazet
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
     
  • commit b078556aecd791b0e5cb3a59f4c3a14273b52121 upstream.

    l4proto->manip_pkt() can cause reallocation of skb head so pointer
    to the ipv6 header must be reloaded.

    Reported-and-tested-by:
    Fixes: 58a317f1061c89 ("netfilter: ipv6: add IPv6 NAT support")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
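The pointer-reload rule in this fix can be shown with a toy buffer. This is a userspace sketch under stated assumptions: `struct pkt`, `pkt_manip()` and `pkt_hdr()` are hypothetical stand-ins, not the skb API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy packet buffer; a stand-in for the skb head, not the kernel structure. */
struct pkt {
    unsigned char *head;
    size_t len;
};

/*
 * Stand-in for a step that may reallocate the buffer. It always moves the
 * data to a new allocation so the effect is easy to observe.
 */
static int pkt_manip(struct pkt *p, size_t extra)
{
    unsigned char *n = malloc(p->len + extra);

    if (n == NULL)
        return -1;
    memcpy(n, p->head, p->len);
    memset(n + p->len, 0, extra);
    free(p->head);
    p->head = n;
    p->len += extra;
    return 0;
}

/* Derive the header pointer from the current head each time it is needed. */
static unsigned char *pkt_hdr(const struct pkt *p, size_t hdr_off)
{
    assert(hdr_off < p->len);
    return p->head + hdr_off;
}
```

A caller keeps the header offset across `pkt_manip()` and calls `pkt_hdr()` again afterwards, rather than reusing a pointer obtained before the call.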
     
  • commit 57ebd808a97d7c5b1e1afb937c2db22beba3c1f8 upstream.

    The rationale for removing the check is only correct for rulesets
    generated by ip(6)tables.

    In iptables, a jump can only occur to a user-defined chain, i.e.
    because we size the stack based on number of user-defined chains we
    cannot exceed stack size.

    However, the underlying binary format has no such restriction,
    and the validation step only ensures that the jump target is a
    valid rule start point.

    IOW, it's possible to build a rule blob that has no user-defined
    chains but does contain a jump.

    If this happens, no jump stack gets allocated and a crash occurs
    when the jump is taken.

    Fixes: 7814b6ec6d0d6 ("netfilter: xtables: don't save/restore jumpstack offset")
    Reported-by: syzbot+e783f671527912cd9403@syzkaller.appspotmail.com
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Florian Westphal
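A minimal sketch of the kind of bounds check this commit message argues for, assuming a hypothetical `struct jumpstack` and `jump_push()` helper (these are illustration names, not the x_tables code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical jump stack: sized from the number of user-defined chains. */
struct jumpstack {
    const void **slots; /* NULL when no stack was allocated */
    unsigned int size;  /* number of slots; 0 for a blob without user chains */
    unsigned int idx;   /* next free slot */
};

/*
 * Record a return position before following a jump. The check also covers
 * size == 0, the case where a crafted blob contains a jump although no
 * stack exists. The caller is expected to drop the packet on failure.
 */
static bool jump_push(struct jumpstack *js, const void *return_to)
{
    if (js->idx >= js->size)
        return false;
    assert(js->slots != NULL);
    js->slots[js->idx++] = return_to;
    return true;
}
```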
     

11 Mar, 2018

2 commits

  • [ Upstream commit 15f35d49c93f4fa9875235e7bf3e3783d2dd7a1b ]

    Since UDP-Lite is always using checksum, the following path is
    triggered when calculating pseudo header for it:

    udp4_csum_init() or udp6_csum_init()
    skb_checksum_init_zero_check()
    __skb_checksum_validate_complete()

    The problem can appear if skb->len is less than CHECKSUM_BREAK. In
    this particular case __skb_checksum_validate_complete() also invokes
    __skb_checksum_complete(skb). If UDP-Lite is using partial checksum
    that covers only part of a packet, the function will return bad
    checksum and the packet will be dropped.

    It can be fixed if we skip skb_checksum_init_zero_check() and only
    set the required pseudo header checksum for UDP-Lite with partial
    checksum before udp4_csum_init()/udp6_csum_init() functions return.

    Fixes: ed70fcfcee95 ("net: Call skb_checksum_init in IPv4")
    Fixes: e4f45b7f40bd ("net: Call skb_checksum_init in IPv6")
    Signed-off-by: Alexey Kodanev
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexey Kodanev
     
  • [ Upstream commit ca79bec237f5809a7c3c59bd41cd0880aa889966 ]

    gcc-8 has a new warning that detects overlapping input and output arguments
    in memcpy(). It triggers for sit_init_net() calling ipip6_tunnel_clone_6rd(),
    which is actually correct:

    net/ipv6/sit.c: In function 'sit_init_net':
    net/ipv6/sit.c:192:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict]

    The problem here is that the logic detecting the memcpy() arguments finds them
    to be the same, but the conditional that tests for the input and output of
    ipip6_tunnel_clone_6rd() to be identical is not a compile-time constant.

    We know that netdev_priv(t->dev) is the same as t for a tunnel device,
    and comparing "dev" directly here lets the compiler figure out as well
    that 'dev == sitn->fb_tunnel_dev' when called from sit_init_net(), so
    it no longer warns.

    This code is old, so Cc stable to make sure that we don't get the warning
    for older kernels built with new gcc.

    Cc: Martin Sebor
    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83456
    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
     

03 Mar, 2018

2 commits

  • [ Upstream commit c9fefa08190fc879fb2e681035d7774e0a8c5170 ]

    Now it's using IPV6_MIN_MTU as the min mtu in ip6_tnl_xmit, but
    IPV6_MIN_MTU actually only works when the inner packet is ipv6.

    With IPV6_MIN_MTU for ipv4 packets, the new pmtu for inner dst
    couldn't be set less than 1280. It would cause tx_err and the
    packet to be dropped when the outer dst pmtu is close to 1280.

    Jianlin found it by running ipv4 traffic with the topo:

    (client) gre6 eth1 (route) eth2 gre6 (server)

    After changing eth2 mtu to 1300, the performance became very
    low, or the connection was even broken. The issue also affects
    ip4ip6 and ip6ip6 tunnels.

    So if the inner packet is ipv4, 576 should be considered as the
    min mtu.

    Note that for ip4ip6 and ip6ip6 tunnels, the inner packet can
    only be ipv4 or ipv6, but for gre6 tunnel, it may also be ARP.
    Using 576 as the min mtu for non-ipv6 packets, as this patch does,
    works for all those cases.

    Reported-by: Jianlin Shi
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
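The protocol-dependent floor described above can be sketched as a small helper. `INNER_NON_V6_MIN_MTU` and `tunnel_clamp_mtu()` are hypothetical names; the values 1280 and 576 are the ones given in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

#define IPV6_MIN_MTU         1280u
#define INNER_NON_V6_MIN_MTU 576u /* hypothetical name; value from the text */

static_assert(INNER_NON_V6_MIN_MTU < IPV6_MIN_MTU,
              "the non-IPv6 floor must be the lower one");

/* Clamp the path MTU reported to the inner flow to a per-protocol minimum. */
static unsigned int tunnel_clamp_mtu(unsigned int mtu, bool inner_is_ipv6)
{
    unsigned int floor = inner_is_ipv6 ? IPV6_MIN_MTU : INNER_NON_V6_MIN_MTU;

    return mtu < floor ? floor : mtu;
}
```

With a single 1280 floor, an inner IPv4 flow whose usable MTU is slightly below 1280 would be told 1280 and keep sending packets that are too large. With the lower floor the smaller value is passed through unchanged.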
     
  • [ Upstream commit 588753f1eb18978512b1c9b85fddb457d46f9033 ]

    One example of when an ICMPv6 packet is required to be looped back is
    when a host acts as both a Multicast Listener and a Multicast Router.

    A Multicast Router will listen on address ff02::16 for MLDv2 messages.

    Currently, MLDv2 messages originating from a Multicast Listener running
    on the same host as the Multicast Router are not being delivered to the
    Multicast Router. This is due to dst.input being assigned the default
    value of dst_discard.

    This results in the packet being looped back but discarded before being
    delivered to the Multicast Router.

    This patch sets dst.input to ip6_input to ensure a looped back packet
    is delivered to the Multicast Router.

    Signed-off-by: Brendan McGrath
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Brendan McGrath
     

28 Feb, 2018

1 commit

  • commit 01ea306f2ac2baff98d472da719193e738759d93 upstream.

    The Syzbot reported a possible deadlock in the netfilter area caused by
    rtnl lock, xt lock and socket lock being acquired with a different order
    on different code paths, leading to the following backtrace:
    Reviewed-by: Xin Long

    ======================================================
    WARNING: possible circular locking dependency detected
    4.15.0+ #301 Not tainted
    ------------------------------------------------------
    syzkaller233489/4179 is trying to acquire lock:
    (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20
    net/core/rtnetlink.c:74

    but task is already holding lock:
    (&xt[i].mutex){+.+.}, at: []
    xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041

    which lock already depends on the new lock.
    ===

    Since commit 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock
    only in the required scope"), we already acquire the socket lock in
    the innermost scope, where needed. In that commit I forgot to remove
    the outermost socket lock from the getsockopt() path; this commit
    addresses the issue by dropping it now.

    v1 -> v2: fix bad subj, added relevant 'fixes' tag

    Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers")
    Fixes: 202f59afd441 ("netfilter: ipt_CLUSTERIP: do not hold dev")
    Fixes: 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock only in the required scope")
    Reported-by: syzbot+ddde1c7b7ff7442d7f2d@syzkaller.appspotmail.com
    Suggested-by: Florian Westphal
    Signed-off-by: Paolo Abeni
    Signed-off-by: Pablo Neira Ayuso
    Tested-by: Krzysztof Piotr Oledzki
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
     

25 Feb, 2018

1 commit

  • commit 3f34cfae1238848fd53f25e5c8fd59da57901f4b upstream.

    Syzbot reported several deadlocks in the netfilter area caused by
    rtnl lock and socket lock being acquired with a different order on
    different code paths, leading to backtraces like the following one:

    ======================================================
    WARNING: possible circular locking dependency detected
    4.15.0-rc9+ #212 Not tainted
    ------------------------------------------------------
    syzkaller041579/3682 is trying to acquire lock:
    (sk_lock-AF_INET6){+.+.}, at: [] lock_sock
    include/net/sock.h:1463 [inline]
    (sk_lock-AF_INET6){+.+.}, at: []
    do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167

    but task is already holding lock:
    (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20
    net/core/rtnetlink.c:74

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (rtnl_mutex){+.+.}:
    __mutex_lock_common kernel/locking/mutex.c:756 [inline]
    __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
    mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
    rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
    register_netdevice_notifier+0xad/0x860 net/core/dev.c:1607
    tee_tg_check+0x1a0/0x280 net/netfilter/xt_TEE.c:106
    xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:845
    check_target net/ipv6/netfilter/ip6_tables.c:538 [inline]
    find_check_entry.isra.7+0x935/0xcf0
    net/ipv6/netfilter/ip6_tables.c:580
    translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:749
    do_replace net/ipv6/netfilter/ip6_tables.c:1165 [inline]
    do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1691
    nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
    nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
    ipv6_setsockopt+0x115/0x150 net/ipv6/ipv6_sockglue.c:928
    udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
    sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
    SYSC_setsockopt net/socket.c:1849 [inline]
    SyS_setsockopt+0x189/0x360 net/socket.c:1828
    entry_SYSCALL_64_fastpath+0x29/0xa0

    -> #0 (sk_lock-AF_INET6){+.+.}:
    lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914
    lock_sock_nested+0xc2/0x110 net/core/sock.c:2780
    lock_sock include/net/sock.h:1463 [inline]
    do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167
    ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922
    udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422
    sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
    SYSC_setsockopt net/socket.c:1849 [inline]
    SyS_setsockopt+0x189/0x360 net/socket.c:1828
    entry_SYSCALL_64_fastpath+0x29/0xa0

    other info that might help us debug this:

    Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(rtnl_mutex);
                                   lock(sk_lock-AF_INET6);
                                   lock(rtnl_mutex);
      lock(sk_lock-AF_INET6);

    *** DEADLOCK ***

    1 lock held by syzkaller041579/3682:
    #0: (rtnl_mutex){+.+.}, at: [] rtnl_lock+0x17/0x20
    net/core/rtnetlink.c:74

    The problem, as Florian noted, is that nf_setsockopt() is always
    called with the socket lock held, even though the lock itself is
    required only for very tight scopes and only for some operations.

    This patch addresses the issue by moving the lock_sock() call to
    where it is really needed, namely ipv*_getorigdst(), so that
    nf_setsockopt() no longer needs to acquire both locks.

    Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers")
    Reported-by: syzbot+a4c2dc980ac1af699b36@syzkaller.appspotmail.com
    Suggested-by: Florian Westphal
    Signed-off-by: Paolo Abeni
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

    Paolo Abeni
     

13 Feb, 2018

2 commits

  • [ Upstream commit 7ece54a60ee2ba7a386308cae73c790bd580589c ]

    If a sk_v6_rcv_saddr is !IPV6_ADDR_ANY and !IPV6_ADDR_MAPPED, it
    implicitly implies it is an ipv6only socket. However, in inet6_bind(),
    this addr_type checking and setting sk->sk_ipv6only to 1 are only done
    after sk->sk_prot->get_port(sk, snum) has been completed successfully.

    This inconsistency between sk_v6_rcv_saddr and sk_ipv6only confuses
    the 'get_port()'.

    In particular, when binding SO_REUSEPORT UDP sockets,
    udp_reuseport_add_sock(sk,...) is called. udp_reuseport_add_sock()
    checks "ipv6_only_sock(sk2) == ipv6_only_sock(sk)" before adding sk to
    sk2->sk_reuseport_cb. In this case, ipv6_only_sock(sk2) could be
    1 while ipv6_only_sock(sk) is still 0 here. The end result is,
    reuseport_alloc(sk) is called instead of adding sk to the existing
    sk2->sk_reuseport_cb.

    It can be reproduced by binding two SO_REUSEPORT UDP sockets on an
    IPv6 address (!ANY and !MAPPED). Only one of the sockets will
    receive packets.

    The fix is to set the implicit sk_ipv6only before calling get_port().
    The original sk_ipv6only has to be saved such that it can be restored
    in case get_port() failed. The situation is similar to the
    inet_reset_saddr(sk) after get_port() has failed.

    Thanks to Calvin Owens who created an easy
    reproduction which leads to a fix.

    Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
    Signed-off-by: Martin KaFai Lau
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Martin KaFai Lau
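The save, set, call, restore sequence of this fix can be sketched as follows. Everything here is a hypothetical stand-in: `struct sock_sketch`, `bind_sketch()` and the two `get_port_*` stubs only model the ordering, not inet6_bind() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket state relevant to the ordering problem. */
struct sock_sketch {
    bool bound_to_specific_v6; /* address is neither ANY nor v4-mapped */
    bool ipv6only;
};

typedef int (*get_port_fn)(const struct sock_sketch *sk);

/* Records what the port-selection step observed, for illustration. */
static bool seen_ipv6only;

/* Stubs standing in for the get_port() callback: 0 means success. */
static int get_port_ok(const struct sock_sketch *sk)
{
    seen_ipv6only = sk->ipv6only;
    return 0;
}

static int get_port_busy(const struct sock_sketch *sk)
{
    seen_ipv6only = sk->ipv6only;
    return 1;
}

/* Set the implied flag before get_port() and undo it if get_port() fails. */
static int bind_sketch(struct sock_sketch *sk, get_port_fn get_port)
{
    bool saved_ipv6only = sk->ipv6only;

    assert(get_port != NULL);
    if (sk->bound_to_specific_v6)
        sk->ipv6only = true;
    if (get_port(sk) != 0) {
        sk->ipv6only = saved_ipv6only;
        return -1;
    }
    return 0;
}
```

Because the flag is set first, the port-selection step sees the same value for the new socket as for an already bound one, so a comparison like the one in udp_reuseport_add_sock() can match.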
     
  • [ Upstream commit 4adfa79fc254efb7b0eb3cd58f62c2c3f805f1ba ]

    When we dump the ip6mr mfc entries via proc, we initialize an iterator
    with the table to dump but we don't clear the cache pointer which might
    be initialized from a prior read on the same descriptor that ended. This
    can result in lock imbalance (an unnecessary unlock) leading to other
    crashes and hangs. Clear the cache pointer like ipmr does to fix the issue.
    Thanks for the reliable reproducer.

    Here's syzbot's trace:
    WARNING: bad unlock balance detected!
    4.15.0-rc3+ #128 Not tainted
    syzkaller971460/3195 is trying to release lock (mrt_lock) at:
    [] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
    but there are no more locks to release!

    other info that might help us debug this:
    1 lock held by syzkaller971460/3195:
    #0: (&p->lock){+.+.}, at: [] seq_read+0xd5/0x13d0
    fs/seq_file.c:165

    stack backtrace:
    CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:53
    print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561
    __lock_release kernel/locking/lockdep.c:3775 [inline]
    lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023
    __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline]
    _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
    ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
    traverse+0x3bc/0xa00 fs/seq_file.c:135
    seq_read+0x96a/0x13d0 fs/seq_file.c:189
    proc_reg_read+0xef/0x170 fs/proc/inode.c:217
    do_loop_readv_writev fs/read_write.c:673 [inline]
    do_iter_read+0x3db/0x5b0 fs/read_write.c:897
    compat_readv+0x1bf/0x270 fs/read_write.c:1140
    do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
    C_SYSC_preadv fs/read_write.c:1209 [inline]
    compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
    do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
    do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
    entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
    RIP: 0023:0xf7f73c79
    RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
    RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
    RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
    RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    BUG: sleeping function called from invalid context at lib/usercopy.c:25
    in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460
    INFO: lockdep is turned off.
    CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:53
    ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060
    __might_sleep+0x95/0x190 kernel/sched/core.c:6013
    __might_fault+0xab/0x1d0 mm/memory.c:4525
    _copy_to_user+0x2c/0xc0 lib/usercopy.c:25
    copy_to_user include/linux/uaccess.h:155 [inline]
    seq_read+0xcb4/0x13d0 fs/seq_file.c:279
    proc_reg_read+0xef/0x170 fs/proc/inode.c:217
    do_loop_readv_writev fs/read_write.c:673 [inline]
    do_iter_read+0x3db/0x5b0 fs/read_write.c:897
    compat_readv+0x1bf/0x270 fs/read_write.c:1140
    do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
    C_SYSC_preadv fs/read_write.c:1209 [inline]
    compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
    do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
    do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
    entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
    RIP: 0023:0xf7f73c79
    RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
    RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
    RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
    RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0
    lib/usercopy.c:26

    Reported-by: syzbot
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Aleksandrov
     

31 Jan, 2018

5 commits

  • [ Upstream commit 121d57af308d0cf943f08f4738d24d3966c38cd9 ]

    Validate gso_type during segmentation as SKB_GSO_DODGY sources
    may pass packets where the gso_type does not match the contents.

    Syzkaller was able to enter the SCTP gso handler with a packet of
    gso_type SKB_GSO_TCPV4.

    On entry of transport layer gso handlers, verify that the gso_type
    matches the transport protocol.

    Fixes: 90017accff61 ("sctp: Add GSO support")
    Link: http://lkml.kernel.org/r/
    Reported-by: syzbot+fee64147a25aecd48055@syzkaller.appspotmail.com
    Signed-off-by: Willem de Bruijn
    Acked-by: Jason Wang
    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
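The entry check described above amounts to a single bit test. In this sketch the flag values are hypothetical placeholders and `gso_type_ok()` is an illustration name, not a kernel helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; only their distinctness matters here. */
enum {
    GSO_TCPV4 = 1u << 0,
    GSO_UDP   = 1u << 1,
    GSO_SCTP  = 1u << 2,
    GSO_DODGY = 1u << 3
};

static_assert((GSO_TCPV4 & GSO_SCTP) == 0, "flags must be distinct bits");

/*
 * A transport-layer segmentation handler would call this on entry with the
 * flag of its own protocol and refuse the packet when it returns false.
 */
static bool gso_type_ok(unsigned int gso_type, unsigned int proto_flag)
{
    return (gso_type & proto_flag) != 0;
}
```

For the reported case, an SCTP handler receiving a packet marked `GSO_TCPV4 | GSO_DODGY` would get `false` and bail out instead of segmenting it.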
     
  • [ Upstream commit 128bb975dc3c25d00de04e503e2fe0a780d04459 ]

    Commit b05229f44228 ("gre6: Cleanup GREv6 transmit path,
    call common GRE functions") moved dev->mtu initialization
    from ip6gre_tunnel_setup() to ip6gre_tunnel_init(), as a
    result, the previously set values, before ndo_init(), are
    reset in the following cases:

    * rtnl_create_link() can update dev->mtu from IFLA_MTU
    parameter.

    * ip6gre_tnl_link_config() is invoked before ndo_init() in
    netlink and ioctl setup, so ndo_init() can reset MTU
    adjustments with the lower device MTU as well, dev->mtu
    and dev->hard_header_len.

    Not applicable for ip6gretap because it has one more call
    to ip6gre_tnl_link_config(tunnel, 1) in ip6gre_tap_init().

    Fix the first case by updating dev->mtu with 'tb[IFLA_MTU]'
    parameter if a user sets it manually on a device creation,
    and fix the second one by moving ip6gre_tnl_link_config()
    call after register_netdevice().

    Fixes: b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
    Fixes: db2ec95d1ba4 ("ip6_gre: Fix MTU setting")
    Signed-off-by: Alexey Kodanev
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexey Kodanev
     
  • [ Upstream commit 95ef498d977bf44ac094778fd448b98af158a3e6 ]

    In my last patch, I missed the fact that cork.base.dst was not
    initialized in ip6_make_skb():

    If ip6_setup_cork() returns an error, we might attempt a dst_release()
    on some random pointer.

    Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 749439bfac6e1a2932c582e2699f91d329658196 ]

    The logic in __ip6_append_data() assumes that the MTU is at least large
    enough for the headers. A device's MTU may be adjusted after being
    added while sendmsg() is processing data, resulting in
    __ip6_append_data() seeing any MTU. For an mtu smaller than the size of
    the fragmentation header, the math results in a negative 'maxfraglen',
    which causes problems when refragmenting any previous skb in the
    skb_write_queue, leaving it possibly malformed.

    Instead sendmsg returns EINVAL when the mtu is calculated to be less
    than IPV6_MIN_MTU.

    Found by syzkaller:
    kernel BUG at ./include/linux/skbuff.h:2064!
    invalid opcode: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    task: ffff8801d0b68580 task.stack: ffff8801ac6b8000
    RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline]
    RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617
    RSP: 0018:ffff8801ac6bf570 EFLAGS: 00010216
    RAX: 0000000000010000 RBX: 0000000000000028 RCX: ffffc90003cce000
    RDX: 00000000000001b8 RSI: ffffffff839df06f RDI: ffff8801d9478ca0
    RBP: ffff8801ac6bf780 R08: ffff8801cc3f1dbc R09: 0000000000000000
    R10: ffff8801ac6bf7a0 R11: 43cb4b7b1948a9e7 R12: ffff8801cc3f1dc8
    R13: ffff8801cc3f1d40 R14: 0000000000001036 R15: dffffc0000000000
    FS: 00007f43d740c700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f7834984000 CR3: 00000001d79b9000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    ip6_finish_skb include/net/ipv6.h:911 [inline]
    udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093
    udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363
    inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
    sock_sendmsg_nosec net/socket.c:633 [inline]
    sock_sendmsg+0xca/0x110 net/socket.c:643
    SYSC_sendto+0x352/0x5a0 net/socket.c:1750
    SyS_sendto+0x40/0x50 net/socket.c:1718
    entry_SYSCALL_64_fastpath+0x1f/0xbe
    RIP: 0033:0x4512e9
    RSP: 002b:00007f43d740bc08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
    RAX: ffffffffffffffda RBX: 00000000007180a8 RCX: 00000000004512e9
    RDX: 000000000000002e RSI: 0000000020d08000 RDI: 0000000000000005
    RBP: 0000000000000086 R08: 00000000209c1000 R09: 000000000000001c
    R10: 0000000000040800 R11: 0000000000000216 R12: 00000000004b9c69
    R13: 00000000ffffffff R14: 0000000000000005 R15: 00000000202c2000
    Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd 0b 49 8d 45 78 4d 8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba
    RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP: ffff8801ac6bf570
    RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: ffff8801ac6bf570

    Reported-by: syzbot
    Signed-off-by: Mike Maloney
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Mike Maloney
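The arithmetic behind the negative 'maxfraglen' can be reproduced in a few lines. The formula in `max_frag_len()` is written from memory of the kernel code and should be treated as an assumption, as should the helper names; only the IPV6_MIN_MTU check and the EINVAL return come from the commit message.

```c
#include <assert.h>
#include <errno.h>

#define IPV6_MIN_MTU 1280
#define FRAG_HDR_LEN 8 /* size of the IPv6 fragment header */

static_assert(IPV6_MIN_MTU > FRAG_HDR_LEN, "minimum MTU must cover the header");

/*
 * Assumed shape of the computation: round the per-fragment payload down to
 * a multiple of 8, then account for the headers. Relies on two's complement
 * integers when the intermediate value is negative.
 */
static int max_frag_len(int mtu, int fragheaderlen)
{
    return ((mtu - fragheaderlen) & ~7) + fragheaderlen - FRAG_HDR_LEN;
}

/* The guard described in the text: reject an MTU below the IPv6 minimum. */
static int check_append_mtu(int mtu)
{
    return mtu < IPV6_MIN_MTU ? -EINVAL : 0;
}
```

With a 40-byte header length and an MTU of 4, `max_frag_len()` goes negative, which is the situation the guard rules out before any fragment math runs.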
     
  • [ Upstream commit e9191ffb65d8e159680ce0ad2224e1acbde6985c ]

    Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after
    sysctl setting") removed the initialisation of
    ipv6_pinfo::autoflowlabel and added a second flag to indicate
    whether this field or the net namespace default should be used.

    The getsockopt() handling for this case was not updated, so it
    currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is
    not explicitly enabled. Fix it to return the effective value, whether
    that has been set at the socket or net namespace level.

    Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...")
    Signed-off-by: Ben Hutchings
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ben Hutchings
     

17 Jan, 2018

2 commits

  • [ Upstream commit 862c03ee1deb7e19e0f9931682e0294ecd1fcaf9 ]

    ip6_setup_cork() might return an error, while memory allocations have
    been done and must be rolled back.

    Fixes: 6422398c2ab0 ("ipv6: introduce ipv6_make_skb")
    Signed-off-by: Eric Dumazet
    Cc: Vlad Yasevich
    Reported-by: Mike Maloney
    Acked-by: Mike Maloney
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 23263ec86a5f44312d2899323872468752324107 ]

    When an ip6_tunnel is in mode 'any', where the transport layer
    protocol can be either 4 or 41, dst_cache must be disabled.

    This is because xfrm policies might apply to only one of the two
    protocols. Caching dst would cause xfrm policies for one protocol
    incorrectly used for the other.

    Signed-off-by: Eli Cooper
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eli Cooper
     

03 Jan, 2018

4 commits

  • [ Upstream commit 74c4b656c3d92ec4c824ea1a4afd726b7b6568c8 ]

    Commit 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
    introduced a new exit point in ipxip6_rcv. However, rcu_read_unlock is
    missing there. This diff fixes that.

    v1->v2:
    instead of calling rcu_read_unlock in place, jump to the "drop"
    section (to prevent skb leakage)

    Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
    Signed-off-by: Nikita V. Shirokov
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Nikita V. Shirokov
     
  • [ Upstream commit 30791ac41927ebd3e75486f9504b6d2280463bf0 ]

    The MD5-key that belongs to a connection is identified by the peer's
    IP-address. When we are in tcp_v4(6)_reqsk_send_ack(), we are replying
    to an incoming segment from tcp_check_req() that failed the seq-number
    checks.

    Thus, to find the correct key, we need to use the skb's saddr and not
    the daddr.

    This bug seems to have been there for quite a while, but probably went
    unnoticed because the consequences are not catastrophic. We will call
    tcp_v4_reqsk_send_ack only to send a challenge-ACK back to the peer,
    thus the connection doesn't really fail.

    Fixes: 9501f9722922 ("tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash().")
    Signed-off-by: Christoph Paasch
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Christoph Paasch
     
  • [ Upstream commit 513674b5a2c9c7a67501506419da5c3c77ac6f08 ]

    sysctl.ipv6.auto_flowlabels defaults to 1. On our hosts, we set it to 2.
    If a sockopt doesn't set autoflowlabel, outgoing packets from the hosts
    are not supposed to include a flowlabel. This is true for normal packets,
    but not for reset packets.

    The reason is that ipv6_pinfo.autoflowlabel is set at sock creation. If
    we later change sysctl.ipv6.auto_flowlabels, ipv6_pinfo.autoflowlabel
    isn't changed, so the sock keeps the old behavior in terms of auto
    flowlabel. Reset packets suffer from this problem because they are
    sent from a special control socket, which is created at boot
    time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control
    socket will always have its ipv6_pinfo.autoflowlabel set, even after
    the user sets sysctl.ipv6.auto_flowlabels to 2, so reset packets will
    always have a flowlabel. A normal sock created before the sysctl change
    suffers from the same issue. We can't even turn off autoflowlabel unless
    we kill all socks on the hosts.

    To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
    autoflowlabel setting from user, otherwise we always call
    ip6_default_np_autolabel() which has the new settings of sysctl.

    Note, this changes behavior a little bit. Before commit 42240901f7c4
    (ipv6: Implement different admin modes for automatic flow labels), the
    autoflowlabel behavior of a sock wasn't sticky, e.g., if the sysctl
    changed, existing connections changed autoflowlabel behavior. After that
    commit, autoflowlabel behavior is sticky for the whole life of the sock.
    With this patch, the behavior is no longer sticky.

    Cc: Martin KaFai Lau
    Cc: Eric Dumazet
    Cc: Tom Herbert
    Signed-off-by: Shaohua Li
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Shaohua Li
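The selection rule this patch describes (an explicit socket option wins, otherwise follow the current namespace default) can be sketched like this. `struct pinfo_sketch` and `effective_autoflowlabel()` are hypothetical names, and the namespace default is passed in as a plain boolean rather than derived from the sysctl value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-socket state: the chosen value plus a "was set" flag. */
struct pinfo_sketch {
    bool autoflowlabel;     /* value chosen via IPV6_AUTOFLOWLABEL */
    bool autoflowlabel_set; /* true once the sockopt has been used */
};

/*
 * ns_default stands for whatever the namespace-wide setting currently
 * resolves to. It is evaluated at call time, so a later sysctl change is
 * picked up by sockets that never used the sockopt.
 */
static bool effective_autoflowlabel(const struct pinfo_sketch *np,
                                    bool ns_default)
{
    assert(np != NULL);
    return np->autoflowlabel_set ? np->autoflowlabel : ns_default;
}
```

A getsockopt() handler would report this effective value as well, rather than the raw per-socket field.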
     
  • [ Upstream commit b9b312a7a451e9c098921856e7cfbc201120e1a7 ]

    syzkaller reported crashes in IPv6 stack [1]

    Xin Long found that lo MTU was set to silly values.

    IPv6 stack reacts to changes to small MTU, by disabling itself under
    RTNL.

    But there is a window where threads not using RTNL can see a wrong
    device mtu. This can lead to surprises, in mld code where it is assumed
    the mtu is suitable.

    Fix this by reading device mtu once and checking IPv6 minimal MTU.

    [1]
    skbuff: skb_over_panic: text:0000000010b86b8d len:196 put:20
    head:000000003b477e60 data:000000000e85441e tail:0xd4 end:0xc0 dev:lo
    ------------[ cut here ]------------
    kernel BUG at net/core/skbuff.c:104!
    invalid opcode: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-rc2-mm1+ #39
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    RIP: 0010:skb_panic+0x15c/0x1f0 net/core/skbuff.c:100
    RSP: 0018:ffff8801db307508 EFLAGS: 00010286
    RAX: 0000000000000082 RBX: ffff8801c517e840 RCX: 0000000000000000
    RDX: 0000000000000082 RSI: 1ffff1003b660e61 RDI: ffffed003b660e95
    RBP: ffff8801db307570 R08: 1ffff1003b660e23 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff85bd4020
    R13: ffffffff84754ed2 R14: 0000000000000014 R15: ffff8801c4e26540
    FS: 0000000000000000(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000463610 CR3: 00000001c6698000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:

    skb_over_panic net/core/skbuff.c:109 [inline]
    skb_put+0x181/0x1c0 net/core/skbuff.c:1694
    add_grhead.isra.24+0x42/0x3b0 net/ipv6/mcast.c:1695
    add_grec+0xa55/0x1060 net/ipv6/mcast.c:1817
    mld_send_cr net/ipv6/mcast.c:1903 [inline]
    mld_ifc_timer_expire+0x4d2/0x770 net/ipv6/mcast.c:2448
    call_timer_fn+0x23b/0x840 kernel/time/timer.c:1320
    expire_timers kernel/time/timer.c:1357 [inline]
    __run_timers+0x7e1/0xb60 kernel/time/timer.c:1660
    run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686
    __do_softirq+0x29d/0xbb2 kernel/softirq.c:285
    invoke_softirq kernel/softirq.c:365 [inline]
    irq_exit+0x1d3/0x210 kernel/softirq.c:405
    exiting_irq arch/x86/include/asm/apic.h:540 [inline]
    smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
    apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:920

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Tested-by: Xin Long
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
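
The "read the device mtu once, then check it against the IPv6 minimum" idea from the commit above can be illustrated with a small user-space sketch. This is not the kernel patch: `struct fake_dev` and `ipv6_usable_mtu()` are hypothetical names, and a C11 atomic load stands in for the kernel's single-read primitive. Only `IPV6_MIN_MTU` (1280) is taken from the IPv6 specification.

```c
#include <stdatomic.h>

/* IPv6 requires every link to carry packets of at least 1280 bytes. */
#define IPV6_MIN_MTU 1280u

/* Hypothetical stand-in for a network device; only the MTU matters here. */
struct fake_dev {
    _Atomic unsigned int mtu;   /* may be rewritten by another thread */
};

/*
 * Take one snapshot of the MTU and validate that snapshot.
 * Returns the snapshot, or 0 when the device is currently too small
 * for IPv6. Callers should size buffers from the returned value and
 * never re-read dev->mtu afterwards.
 */
unsigned int ipv6_usable_mtu(struct fake_dev *dev)
{
    unsigned int mtu = atomic_load_explicit(&dev->mtu, memory_order_relaxed);

    if (mtu < IPV6_MIN_MTU)
        return 0;
    return mtu;
}
```

The point of the single snapshot is that the value that was checked and the value that is later used are guaranteed to be the same, even if another thread changes the device mtu in between.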
     

25 Dec, 2017

1 commit

  • [ Upstream commit 1f372c7bfb23286d2bf4ce0423ab488e86b74bb2 ]

    The NS for DAD are sent on admin up as long as a valid qdisc is found.
    A race condition exists by which these packets will not egress the
    interface if the operational state of the lower device is not yet up.
    The solution is to delay DAD until the link is operationally up
    according to RFC2863. Rather than only doing this, follow the existing
    code checks by deferring IPv6 device initialization altogether. The fix
    allows DAD on devices like tunnels that are controlled by a userspace
    control plane. The fix has no impact on regular deployments, but means
    that there is no IPv6 connectivity until the port has been opened in
    the case of port-based network access control, which should be
    desirable.

    Signed-off-by: Mike Manning
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Mike Manning
     

16 Dec, 2017

1 commit

  • [ Upstream commit f859b4af1c52493ec21173ccc73d0b60029b5b88 ]

    After parsing the sit netlink change info, we forget to update frag_off in
    ipip6_tunnel_update(). Fix it by assigning the new value to frag_off.

    Reported-by: Jianlin Shi
    Signed-off-by: Hangbin Liu
    Acked-by: Nicolas Dichtel
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Hangbin Liu
     

14 Dec, 2017

3 commits

  • [ Upstream commit 981542c526ecd846920bc500e9989da906ee9fb9 ]

    After commit 308edfdf1563 ("gre6: Cleanup GREv6 receive path, call
    common GRE functions") it's not used anywhere in the module, but
    previously was used in ip6gre_rcv().

    Fixes: 308edfdf1563 ("gre6: Cleanup GREv6 receive path, call common GRE functions")
    Signed-off-by: Alexey Kodanev
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Alexey Kodanev
     
  • [ Upstream commit 15e668070a64bb97f102ad9cf3bccbca0545cda8 ]

    Andrey reported the following kernel crash:

    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 0 PID: 14446 Comm: syz-executor6 Not tainted 4.10.0+ #82
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    task: ffff88001f311700 task.stack: ffff88001f6e8000
    RIP: 0010:ip6mr_sk_done+0x15a/0x3d0 net/ipv6/ip6mr.c:1618
    RSP: 0018:ffff88001f6ef418 EFLAGS: 00010202
    RAX: dffffc0000000000 RBX: 1ffff10003edde8c RCX: ffffc900043ee000
    RDX: 0000000000000004 RSI: ffffffff83e3b3f8 RDI: 0000000000000020
    RBP: ffff88001f6ef508 R08: fffffbfff0dcc5d8 R09: 0000000000000000
    R10: ffffffff86e62ec0 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff88001f6ef4e0 R15: ffff8800380a0040
    FS: 00007f7a52cec700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000000061c500 CR3: 000000001f1ae000 CR4: 00000000000006f0
    DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
    Call Trace:
    rawv6_close+0x4c/0x80 net/ipv6/raw.c:1217
    inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
    inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
    sock_release+0x8d/0x1e0 net/socket.c:597
    __sock_create+0x39d/0x880 net/socket.c:1226
    sock_create_kern+0x3f/0x50 net/socket.c:1243
    inet_ctl_sock_create+0xbb/0x280 net/ipv4/af_inet.c:1526
    icmpv6_sk_init+0x163/0x500 net/ipv6/icmp.c:954
    ops_init+0x10a/0x550 net/core/net_namespace.c:115
    setup_net+0x261/0x660 net/core/net_namespace.c:291
    copy_net_ns+0x27e/0x540 net/core/net_namespace.c:396
    9pnet_virtio: no channels available for device ./file1
    create_new_namespaces+0x437/0x9b0 kernel/nsproxy.c:106
    unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
    SYSC_unshare kernel/fork.c:2281 [inline]
    SyS_unshare+0x64e/0x1000 kernel/fork.c:2231
    entry_SYSCALL_64_fastpath+0x1f/0xc2

    This is because net->ipv6.mr6_tables is not initialized at that point:
    ip6mr_rules_init() has not been called yet, so when the error path
    iterates over the list, we trigger this oops. Fix this by reordering
    ip6mr_rules_init() before icmpv6_sk_init().

    Reported-by: Andrey Konovalov
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    WANG Cong
     
  • [ Upstream commit e3dc847a5f85b43ee2bfc8eae407a7e383483228 ]

    In vti6_xmit(), the check for IPV6_MIN_MTU before we
    send an ICMPV6_PKT_TOOBIG message is missing. So we might
    report a PMTU below 1280. Fix this by adding the required
    check.

    Fixes: ccd740cbc6e ("vti6: Add pmtu handling to vti6_xmit.")
    Signed-off-by: Steffen Klassert
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Steffen Klassert
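
A minimal sketch of the check described in the commit above, assuming the fix amounts to never advertising a path MTU below the IPv6 minimum. `pmtu_to_report()` is a hypothetical helper written for illustration, not a kernel function.

```c
/* IPv6 minimum link MTU; a Packet Too Big message must not report less. */
#define IPV6_MIN_MTU 1280u

/*
 * Hypothetical helper: given the MTU computed for the tunnel path,
 * return the value that is safe to put into a Packet Too Big message.
 */
unsigned int pmtu_to_report(unsigned int path_mtu)
{
    if (path_mtu < IPV6_MIN_MTU)
        return IPV6_MIN_MTU;
    return path_mtu;
}
```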
     

10 Dec, 2017

1 commit

  • [ Upstream commit 93e246f783e6bd1bc64fdfbfe68b18161f69b28e ]

    vti6 interface is registered before the rtnl_link_ops block
    is attached. As a result the resulting RTM_NEWLINK is missing
    IFLA_INFO_KIND. Re-order attachment of rtnl_link_ops block to fix.

    Signed-off-by: Dave Forster
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Forster
     

30 Nov, 2017

2 commits

  • [ Upstream commit 7bb387c5ab12aeac3d5eea28686489ff46b53ca9 ]

    IP_MULTICAST_IF fails if sk_bound_dev_if is already set and the new index
    does not match it. e.g.,

    ntpd[15381]: setsockopt IP_MULTICAST_IF 192.168.1.23 fails: Invalid argument

    Relax the check in setsockopt to allow setting mc_index to an L3 slave if
    sk_bound_dev_if points to an L3 master.

    Make a similar change for IPv6. In this case change the device lookup to
    take the rcu_read_lock avoiding a refcnt. The rcu lock is also needed for
    the lookup of a potential L3 master device.

    This really only silences a setsockopt failure since uses of mc_index are
    secondary to sk_bound_dev_if if it is set. In both cases, if either index
    is an L3 slave or master, lookups are directed to the same FIB table so
    relaxing the check at setsockopt time causes no harm.

    Patch is based on a suggested change by Darwin for a problem noted in
    their code base.

    Suggested-by: Darwin Dingel
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
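
The relaxed setsockopt check from the commit above can be sketched as a pure predicate. This is an illustration under stated assumptions: `mc_index_allowed()` and its parameters are hypothetical, and the L3 master index is passed in directly instead of being looked up from a device.

```c
#include <stdbool.h>

/*
 * bound_ifindex  : the socket's sk_bound_dev_if (0 means "not bound")
 * mc_ifindex     : the index requested via IP_MULTICAST_IF
 * master_ifindex : L3 master of the requested device (0 means "none")
 */
bool mc_index_allowed(int bound_ifindex, int mc_ifindex, int master_ifindex)
{
    if (bound_ifindex == 0)
        return true;                    /* unbound socket: no restriction */
    if (mc_ifindex == bound_ifindex)
        return true;                    /* old rule: indexes must match   */
    /* relaxed rule: an L3 slave of the device the socket is bound to */
    return master_ifindex != 0 && master_ifindex == bound_ifindex;
}
```

With the old rule alone, a socket bound to an L3 master (index 3 in the test below) would reject a slave device (index 5), which is the "Invalid argument" failure quoted in the commit message.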
     
  • commit 76da0704507bbc51875013f6557877ab308cfd0a upstream.

    In commit 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
    I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired.
    Unfortunately, as reported by jeffy, netdev_wait_allrefs()
    can rebroadcast the NETDEV_UNREGISTER event until all refs are
    gone.

    We have to add an additional check to avoid this corner case.
    For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED,
    for dev_change_net_namespace(), dev->reg_state is
    NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED.

    Fixes: 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
    Reported-by: jeffy
    Cc: David Ahern
    Signed-off-by: Cong Wang
    Acked-by: David Ahern
    Signed-off-by: David S. Miller
    Cc: Konstantin Khlebnikov
    Signed-off-by: Greg Kroah-Hartman

    WANG Cong
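
The extra condition described in the commit above can be sketched as follows. The enums are local to this example (their names mirror the ones in the commit message, but the values and the reduced set of states are assumptions), and `should_handle_unregister()` is a hypothetical name.

```c
#include <stdbool.h>

/* Local, simplified stand-ins; values are not the kernel's. */
enum netdev_event { NETDEV_REGISTER, NETDEV_UNREGISTER };
enum reg_state    { NETREG_REGISTERED, NETREG_UNREGISTERED };

/*
 * Handle NETDEV_UNREGISTER only while the device has not yet reached
 * NETREG_UNREGISTERED. Rebroadcast events, which arrive after the
 * device is already unregistered, are ignored so the cleanup runs once.
 */
bool should_handle_unregister(enum netdev_event event, enum reg_state state)
{
    return event == NETDEV_UNREGISTER && state != NETREG_UNREGISTERED;
}
```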
     

18 Nov, 2017

5 commits

  • [ Upstream commit 8aec4959d832bae0889a8e2f348973b5e4abffef ]

    When receiving a Toobig icmpv6 packet, ip6gre_err would just set the
    tunnel dev's mtu, which is not enough: skb_dst(skb)'s pmtu may still
    be using the old value, as it has no chance to be updated with the
    tunnel dev's mtu.

    Jianlin found this issue by reducing the route's mtu while running
    netperf; the performance went to 0.

    ip6ip6 and ip4ip6 tunnels work well with this, as they look up
    the upper dst and update its pmtu, or icmpv6_send a Toobig
    to the upper socket, after setting the tunnel dev's mtu.

    We couldn't do that for ip6_gre: gre's inner packet could be
    any protocol, so it's difficult to handle them all (e.g. look up
    the upper dst) in a good way.

    So this patch is to fix it by updating skb_dst(skb)'s pmtu when
    dev->mtu < skb_dst(skb)'s pmtu in tx path. It's safe to do this
    update there, as usually dev->mtu
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • [ Upstream commit f8d20b46ce55cf40afb30dcef6d9288f7ef46d9b ]

    A fix similar to the one in patch 'ipip: only increase err_count for
    some certain type icmp in ipip_err' is needed for ip6gre_err.

    In Jianlin's case, udp netperf broke even when receiving a TooBig
    icmpv6 packet.

    Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
    Reported-by: Jianlin Shi
    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • [ Upstream commit 864e2a1f8aac05effac6063ce316b480facb46ff ]

    When syzkaller team brought us a C repro for the crash [1] that
    had been reported many times in the past, I finally could find
    the root cause.

    If FlowLabel info is merged by fl6_merge_options(), we leave
    part of the opt_space storage provided by udp/raw/l2tp with random value
    in opt_space.tot_len, unless a control message was provided at sendmsg()
    time.

    Then ip6_setup_cork() would use this random value to perform a kzalloc()
    call. Undefined behavior and crashes.

    The fix is to properly set tot_len in fl6_merge_options().

    At the same time, we can also avoid consuming memory and cpu cycles
    to clear it, if every option is copied via a kmemdup(). This is the
    change in ip6_setup_cork().

    [1]
    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] SMP KASAN
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in:
    CPU: 0 PID: 6613 Comm: syz-executor0 Not tainted 4.14.0-rc4+ #127
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    task: ffff8801cb64a100 task.stack: ffff8801cc350000
    RIP: 0010:ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168
    RSP: 0018:ffff8801cc357550 EFLAGS: 00010203
    RAX: dffffc0000000000 RBX: ffff8801cc357748 RCX: 0000000000000010
    RDX: 0000000000000002 RSI: ffffffff842bd1d9 RDI: 0000000000000014
    RBP: ffff8801cc357620 R08: ffff8801cb17f380 R09: ffff8801cc357b10
    R10: ffff8801cb64a100 R11: 0000000000000000 R12: ffff8801cc357ab0
    R13: ffff8801cc357b10 R14: 0000000000000000 R15: ffff8801c3bbf0c0
    FS: 00007f9c5c459700(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000020324000 CR3: 00000001d1cf2000 CR4: 00000000001406f0
    DR0: 0000000020001010 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
    Call Trace:
    ip6_make_skb+0x282/0x530 net/ipv6/ip6_output.c:1729
    udpv6_sendmsg+0x2769/0x3380 net/ipv6/udp.c:1340
    inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
    sock_sendmsg_nosec net/socket.c:633 [inline]
    sock_sendmsg+0xca/0x110 net/socket.c:643
    SYSC_sendto+0x358/0x5a0 net/socket.c:1750
    SyS_sendto+0x40/0x50 net/socket.c:1718
    entry_SYSCALL_64_fastpath+0x1f/0xbe
    RIP: 0033:0x4520a9
    RSP: 002b:00007f9c5c458c08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
    RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004520a9
    RDX: 0000000000000001 RSI: 0000000020fd1000 RDI: 0000000000000016
    RBP: 0000000000000086 R08: 0000000020e0afe4 R09: 000000000000001c
    R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bb1ee
    R13: 00000000ffffffff R14: 0000000000000016 R15: 0000000000000029
    Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 0f 00 00 48 8d 79 04 48 b8 00 00 00 00 00 fc ff df 45 8b 74 24 04 48 89 fa 48 c1 ea 03 b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
    RIP: ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: ffff8801cc357550

    Signed-off-by: Eric Dumazet
    Reported-by: Dmitry Vyukov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit e669b86945478b3d90d2d87e3793a6eed06d332f ]

    In the (unlikely) event fixup_permanent_addr() returns a failure,
    addrconf_permanent_addr() calls ipv6_del_addr() without the
    mandatory call to in6_ifa_hold(), leading to a refcount error,
    spotted by syzkaller:

    WARNING: CPU: 1 PID: 3142 at lib/refcount.c:227 refcount_dec+0x4c/0x50 lib/refcount.c:227
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 1 PID: 3142 Comm: ip Not tainted 4.14.0-rc4-next-20171009+ #33
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:16 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:52
    panic+0x1e4/0x41c kernel/panic.c:181
    __warn+0x1c4/0x1e0 kernel/panic.c:544
    report_bug+0x211/0x2d0 lib/bug.c:183
    fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
    do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
    do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
    do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
    do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
    invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
    RIP: 0010:refcount_dec+0x4c/0x50 lib/refcount.c:227
    RSP: 0018:ffff8801ca49e680 EFLAGS: 00010286
    RAX: 000000000000002c RBX: ffff8801d07cfcdc RCX: 0000000000000000
    RDX: 000000000000002c RSI: 1ffff10039493c90 RDI: ffffed0039493cc4
    RBP: ffff8801ca49e688 R08: ffff8801ca49dd70 R09: 0000000000000000
    R10: ffff8801ca49df58 R11: 0000000000000000 R12: 1ffff10039493cd9
    R13: ffff8801ca49e6e8 R14: ffff8801ca49e7e8 R15: ffff8801d07cfcdc
    __in6_ifa_put include/net/addrconf.h:369 [inline]
    ipv6_del_addr+0x42b/0xb60 net/ipv6/addrconf.c:1208
    addrconf_permanent_addr net/ipv6/addrconf.c:3327 [inline]
    addrconf_notify+0x1c66/0x2190 net/ipv6/addrconf.c:3393
    notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
    __raw_notifier_call_chain kernel/notifier.c:394 [inline]
    raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
    call_netdevice_notifiers_info+0x32/0x60 net/core/dev.c:1697
    call_netdevice_notifiers net/core/dev.c:1715 [inline]
    __dev_notify_flags+0x15d/0x430 net/core/dev.c:6843
    dev_change_flags+0xf5/0x140 net/core/dev.c:6879
    do_setlink+0xa1b/0x38e0 net/core/rtnetlink.c:2113
    rtnl_newlink+0xf0d/0x1a40 net/core/rtnetlink.c:2661
    rtnetlink_rcv_msg+0x733/0x1090 net/core/rtnetlink.c:4301
    netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2408
    rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4313
    netlink_unicast_kernel net/netlink/af_netlink.c:1273 [inline]
    netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1299
    netlink_sendmsg+0xa4a/0xe70 net/netlink/af_netlink.c:1862
    sock_sendmsg_nosec net/socket.c:633 [inline]
    sock_sendmsg+0xca/0x110 net/socket.c:643
    ___sys_sendmsg+0x75b/0x8a0 net/socket.c:2049
    __sys_sendmsg+0xe5/0x210 net/socket.c:2083
    SYSC_sendmsg net/socket.c:2094 [inline]
    SyS_sendmsg+0x2d/0x50 net/socket.c:2090
    entry_SYSCALL_64_fastpath+0x1f/0xbe
    RIP: 0033:0x7fa9174d3320
    RSP: 002b:00007ffe302ae9e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00007ffe302b2ae0 RCX: 00007fa9174d3320
    RDX: 0000000000000000 RSI: 00007ffe302aea20 RDI: 0000000000000016
    RBP: 0000000000000082 R08: 0000000000000000 R09: 000000000000000f
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe302b32a0
    R13: 0000000000000000 R14: 00007ffe302b2ab8 R15: 00007ffe302b32b8

    Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
    Signed-off-by: Eric Dumazet
    Cc: David Ahern
    Acked-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
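
The reference-counting contract behind the commit above can be modelled in a few lines. This is a toy model, not kernel code: `struct fake_ifaddr`, `ifa_hold()`, `ifa_put()`, `del_addr()` and `drop_failed_addr()` are hypothetical names, and the assumption is that the delete routine releases both the list's reference and one reference handed in by its caller.

```c
/* Hypothetical stand-in for an interface address: just a reference count. */
struct fake_ifaddr {
    int refcnt;
};

void ifa_hold(struct fake_ifaddr *ifa) { ifa->refcnt++; }
void ifa_put(struct fake_ifaddr *ifa)  { ifa->refcnt--; }

/*
 * Contract modelled here: the caller passes in a reference it owns.
 * The function drops the reference held by the address list and then
 * the caller's reference before returning.
 */
void del_addr(struct fake_ifaddr *ifa)
{
    ifa_put(ifa);   /* reference held by the address list */
    ifa_put(ifa);   /* reference handed in by the caller  */
}

/* Error path as described in the commit: hold first, then delete. */
void drop_failed_addr(struct fake_ifaddr *ifa)
{
    ifa_hold(ifa);
    del_addr(ifa);
}
```

Calling `del_addr()` without the preceding hold leaves the count one too low, which is the kind of imbalance the refcount warning in the trace reports.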
     
  • [ Upstream commit 3d0241d57c7b25bb75ac9d7a62753642264fdbce ]

    When gso_size is reset to zero for the tail segment in skb_segment(), later
    in ipv6_gso_segment(), __skb_udp_tunnel_segment() and gre_gso_segment()
    we will get incorrect results (payload length, pcsum) for that segment.
    inet_gso_segment() already has a check for gso_size before calculating
    payload.

    The issue was found with LTP vxlan & gre tests over ixgbe NIC.

    Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer")
    Signed-off-by: Alexey Kodanev
    Acked-by: Alexander Duyck
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexey Kodanev
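
The effect of checking gso_size before computing the payload length can be sketched with plain integers. This is a simplified model with hypothetical names: `segment_payload_len()` and its parameters are not kernel identifiers, `seg_len` stands for the bytes that follow the network header in the segment, and `inner_hdr_len` for the headers that sit between the network header and the payload.

```c
#include <stddef.h>

/*
 * For a partially segmented packet, the per-packet payload length is
 * derived from gso_size. The tail segment has gso_size reset to 0, so
 * deriving the length from gso_size there would give a wrong result;
 * in that case fall back to the segment's own length.
 */
size_t segment_payload_len(size_t seg_len, size_t gso_size,
                           size_t inner_hdr_len, int gso_partial)
{
    if (gso_partial && gso_size != 0)
        return gso_size + inner_hdr_len;
    return seg_len;
}
```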
     

12 Oct, 2017

3 commits

  • [ Upstream commit d41bb33ba33b8f8debe54ed36be6925eb496e354 ]

    Currently, when updating the mtu in the tx path, the code doesn't consider
    ARPHRD_ETHER tunnel devices, such as ip6gre_tap tunnels, for which it
    should also subtract the ether header to get the correct mtu.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
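
The arithmetic behind the commit above, as a sketch: for an Ethernet-type tunnel device the inner frame carries its own 14-byte Ethernet header, so that header has to be subtracted as well. `tunnel_inner_mtu()` and its parameters are hypothetical; the 44-byte overhead used in the example (40-byte IPv6 header plus a 4-byte base GRE header) is an assumption chosen for illustration.

```c
/* Length of an Ethernet header without a VLAN tag. */
#define ETH_HLEN 14u

/*
 * path_mtu       : MTU of the underlying route
 * encap_overhead : outer headers added by the tunnel
 * is_ether_dev   : non-zero for ARPHRD_ETHER style devices (tap tunnels)
 * Returns the MTU available to the inner packet, or 0 if nothing fits.
 */
unsigned int tunnel_inner_mtu(unsigned int path_mtu,
                              unsigned int encap_overhead,
                              int is_ether_dev)
{
    unsigned int overhead = encap_overhead + (is_ether_dev ? ETH_HLEN : 0u);

    if (path_mtu <= overhead)
        return 0;
    return path_mtu - overhead;
}
```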
     
  • [ Upstream commit 2d40557cc702ed8e5edd9bd422233f86652d932e ]

    The patch 'ip_gre: ipgre_tap device should keep dst' fixed
    an issue where the ipgre_tap mtu couldn't be updated in the tx path.

    The same fix is needed for ip6gre_tap as well.

    Signed-off-by: Xin Long
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • [ Upstream commit 36f6ee22d2d66046e369757ec6bbe1c482957ba6 ]

    When running LTP IPsec tests, KASan might report:

    BUG: KASAN: use-after-free in vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
    Read of size 4 at addr ffff880dc6ad1980 by task swapper/0/0
    ...
    Call Trace:

    dump_stack+0x63/0x89
    print_address_description+0x7c/0x290
    kasan_report+0x28d/0x370
    ? vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
    __asan_report_load4_noabort+0x19/0x20
    vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
    ? vti_init_net+0x190/0x190 [ip_vti]
    ? save_stack_trace+0x1b/0x20
    ? save_stack+0x46/0xd0
    dev_hard_start_xmit+0x147/0x510
    ? icmp_echo.part.24+0x1f0/0x210
    __dev_queue_xmit+0x1394/0x1c60
    ...
    Freed by task 0:
    save_stack_trace+0x1b/0x20
    save_stack+0x46/0xd0
    kasan_slab_free+0x70/0xc0
    kmem_cache_free+0x81/0x1e0
    kfree_skbmem+0xb1/0xe0
    kfree_skb+0x75/0x170
    kfree_skb_list+0x3e/0x60
    __dev_queue_xmit+0x1298/0x1c60
    dev_queue_xmit+0x10/0x20
    neigh_resolve_output+0x3a8/0x740
    ip_finish_output2+0x5c0/0xe70
    ip_finish_output+0x4ba/0x680
    ip_output+0x1c1/0x3a0
    xfrm_output_resume+0xc65/0x13d0
    xfrm_output+0x1e4/0x380
    xfrm4_output_finish+0x5c/0x70

    This can be fixed by reading skb->len before calling dst_output().

    Fixes: b9959fd3b0fa ("vti: switch to new ip tunnel code")
    Fixes: 22e1b23dafa8 ("vti6: Support inter address family tunneling.")
    Signed-off-by: Alexey Kodanev
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexey Kodanev
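
The ordering rule from the commit above (read the length before handing the buffer to a function that may free it) can be shown in user space. All names here are hypothetical: `struct packet` stands in for the skb, `output_consume()` for an output routine that takes ownership, and `struct stats` for the device counters.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a packet buffer. */
struct packet {
    size_t len;
};

struct stats {
    unsigned long tx_packets;
    unsigned long tx_bytes;
    unsigned long tx_errors;
};

/* Takes ownership of pkt; the caller must not touch it afterwards. */
int output_consume(struct packet *pkt)
{
    free(pkt);
    return 0;
}

void xmit(struct packet *pkt, struct stats *st)
{
    size_t len = pkt->len;          /* snapshot while we still own pkt */
    int err = output_consume(pkt);  /* pkt may be freed from here on   */

    if (err == 0) {
        st->tx_packets++;
        st->tx_bytes += len;        /* uses the snapshot, not pkt->len */
    } else {
        st->tx_errors++;
    }
}
```

Reading `pkt->len` after `output_consume()` would be the same use-after-free pattern that KASAN reports in the trace above.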