13 Feb, 2018

8 commits

  • [ Upstream commit 4db428a7c9ab07e08783e0fcdc4ca0f555da0567 ]

    reuseport_add_sock() needs to deal with attaching a socket having
    its own sk_reuseport_cb, after a prior
    setsockopt(SO_ATTACH_REUSEPORT_?BPF)

    Without this fix, not only a WARN_ONCE() was issued, but we were also
    leaking memory.

    Thanks to syzbot and Eric Biggers for providing us nice C repros.

    ------------[ cut here ]------------
    socket already in reuseport group
    WARNING: CPU: 0 PID: 3496 at net/core/sock_reuseport.c:119 reuseport_add_sock+0x742/0x9b0 net/core/sock_reuseport.c:117
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 0 PID: 3496 Comm: syzkaller869503 Not tainted 4.15.0-rc6+ #245
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
      __dump_stack lib/dump_stack.c:17 [inline]
      dump_stack+0x194/0x257 lib/dump_stack.c:53
      panic+0x1e4/0x41c kernel/panic.c:183
      __warn+0x1dc/0x200 kernel/panic.c:547
      report_bug+0x211/0x2d0 lib/bug.c:184
      fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
      fixup_bug arch/x86/kernel/traps.c:247 [inline]
      do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
      do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
      invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1079

    Fixes: ef456144da8e ("soreuseport: define reuseport groups")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot+c0ea2226f77a42936bf7@syzkaller.appspotmail.com
    Acked-by: Craig Gallek

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 7ece54a60ee2ba7a386308cae73c790bd580589c ]

    If a sk_v6_rcv_saddr is !IPV6_ADDR_ANY and !IPV6_ADDR_MAPPED, it
    implicitly implies it is an ipv6only socket. However, in inet6_bind(),
    this addr_type checking and setting sk->sk_ipv6only to 1 are only done
    after sk->sk_prot->get_port(sk, snum) has been completed successfully.

    This inconsistency between sk_v6_rcv_saddr and sk_ipv6only confuses
    the 'get_port()'.

    In particular, when binding SO_REUSEPORT UDP sockets,
    udp_reuseport_add_sock(sk,...) is called. udp_reuseport_add_sock()
    checks "ipv6_only_sock(sk2) == ipv6_only_sock(sk)" before adding sk to
    sk2->sk_reuseport_cb. In this case, ipv6_only_sock(sk2) could be
    1 while ipv6_only_sock(sk) is still 0 here. The end result is,
    reuseport_alloc(sk) is called instead of adding sk to the existing
    sk2->sk_reuseport_cb.

    It can be reproduced by binding two SO_REUSEPORT UDP sockets on an
    IPv6 address (!ANY and !MAPPED). Only one of the sockets will
    receive packets.

    The fix is to set the implicit sk_ipv6only before calling get_port().
    The original sk_ipv6only has to be saved such that it can be restored
    in case get_port() failed. The situation is similar to the
    inet_reset_saddr(sk) after get_port() has failed.

    Thanks to Calvin Owens who created an easy
    reproduction which leads to a fix.

    Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
    Signed-off-by: Martin KaFai Lau
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Martin KaFai Lau
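
The ordering problem described in this commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `Sock`, `get_port`, `bind_before_fix` and `bind_after_fix` are hypothetical stand-ins for the kernel's socket, `get_port()` and `inet6_bind()` paths, and only the ipv6only comparison is modeled.

```python
from dataclasses import dataclass


@dataclass
class Sock:
    ipv6only: bool = False
    group: list = None  # stands in for sk_reuseport_cb


def get_port(sk, bound):
    """Join an existing reuseport group only if the ipv6only flags match."""
    for sk2 in bound:
        if sk2.ipv6only == sk.ipv6only:
            sk2.group.append(sk)
            sk.group = sk2.group
            break
    else:
        sk.group = [sk]  # models reuseport_alloc(): a brand-new group
    bound.append(sk)
    return 0


def bind_before_fix(sk, bound, addr_is_v6_specific):
    err = get_port(sk, bound)
    if err == 0 and addr_is_v6_specific:
        sk.ipv6only = True  # set too late: get_port() already compared flags
    return err


def bind_after_fix(sk, bound, addr_is_v6_specific, get_port_fn=get_port):
    saved = sk.ipv6only  # remember the original value
    if addr_is_v6_specific:
        sk.ipv6only = True  # set before get_port() looks at it
    err = get_port_fn(sk, bound)
    if err:
        sk.ipv6only = saved  # restore on failure
    return err
```

In this model, two sockets bound with `bind_before_fix` end up in separate groups, while `bind_after_fix` places them in the same one.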
     
  • [ Upstream commit 3aff3b4b986e51bcf4ab249e5d48d39596e0df6a ]

    This commit fixes the pacing_gain to remain at BBR_UNIT (1.0) when
    using lt_bw and returning from the PROBE_RTT state to PROBE_BW.

    Previously, when using lt_bw, upon exiting PROBE_RTT and entering
    PROBE_BW the bbr_reset_probe_bw_mode() code could sometimes randomly
    end up with a cycle_idx of 0 and hence have bbr_advance_cycle_phase()
    set a pacing gain above 1.0. In such cases this would result in a
    pacing rate that is 1.25x higher than intended, potentially resulting
    in a high loss rate for a little while until we stop using the lt_bw a
    bit later.

    This commit is a stable candidate for kernels back as far as 4.9.

    Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
    Signed-off-by: Neal Cardwell
    Signed-off-by: Yuchung Cheng
    Signed-off-by: Soheil Hassas Yeganeh
    Reported-by: Beyers Cronje
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Neal Cardwell
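
A rough model of the gain selection may help in reading this commit. It is a sketch, not the kernel code: the gain table, cycle length and random range below are assumptions based on the usual BBR parameters, and `reset_probe_bw_mode` is a hypothetical helper that folds the reset and the phase advance into one function.

```python
import random

BBR_UNIT = 256   # fixed-point representation of 1.0
CYCLE_LEN = 8    # assumed number of phases in the gain cycle
CYCLE_RAND = 7   # assumed range of the random starting offset
# Assumed gain cycle: probe at 1.25x, drain at 0.75x, then cruise at 1.0x.
PACING_GAIN = [BBR_UNIT * 5 // 4, BBR_UNIT * 3 // 4] + [BBR_UNIT] * 6


def reset_probe_bw_mode(lt_use_bw, fixed, rng):
    """Return the pacing gain chosen when entering PROBE_BW."""
    # Pick a random starting phase, then advance by one phase.
    cycle_idx = CYCLE_LEN - 1 - rng.randrange(CYCLE_RAND)
    cycle_idx = (cycle_idx + 1) & (CYCLE_LEN - 1)
    if fixed and lt_use_bw:
        return BBR_UNIT  # with lt_bw in use, stay at exactly 1.0
    return PACING_GAIN[cycle_idx]
```

Without the fix, the model occasionally lands on index 0 and returns 320 (1.25 in BBR_UNIT terms) even though lt_bw is in use; with the fix it always returns 256.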
     
  • [ Upstream commit c76fe2d98c726224a975a0d0198c3fb50406d325 ]

    Unsolicited IPv6 neighbor advertisements should be sent after DAD
    completes. Update ndisc_send_unsol_na to skip tentative, non-optimistic
    addresses and have those sent by addrconf_dad_completed after DAD.

    Fixes: 4a6e3c5def13c ("net: ipv6: send unsolicited NA on admin up")
    Reported-by: Vivek Venkatraman
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • [ Upstream commit edbe69ef2c90fc86998a74b08319a01c508bd497 ]

    This patch effectively reverts commit 9f1c2674b328 ("net: memcontrol:
    defer call to mem_cgroup_sk_alloc()").

    Moving mem_cgroup_sk_alloc() to the inet_csk_accept() completely breaks
    memcg socket memory accounting, as packets received before memcg
    pointer initialization are not accounted and are causing refcounting
    underflow on socket release.

    Actually the use-after-free problem was fixed by
    commit c0576e397508 ("net: call cgroup_sk_alloc() earlier in
    sk_clone_lock()") for the cgroup pointer.

    So, let's revert it and call mem_cgroup_sk_alloc() just before
    cgroup_sk_alloc(). This is safe, as we hold a reference to the socket
    we're cloning, and it holds a reference to the memcg.

    Also, let's drop BUG_ON(mem_cgroup_is_root()) check from
    mem_cgroup_sk_alloc(). I see no reasons why bumping the root
    memcg counter is a good reason to panic, and there are no realistic
    ways to hit it.

    Signed-off-by: Roman Gushchin
    Cc: Eric Dumazet
    Cc: David S. Miller
    Cc: Johannes Weiner
    Cc: Tejun Heo
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Roman Gushchin
     
  • [ Upstream commit 9b42d55a66d388e4dd5550107df051a9637564fc ]

    A socket can be disconnected and transformed back into a listening
    socket. If sk_frag.page is not released at that point, it will be cloned
    into a new socket by sk_clone_lock, but the reference count of this page
    is increased, leading to a use-after-free or double-free issue.

    Signed-off-by: Li RongQing
    Cc: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Li RongQing
     
  • [ Upstream commit e7aadb27a5415e8125834b84a74477bfbee4eff5 ]

    Newly added igmpv3_get_srcaddr() needs to be called under rcu lock.

    Timer callbacks do not ensure this locking.

    =============================
    WARNING: suspicious RCU usage
    4.15.0+ #200 Not tainted
    -----------------------------
    ./include/linux/inetdevice.h:216 suspicious rcu_dereference_check() usage!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    3 locks held by syzkaller616973/4074:
    #0: (&mm->mmap_sem){++++}, at: [] __do_page_fault+0x32d/0xc90 arch/x86/mm/fault.c:1355
    #1: ((&im->timer)){+.-.}, at: [] lockdep_copy_map include/linux/lockdep.h:178 [inline]
    #1: ((&im->timer)){+.-.}, at: [] call_timer_fn+0x1c6/0x820 kernel/time/timer.c:1316
    #2: (&(&im->lock)->rlock){+.-.}, at: [] spin_lock_bh include/linux/spinlock.h:315 [inline]
    #2: (&(&im->lock)->rlock){+.-.}, at: [] igmpv3_send_report+0x98/0x5b0 net/ipv4/igmp.c:600

    stack backtrace:
    CPU: 0 PID: 4074 Comm: syzkaller616973 Not tainted 4.15.0+ #200
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:

    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:53
    lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4592
    __in_dev_get_rcu include/linux/inetdevice.h:216 [inline]
    igmpv3_get_srcaddr net/ipv4/igmp.c:329 [inline]
    igmpv3_newpack+0xeef/0x12e0 net/ipv4/igmp.c:389
    add_grhead.isra.27+0x235/0x300 net/ipv4/igmp.c:432
    add_grec+0xbd3/0x1170 net/ipv4/igmp.c:565
    igmpv3_send_report+0xd5/0x5b0 net/ipv4/igmp.c:605
    igmp_send_report+0xc43/0x1050 net/ipv4/igmp.c:722
    igmp_timer_expire+0x322/0x5c0 net/ipv4/igmp.c:831
    call_timer_fn+0x228/0x820 kernel/time/timer.c:1326
    expire_timers kernel/time/timer.c:1363 [inline]
    __run_timers+0x7ee/0xb70 kernel/time/timer.c:1666
    run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
    __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
    invoke_softirq kernel/softirq.c:365 [inline]
    irq_exit+0x1cc/0x200 kernel/softirq.c:405
    exiting_irq arch/x86/include/asm/apic.h:541 [inline]
    smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
    apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:938

    Fixes: a46182b00290 ("net: igmp: Use correct source address on IGMPv3 reports")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 4adfa79fc254efb7b0eb3cd58f62c2c3f805f1ba ]

    When we dump the ip6mr mfc entries via proc, we initialize an iterator
    with the table to dump but we don't clear the cache pointer which might
    be initialized from a prior read on the same descriptor that ended. This
    can result in lock imbalance (an unnecessary unlock) leading to other
    crashes and hangs. Clear the cache pointer like ipmr does to fix the issue.
    Thanks for the reliable reproducer.

    Here's syzbot's trace:
    WARNING: bad unlock balance detected!
    4.15.0-rc3+ #128 Not tainted
    syzkaller971460/3195 is trying to release lock (mrt_lock) at:
    [] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
    but there are no more locks to release!

    other info that might help us debug this:
    1 lock held by syzkaller971460/3195:
    #0: (&p->lock){+.+.}, at: [] seq_read+0xd5/0x13d0 fs/seq_file.c:165

    stack backtrace:
    CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:53
    print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561
    __lock_release kernel/locking/lockdep.c:3775 [inline]
    lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023
    __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline]
    _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
    ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
    traverse+0x3bc/0xa00 fs/seq_file.c:135
    seq_read+0x96a/0x13d0 fs/seq_file.c:189
    proc_reg_read+0xef/0x170 fs/proc/inode.c:217
    do_loop_readv_writev fs/read_write.c:673 [inline]
    do_iter_read+0x3db/0x5b0 fs/read_write.c:897
    compat_readv+0x1bf/0x270 fs/read_write.c:1140
    do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
    C_SYSC_preadv fs/read_write.c:1209 [inline]
    compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
    do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
    do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
    entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
    RIP: 0023:0xf7f73c79
    RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
    RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
    RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
    RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    BUG: sleeping function called from invalid context at lib/usercopy.c:25
    in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460
    INFO: lockdep is turned off.
    CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:53
    ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060
    __might_sleep+0x95/0x190 kernel/sched/core.c:6013
    __might_fault+0xab/0x1d0 mm/memory.c:4525
    _copy_to_user+0x2c/0xc0 lib/usercopy.c:25
    copy_to_user include/linux/uaccess.h:155 [inline]
    seq_read+0xcb4/0x13d0 fs/seq_file.c:279
    proc_reg_read+0xef/0x170 fs/proc/inode.c:217
    do_loop_readv_writev fs/read_write.c:673 [inline]
    do_iter_read+0x3db/0x5b0 fs/read_write.c:897
    compat_readv+0x1bf/0x270 fs/read_write.c:1140
    do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
    C_SYSC_preadv fs/read_write.c:1209 [inline]
    compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
    do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
    do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
    entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
    RIP: 0023:0xf7f73c79
    RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
    RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
    RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
    RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0 lib/usercopy.c:26

    Reported-by: syzbot
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Nikolay Aleksandrov
     

08 Feb, 2018

1 commit

  • commit 259d8c1e984318497c84eef547bbb6b1d9f4eb05

    Wireless drivers rely on parse_txq_params to validate that txq_params->ac
    is less than NL80211_NUM_ACS by the time the low-level driver's ->conf_tx()
    handler is called. Use a new helper, array_index_nospec(), to sanitize
    txq_params->ac with respect to speculation. I.e. ensure that any
    speculation into ->conf_tx() handlers is done with a value of
    txq_params->ac that is within the bounds of [0, NL80211_NUM_ACS).

    Reported-by: Christian Lamparter
    Reported-by: Elena Reshetova
    Signed-off-by: Dan Williams
    Signed-off-by: Thomas Gleixner
    Acked-by: Johannes Berg
    Cc: linux-arch@vger.kernel.org
    Cc: kernel-hardening@lists.openwall.com
    Cc: gregkh@linuxfoundation.org
    Cc: linux-wireless@vger.kernel.org
    Cc: torvalds@linux-foundation.org
    Cc: "David S. Miller"
    Cc: alan@linux.intel.com
    Link: https://lkml.kernel.org/r/151727419584.33451.7700736761686184303.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
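
The effect of the index sanitization can be sketched in userspace. This is a model under stated assumptions, not the kernel helper itself: it mimics the branch-free mask idea on 64-bit values, `NUM_ACS = 4` is assumed to match NL80211_NUM_ACS, and Python cannot demonstrate the speculation side of the problem, only the clamping arithmetic.

```python
BITS = 64
MASK64 = (1 << BITS) - 1
NUM_ACS = 4  # assumed value of NL80211_NUM_ACS


def array_index_mask_nospec(index, size):
    """All ones if index < size, zero otherwise, computed without a branch on index."""
    v = (index | (size - 1 - index)) & MASK64
    top_bit = v >> (BITS - 1)         # 1 when index is out of bounds
    return (top_bit - 1) & MASK64     # 0 -> all ones, 1 -> zero


def array_index_nospec(index, size):
    """Clamp an out-of-bounds index to 0; leave in-bounds indices untouched."""
    return index & array_index_mask_nospec(index, size)
```

In-bounds values such as `array_index_nospec(2, NUM_ACS)` pass through unchanged, while any value at or beyond `NUM_ACS` is forced to 0.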
     

04 Feb, 2018

10 commits

  • [ Upstream commit 4ba161a793d5f43757c35feff258d9f20a082940 ]

    Reported-by: Dmitry Vyukov
    Signed-off-by: Trond Myklebust
    Tested-by: Dmitry Vyukov
    Signed-off-by: Anna Schumaker
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • [ Upstream commit 52a395896a051a3d5c34fba67c324f69ec5e67c6 ]

    When doing asoc reset, if the sender of the response has already sent some
    chunk and increased asoc->next_tsn before the duplicate request comes, the
    response will use the old result with an incorrect sender next_tsn.

    Better than asoc->next_tsn, asoc->ctsn_ack_point can't be changed after
    the sender of the response has performed the asoc reset and before the
    peer has confirmed it, and its value is still the original value of
    asoc->next_tsn minus 1.

    This patch sets sender next_tsn for the old result with ctsn_ack_point
    plus 1 when processing the duplicate request, to make sure the sender
    next_tsn value peer gets will be always right.

    Fixes: 692787cef651 ("sctp: implement receiver-side procedures for the SSN/TSN Reset Request Parameter")
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • [ Upstream commit 159f2a7456c6ae95c1e1a58e8b8ec65ef12d51cf ]

    Now when doing asoc reset, it cleans up sacked and abandoned queues
    by calling sctp_outq_free where it also cleans up unsent, retransmit
    and transmitted queues.

    It's safe for the sender of the response, as these 3 queues are empty at
    that time. But when the receiver of the response is doing the reset, the
    users may already have enqueued some chunks into unsent while
    waiting for the response, and these chunks should not be flushed.

    To avoid removing the chunks in unsent, it moves the queue into a
    temp list, then gets it back after sctp_outq_free is done.

    The patch also fixes some incorrect comments in
    sctp_process_strreset_tsnreq.

    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
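
The park-and-restore approach from this commit can be sketched as follows. This is a simplified model, not the SCTP implementation: `OutQ` and `reset_keep_unsent` are hypothetical names, and plain Python lists stand in for the kernel's chunk queues.

```python
class OutQ:
    def __init__(self):
        self.unsent, self.retransmit, self.transmitted = [], [], []
        self.sacked, self.abandoned = [], []

    def free(self):
        """Models sctp_outq_free(): drops the chunks on every queue."""
        for q in (self.unsent, self.retransmit, self.transmitted,
                  self.sacked, self.abandoned):
            q.clear()


def reset_keep_unsent(outq):
    temp = list(outq.unsent)   # park the unsent chunks on a temporary list
    outq.free()                # purge sacked, abandoned and the rest
    outq.unsent.extend(temp)   # put the parked chunks back afterwards
```

After `reset_keep_unsent`, chunks queued by the user while waiting for the response are still on `unsent`, while the sacked and abandoned queues are empty.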
     
  • [ Upstream commit 5c6144a0eb5366ae07fc5059301b139338f39bbd ]

    As it says in rfc6525#section5.1.4, before sending the request,

    C2: The sender has either no outstanding TSNs or considers all
    outstanding TSNs abandoned.

    Prior to this patch, it tried to consider all outstanding TSNs abandoned
    by dropping all chunks in all outqs with sctp_outq_free (even including
    sacked, retransmit and transmitted queues) when doing this reset, which
    is too aggressive.

    To make it work gently, this patch will only allow the asoc reset when
    the sender has no outstanding TSNs by checking if unsent, transmitted
    and retransmit are all empty with sctp_outq_is_empty before sending
    and processing the request.

    Fixes: 692787cef651 ("sctp: implement receiver-side procedures for the SSN/TSN Reset Request Parameter")
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • [ Upstream commit fbbdad5edf0bb59786a51b94a9d006bc8c2da9a2 ]

    The previous path metric update from RANN frame has not considered
    the own link metric toward the transmitting mesh STA. Fix this.

    Reported-by: Michael65535
    Signed-off-by: Chun-Yeow Yeoh
    Signed-off-by: Johannes Berg
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Chun-Yeow Yeoh
     
  • [ Upstream commit 7b6ddeaf27eca72795ceeae2f0f347db1b5f9a30 ]

    When connected to a QoS/WMM AP, mac80211 should use a QoS NDP
    for probing it, instead of a regular non-QoS one, fix this.

    Change all the drivers to *not* allow QoS NDP for now, even
    though it looks like most of them should be OK with that.

    Signed-off-by: Johannes Berg
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Johannes Berg
     
  • [ Upstream commit 67c8d22a73128ff910e2287567132530abcf5b71 ]

    If we want to add a datapath flow with more than 500 vxlan output
    actions, we will get the following error reports:
    openvswitch: netlink: Flow action size 32832 bytes exceeds max
    openvswitch: netlink: Flow action size 32832 bytes exceeds max
    openvswitch: netlink: Actions may not be safe on all matching packets
    ... ...

    It seems that we could simply enlarge MAX_ACTIONS_BUFSIZE to fix it, but
    this is not the root cause. For example, for a vxlan output action, we need
    about 60 bytes for the nlattr, but after it is converted to the flow
    action, it only occupies 24 bytes. This means that we can still support
    more than 1000 vxlan output actions for a single datapath flow under
    the current 32k max limitation.

    So even if nla_len(attr) is larger than MAX_ACTIONS_BUFSIZE, we
    shouldn't report EINVAL but let processing continue, as the size check
    can be done by reserve_sfa_size.

    Signed-off-by: zhangliping
    Acked-by: Pravin B Shelar
    Signed-off-by: David S. Miller
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    zhangliping
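
The size argument in this commit is simple arithmetic, worked through below. The per-action sizes are the approximate figures quoted in the message (about 60 bytes as a netlink attribute, 24 bytes once converted), the 32k limit is taken as 32 * 1024 bytes, and the helper names are hypothetical.

```python
MAX_ACTIONS_BUFSIZE = 32 * 1024   # the "32k max limitation"
NLATTR_BYTES = 60                 # approximate netlink size of one vxlan output
FLOW_ACTION_BYTES = 24            # size of the same action after conversion


def netlink_size(n_outputs):
    return n_outputs * NLATTR_BYTES


def converted_size(n_outputs):
    return n_outputs * FLOW_ACTION_BYTES


def max_outputs(bytes_per_action):
    return MAX_ACTIONS_BUFSIZE // bytes_per_action
```

With these figures, checking the netlink length caps a flow at `max_outputs(NLATTR_BYTES)` = 546 outputs, while checking the converted size allows `max_outputs(FLOW_ACTION_BYTES)` = 1365, which is why 1000 outputs still fit.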
     
  • [ Upstream commit f859ab61875978eeaa539740ff7f7d91f5d60006 ]

    RxRPC service endpoints expire like they're supposed to by the following
    means:

    (1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the
    global service conn timeout, otherwise the first rxrpc_net struct to
    die will cause connections on all others to expire immediately from
    then on.

    (2) Mark local service endpoints for which the socket has been closed
    (->service_closed) so that the expiration timeout can be much
    shortened for service and client connections going through that
    endpoint.

    (3) rxrpc_put_service_conn() needs to schedule the reaper when the usage
    count reaches 1, not 0, as idle conns have a 1 count.

    (4) The accumulator for the earliest time we might want to schedule for
    should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as
    the comparison functions use signed arithmetic.

    (5) Simplify the expiration handling, adding the expiration value to the
    idle timestamp each time rather than keeping track of the time in the
    past before which the idle timestamp must go to be expired. This is
    much easier to read.

    (6) Ignore the timeouts if the net namespace is dead.

    (7) Restart the service reaper work item rather than the client reaper.

    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Howells
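
Item (4) above is easier to see with numbers. The sketch below models wrapping jiffies comparisons on a 64-bit machine; `to_long`, `time_before` and `earliest_expiry` are hypothetical userspace helpers, and MAX_JIFFY_OFFSET is assumed to be (LONG_MAX >> 1) - 1.

```python
BITS = 64
ULONG_MAX = (1 << BITS) - 1
LONG_MAX = (1 << (BITS - 1)) - 1
MAX_JIFFY_OFFSET = (LONG_MAX >> 1) - 1  # assumed kernel definition


def to_long(x):
    """Reinterpret an unsigned 64-bit value as a signed long."""
    x &= ULONG_MAX
    return x - (1 << BITS) if x > LONG_MAX else x


def time_before(a, b):
    """True if a is earlier than b, using signed wrap-around arithmetic."""
    return to_long(a - b) < 0


def earliest_expiry(initial, expiries):
    earliest = initial
    for t in expiries:
        if time_before(t, earliest):
            earliest = t
    return earliest
```

Starting the accumulator at `ULONG_MAX` makes every real expiry look later than it, so the accumulator never moves; starting at `jiffies + MAX_JIFFY_OFFSET` lets the earliest real expiry win.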
     
  • [ Upstream commit 9faaff593404a9c4e5abc6839a641635d7b9d0cd ]

    Provide a different lockdep key for rxrpc_call::user_mutex when the call is
    made on a kernel socket, such as by the AFS filesystem.

    The problem is that lockdep registers a false positive between userspace
    calling the sendmsg syscall on a user socket where call->user_mutex is held
    whilst userspace memory is accessed whereas the AFS filesystem may perform
    operations with mmap_sem held by the caller.

    In such a case, the following warning is produced.

    ======================================================
    WARNING: possible circular locking dependency detected
    4.14.0-fscache+ #243 Tainted: G E
    ------------------------------------------------------
    modpost/16701 is trying to acquire lock:
    (&vnode->io_lock){+.+.}, at: [] afs_begin_vnode_operation+0x33/0x77 [kafs]

    but task is already holding lock:
    (&mm->mmap_sem){++++}, at: [] __do_page_fault+0x1ef/0x486

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #3 (&mm->mmap_sem){++++}:
    __might_fault+0x61/0x89
    _copy_from_iter_full+0x40/0x1fa
    rxrpc_send_data+0x8dc/0xff3
    rxrpc_do_sendmsg+0x62f/0x6a1
    rxrpc_sendmsg+0x166/0x1b7
    sock_sendmsg+0x2d/0x39
    ___sys_sendmsg+0x1ad/0x22b
    __sys_sendmsg+0x41/0x62
    do_syscall_64+0x89/0x1be
    return_from_SYSCALL_64+0x0/0x75

    -> #2 (&call->user_mutex){+.+.}:
    __mutex_lock+0x86/0x7d2
    rxrpc_new_client_call+0x378/0x80e
    rxrpc_kernel_begin_call+0xf3/0x154
    afs_make_call+0x195/0x454 [kafs]
    afs_vl_get_capabilities+0x193/0x198 [kafs]
    afs_vl_lookup_vldb+0x5f/0x151 [kafs]
    afs_create_volume+0x2e/0x2f4 [kafs]
    afs_mount+0x56a/0x8d7 [kafs]
    mount_fs+0x6a/0x109
    vfs_kern_mount+0x67/0x135
    do_mount+0x90b/0xb57
    SyS_mount+0x72/0x98
    do_syscall_64+0x89/0x1be
    return_from_SYSCALL_64+0x0/0x75

    -> #1 (k-sk_lock-AF_RXRPC){+.+.}:
    lock_sock_nested+0x74/0x8a
    rxrpc_kernel_begin_call+0x8a/0x154
    afs_make_call+0x195/0x454 [kafs]
    afs_fs_get_capabilities+0x17a/0x17f [kafs]
    afs_probe_fileserver+0xf7/0x2f0 [kafs]
    afs_select_fileserver+0x83f/0x903 [kafs]
    afs_fetch_status+0x89/0x11d [kafs]
    afs_iget+0x16f/0x4f8 [kafs]
    afs_mount+0x6c6/0x8d7 [kafs]
    mount_fs+0x6a/0x109
    vfs_kern_mount+0x67/0x135
    do_mount+0x90b/0xb57
    SyS_mount+0x72/0x98
    do_syscall_64+0x89/0x1be
    return_from_SYSCALL_64+0x0/0x75

    -> #0 (&vnode->io_lock){+.+.}:
    lock_acquire+0x174/0x19f
    __mutex_lock+0x86/0x7d2
    afs_begin_vnode_operation+0x33/0x77 [kafs]
    afs_fetch_data+0x80/0x12a [kafs]
    afs_readpages+0x314/0x405 [kafs]
    __do_page_cache_readahead+0x203/0x2ba
    filemap_fault+0x179/0x54d
    __do_fault+0x17/0x60
    __handle_mm_fault+0x6d7/0x95c
    handle_mm_fault+0x24e/0x2a3
    __do_page_fault+0x301/0x486
    do_page_fault+0x236/0x259
    page_fault+0x22/0x30
    __clear_user+0x3d/0x60
    padzero+0x1c/0x2b
    load_elf_binary+0x785/0xdc7
    search_binary_handler+0x81/0x1ff
    do_execveat_common.isra.14+0x600/0x888
    do_execve+0x1f/0x21
    SyS_execve+0x28/0x2f
    do_syscall_64+0x89/0x1be
    return_from_SYSCALL_64+0x0/0x75

    other info that might help us debug this:

    Chain exists of:
    &vnode->io_lock --> &call->user_mutex --> &mm->mmap_sem

    Possible unsafe locking scenario:

    CPU0                    CPU1
    ----                    ----
    lock(&mm->mmap_sem);
                            lock(&call->user_mutex);
                            lock(&mm->mmap_sem);
    lock(&vnode->io_lock);

    *** DEADLOCK ***

    1 lock held by modpost/16701:
    #0: (&mm->mmap_sem){++++}, at: [] __do_page_fault+0x1ef/0x486

    stack backtrace:
    CPU: 0 PID: 16701 Comm: modpost Tainted: G E 4.14.0-fscache+ #243
    Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
    Call Trace:
    dump_stack+0x67/0x8e
    print_circular_bug+0x341/0x34f
    check_prev_add+0x11f/0x5d4
    ? add_lock_to_list.isra.12+0x8b/0x8b
    ? add_lock_to_list.isra.12+0x8b/0x8b
    ? __lock_acquire+0xf77/0x10b4
    __lock_acquire+0xf77/0x10b4
    lock_acquire+0x174/0x19f
    ? afs_begin_vnode_operation+0x33/0x77 [kafs]
    __mutex_lock+0x86/0x7d2
    ? afs_begin_vnode_operation+0x33/0x77 [kafs]
    ? afs_begin_vnode_operation+0x33/0x77 [kafs]
    ? afs_begin_vnode_operation+0x33/0x77 [kafs]
    afs_begin_vnode_operation+0x33/0x77 [kafs]
    afs_fetch_data+0x80/0x12a [kafs]
    afs_readpages+0x314/0x405 [kafs]
    __do_page_cache_readahead+0x203/0x2ba
    ? filemap_fault+0x179/0x54d
    filemap_fault+0x179/0x54d
    __do_fault+0x17/0x60
    __handle_mm_fault+0x6d7/0x95c
    handle_mm_fault+0x24e/0x2a3
    __do_page_fault+0x301/0x486
    do_page_fault+0x236/0x259
    page_fault+0x22/0x30
    RIP: 0010:__clear_user+0x3d/0x60
    RSP: 0018:ffff880071e93da0 EFLAGS: 00010202
    RAX: 0000000000000000 RBX: 000000000000011c RCX: 000000000000011c
    RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000060f720
    RBP: 000000000060f720 R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000000000001 R11: ffff8800b5459b68 R12: ffff8800ce150e00
    R13: 000000000060f720 R14: 00000000006127a8 R15: 0000000000000000
    padzero+0x1c/0x2b
    load_elf_binary+0x785/0xdc7
    search_binary_handler+0x81/0x1ff
    do_execveat_common.isra.14+0x600/0x888
    do_execve+0x1f/0x21
    SyS_execve+0x28/0x2f
    do_syscall_64+0x89/0x1be
    entry_SYSCALL64_slow_path+0x25/0x25
    RIP: 0033:0x7fdb6009ee07
    RSP: 002b:00007fff566d9728 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
    RAX: ffffffffffffffda RBX: 000055ba57280900 RCX: 00007fdb6009ee07
    RDX: 000055ba5727f270 RSI: 000055ba5727cac0 RDI: 000055ba57280900
    RBP: 000055ba57280900 R08: 00007fff566d9700 R09: 0000000000000000
    R10: 000055ba5727cac0 R11: 0000000000000246 R12: 0000000000000000
    R13: 000055ba5727cac0 R14: 000055ba5727f270 R15: 0000000000000000

    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • [ Upstream commit 03a6c82218b9a87014b2c6c4e178294fdc8ebd8a ]

    The caller of rxrpc_accept_call() must release the lock on call->user_mutex
    returned by that function.

    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     

31 Jan, 2018

21 commits

  • [ upstream commit 68fda450a7df51cff9e5a4d4a4d9d0d5f2589153 ]

    Because some JITs do the if (src_reg == 0) check in 64-bit mode
    for div/mod operations, mask the upper 32 bits of the src register
    before doing the check.

    Fixes: 622582786c9e ("net: filter: x86: internal BPF JIT")
    Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.")
    Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Alexei Starovoitov
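
The mismatch this commit addresses can be modeled in a few lines. This is a sketch, not verifier or JIT code: `div32_unchecked` and `div32_masked` are hypothetical names, returning 0 on a zero divisor is an assumption of the model, and Python's ZeroDivisionError stands in for the fault a real 32-bit divide by zero would cause.

```python
U32 = 0xFFFFFFFF


def div32_unchecked(dst, src):
    """Zero check on the full 64-bit register, divide on the low 32 bits."""
    if src == 0:
        return 0
    return (dst & U32) // (src & U32)  # low half may still be zero


def div32_masked(dst, src):
    """Mask the upper 32 bits first, then do the zero check."""
    src &= U32
    if src == 0:
        return 0
    return (dst & U32) // src
```

A source register holding `1 << 32` is non-zero as a 64-bit value but zero in its low 32 bits, so it slips past the first check and is caught only by the masked variant.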
     
  • [ upstream commit 290af86629b25ffd1ed6232c4e9107da031705cb ]

    The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.

    A quote from the Google Project Zero blog:
    "At this point, it would normally be necessary to locate gadgets in
    the host kernel code that can be used to actually leak data by reading
    from an attacker-controlled location, shifting and masking the result
    appropriately and then using the result of that as offset to an
    attacker-controlled address for a load. But piecing gadgets together
    and figuring out which ones work in a speculation context seems annoying.
    So instead, we decided to use the eBPF interpreter, which is built into
    the host kernel - while there is no legitimate way to invoke it from inside
    a VM, the presence of the code in the host kernel's text section is sufficient
    to make it usable for the attack, just like with ordinary ROP gadgets."

    To make the attacker's job harder, introduce the BPF_JIT_ALWAYS_ON config
    option, which removes the interpreter from the kernel in favor of JIT-only mode.
    So far eBPF JIT is supported by:
    x64, arm64, arm32, sparc64, s390, powerpc64, mips64

    The start of a JITed program is randomized and its code page is marked as read-only.
    In addition, "constant blinding" can be turned on with net.core.bpf_jit_harden.

    v2->v3:
    - move __bpf_prog_ret0 under ifdef (Daniel)

    v1->v2:
    - fix init order, test_bpf and cBPF (Daniel's feedback)
    - fix offloaded bpf (Jakub's feedback)
    - add 'return 0' dummy in case something can invoke prog->bpf_func
    - retarget bpf tree. For bpf-next the patch would need one extra hunk.
    It will be sent when the trees are merged back to net-next

    Considered doing:
    int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
    but it seems better to land the patch as-is and in bpf-next remove
    bpf_jit_enable global variable from all JITs, consolidate in one place
    and remove this jit_init() function.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Greg Kroah-Hartman

    Alexei Starovoitov
     
  • [ Upstream commit 6503a30440962f1e1ccb8868816b4e18201218d4 ]

    Commit 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu
    versions of route lookup") broke "ip route get" in the presence
    of rules that specify iif lo.

    Host-originated traffic always has iif lo, because
    ip_route_output_key_hash and ip6_route_output_flags set the flow
    iif to LOOPBACK_IFINDEX. Thus, putting "iif lo" in an ip rule is a
    convenient way to select only originated traffic and not forwarded
    traffic.

    inet_rtm_getroute used to match these rules correctly because
    even though it sets the flow iif to 0, it called
    ip_route_output_key which overwrites iif with LOOPBACK_IFINDEX.
    But now that it calls ip_route_output_key_hash_rcu, the ifindex
    will remain 0 and not match the iif lo in the rule. As a result,
    "ip route get" will return ENETUNREACH.

    Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
    Tested: https://android.googlesource.com/kernel/tests/+/master/net/test/multinetwork_test.py passes again
    Signed-off-by: Lorenzo Colitti
    Acked-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Lorenzo Colitti
     
  • [ Upstream commit 6db959c82eb039a151d95a0f8b7dea643657327a ]

    The current code copies directly from userspace to ctx->crypto_send, but
    doesn't always reinitialize it to 0 on failure. This causes any
    subsequent attempt to use this setsockopt to fail because of the
    TLS_CRYPTO_INFO_READY check, even though crypto_info is not actually
    ready.

    This should result in a correctly set up socket after the 3rd call, but
    currently it does not:

    size_t s = sizeof(struct tls12_crypto_info_aes_gcm_128);
    struct tls12_crypto_info_aes_gcm_128 crypto_good = {
            .info.version = TLS_1_2_VERSION,
            .info.cipher_type = TLS_CIPHER_AES_GCM_128,
    };

    struct tls12_crypto_info_aes_gcm_128 crypto_bad_type = crypto_good;
    crypto_bad_type.info.cipher_type = 42;

    setsockopt(sock, SOL_TLS, TLS_TX, &crypto_bad_type, s);
    setsockopt(sock, SOL_TLS, TLS_TX, &crypto_good, s - 1);
    setsockopt(sock, SOL_TLS, TLS_TX, &crypto_good, s);
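
    The failure mode can be reproduced in a small userspace sketch. All names
    here (fake_ctx, fake_crypto_info, fake_set_tx, the FAKE_* constants) are
    hypothetical stand-ins and not kernel code; the readiness check is assumed
    to be "version is nonzero", and -EBUSY is this sketch's choice of error
    for an already-set context.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the values are illustrative, not the kernel's. */
#define FAKE_TLS_VERSION 1
#define FAKE_CIPHER      51

struct fake_crypto_info {
    int version;
    int cipher_type;
};

struct fake_ctx {
    struct fake_crypto_info crypto_send;
};

/* Assumed readiness check: a nonzero version means "already set". */
static int info_ready(const struct fake_crypto_info *info)
{
    return info->version != 0;
}

/*
 * Hypothetical setsockopt handler. With zero_on_failure == 0 the
 * half-validated copy stays in the context after an error.
 */
int fake_set_tx(struct fake_ctx *ctx, const struct fake_crypto_info *src,
                size_t len, int zero_on_failure)
{
    if (info_ready(&ctx->crypto_send))
        return -EBUSY;
    if (len < sizeof(*src))
        return -EINVAL;

    /* Copy straight into the context, then validate. */
    memcpy(&ctx->crypto_send, src, sizeof(*src));

    if (ctx->crypto_send.version != FAKE_TLS_VERSION ||
        ctx->crypto_send.cipher_type != FAKE_CIPHER) {
        if (zero_on_failure)
            memset(&ctx->crypto_send, 0, sizeof(ctx->crypto_send));
        return -EINVAL;
    }
    return 0;
}
```

    Without zeroing, the first failed call leaves a nonzero version behind, so
    every later call is rejected by the readiness check. With zeroing, the
    third call succeeds.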

    Fixes: 3c4d7559159b ("tls: kernel TLS support")
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca
     
  • [ Upstream commit 877d17c79b66466942a836403773276e34fe3614 ]

    do_tls_setsockopt_tx returns 0 without doing anything when crypto_info
    is already set. Silent failure is confusing for users.

    Fixes: 3c4d7559159b ("tls: kernel TLS support")
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca
     
  • [ Upstream commit cf6d43ef66f416282121f436ce1bee9a25199d52 ]

    During setsockopt(SOL_TLS, TLS_TX), if initialization of the software
    context fails in tls_set_sw_offload(), we leak sw_ctx. We also don't
    reassign ctx->priv_ctx to NULL, so we can't even do another attempt to
    set it up on the same socket, as it will fail with -EEXIST.

    Fixes: 3c4d7559159b ('tls: kernel TLS support')
    Signed-off-by: Sabrina Dubroca
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Sabrina Dubroca
     
  • [ Upstream commit d91c3e17f75f218022140dee18cf515292184a8f ]

    Calling accept on a TCP socket with a TLS ulp attached results
    in two sockets that share the same ulp context.
    The ulp context is freed while a socket is destroyed, so
    after one of the sockets is released, the second socket will
    trigger a use after free when it tries to access the ulp context
    attached to it.
    We restrict the TLS ulp to sockets in ESTABLISHED state
    to prevent the scenario above.

    Fixes: 3c4d7559159b ("tls: kernel TLS support")
    Reported-by: syzbot+904e7cd6c5c741609228@syzkaller.appspotmail.com
    Signed-off-by: Ilya Lesokhin
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Ilya Lesokhin
     
  • [ Upstream commit cd443f1e91ca600a092e780e8250cd6a2954b763 ]

    Move up the extack reset/initialization in netlink_rcv_skb, so that
    those 'goto ack' will not skip it. Otherwise, later on netlink_ack
    may use the uninitialized extack and cause a kernel crash.

    Fixes: cbbdf8433a5f ("netlink: extack needs to be reset each time through loop")
    Reported-by: syzbot+03bee3680a37466775e7@syzkaller.appspotmail.com
    Signed-off-by: Xin Long
    Acked-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • [ Upstream commit cbbdf8433a5f117b1a2119ea30fc651b61ef7570 ]

    syzbot triggered the WARN_ON in netlink_ack testing the bad_attr value.
    The problem is that netlink_rcv_skb loops over the skb, repeatedly invoking
    the callback without resetting the extack, which leaves potentially stale
    data. Initializing each time through avoids the WARN_ON.
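
    A minimal userspace sketch of the stale-state problem, assuming a reduced
    hypothetical struct fake_extack and a hypothetical callback handle_msg()
    in place of the real netlink types:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for the extended ACK state. */
struct fake_extack {
    const char *msg;
    const void *bad_attr;
};

static const int bad_marker = 0;

/* Hypothetical callback: negative messages fail and record a bad attribute. */
static int handle_msg(int msg, struct fake_extack *extack)
{
    if (msg < 0) {
        extack->msg = "invalid attribute";
        extack->bad_attr = &bad_marker;
        return -1;
    }
    return 0;
}

/*
 * Walks n messages with one shared extack. ack_has_bad_attr[i] records
 * whether the ACK for message i would carry a bad_attr pointer.
 */
void process_msgs(const int *msgs, size_t n, int reset_each_time,
                  int *ack_has_bad_attr)
{
    struct fake_extack extack;
    size_t i;

    memset(&extack, 0, sizeof(extack));
    for (i = 0; i < n; i++) {
        if (reset_each_time)
            memset(&extack, 0, sizeof(extack));
        (void)handle_msg(msgs[i], &extack);
        ack_has_bad_attr[i] = extack.bad_attr != NULL;
    }
}
```

    With reset_each_time set to 0, a successful message that follows a failed
    one still sees the earlier bad_attr. Resetting at the top of each
    iteration removes that carry-over.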

    Fixes: 2d4bc93368f5a ("netlink: extended ACK reporting")
    Reported-by: syzbot+315fa6766d0f7c359327@syzkaller.appspotmail.com
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    David Ahern
     
  • [ Upstream commit 625637bf4afa45204bd87e4218645182a919485a ]

    After introducing the sctp_stream structure, sctp uses stream->outcnt as
    the number of out streams instead of c.sinit_num_ostreams.

    However, when users pass sinit in cmsg, sctp_sendmsg only updates
    c.sinit_num_ostreams. At that moment, stream->outcnt still holds the
    previous value. If its value is not updated, the sinit_num_ostreams of
    sinit has no effect.

    This patch fixes it by updating stream->outcnt and reinitializing the
    stream if the stream outcnt has been changed by sinit in sendmsg.

    Fixes: a83863174a61 ("sctp: prepare asoc stream for stream reconf")
    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • [ Upstream commit d0c081b49137cd3200f2023c0875723be66e7ce5 ]

    syzbot reported yet another crash [1] that is caused by
    insufficient validation of DODGY packets.

    Three bugs are happening here to trigger the crash.

    1) Flow dissection leaves with incorrect thoff field.

    2) skb_probe_transport_header() sets transport header to this invalid
    thoff, even if pointing after skb valid data.

    3) qdisc_pkt_len_init() reads out-of-bound data because it
    trusts tcp_hdrlen(skb)

    Possible fixes :

    - Full flow dissector validation before injecting bad DODGY packets in
    the stack.
    This approach was attempted here : https://patchwork.ozlabs.org/patch/861874/

    - Have more robust functions in the core.
    This might be needed anyway for stable versions.

    This patch fixes the flow dissection issue.

    [1]
    CPU: 1 PID: 3144 Comm: syzkaller271204 Not tainted 4.15.0-rc4-mm1+ #49
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:17 [inline]
    dump_stack+0x194/0x257 lib/dump_stack.c:53
    print_address_description+0x73/0x250 mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:355 [inline]
    kasan_report+0x23b/0x360 mm/kasan/report.c:413
    __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:432
    __tcp_hdrlen include/linux/tcp.h:35 [inline]
    tcp_hdrlen include/linux/tcp.h:40 [inline]
    qdisc_pkt_len_init net/core/dev.c:3160 [inline]
    __dev_queue_xmit+0x20d3/0x2200 net/core/dev.c:3465
    dev_queue_xmit+0x17/0x20 net/core/dev.c:3554
    packet_snd net/packet/af_packet.c:2943 [inline]
    packet_sendmsg+0x3ad5/0x60a0 net/packet/af_packet.c:2968
    sock_sendmsg_nosec net/socket.c:628 [inline]
    sock_sendmsg+0xca/0x110 net/socket.c:638
    sock_write_iter+0x31a/0x5d0 net/socket.c:907
    call_write_iter include/linux/fs.h:1776 [inline]
    new_sync_write fs/read_write.c:469 [inline]
    __vfs_write+0x684/0x970 fs/read_write.c:482
    vfs_write+0x189/0x510 fs/read_write.c:544
    SYSC_write fs/read_write.c:589 [inline]
    SyS_write+0xef/0x220 fs/read_write.c:581
    entry_SYSCALL_64_fastpath+0x1f/0x96

    Fixes: 34fad54c2537 ("net: __skb_flow_dissect() must cap its return value")
    Fixes: a6e544b0a88b ("flow_dissector: Jump to exit code in __skb_flow_dissect")
    Signed-off-by: Eric Dumazet
    Cc: Willem de Bruijn
    Reported-by: syzbot
    Acked-by: Jason Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit 121d57af308d0cf943f08f4738d24d3966c38cd9 ]

    Validate gso_type during segmentation as SKB_GSO_DODGY sources
    may pass packets where the gso_type does not match the contents.

    Syzkaller was able to enter the SCTP gso handler with a packet of
    gso_type SKB_GSO_TCPV4.

    On entry of transport layer gso handlers, verify that the gso_type
    matches the transport protocol.
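
    The entry check can be sketched as follows. The FAKE_GSO_* flag values
    and the function fake_sctp_gso_enter() are hypothetical and only
    illustrate the idea of rejecting a packet whose gso_type does not name
    the handler's own protocol.

```c
#include <errno.h>

/* Hypothetical flag values, not the kernel's SKB_GSO_* constants. */
enum {
    FAKE_GSO_TCPV4 = 1 << 0,
    FAKE_GSO_DODGY = 1 << 1,
    FAKE_GSO_SCTP  = 1 << 2,
};

/* Hypothetical entry check for an SCTP segmentation handler. */
int fake_sctp_gso_enter(unsigned int gso_type)
{
    /* An untrusted source may have labelled the packet with another type. */
    if (!(gso_type & FAKE_GSO_SCTP))
        return -EINVAL;
    return 0;
}
```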

    Fixes: 90017accff61 ("sctp: Add GSO support")
    Link: http://lkml.kernel.org/r/
    Reported-by: syzbot+fee64147a25aecd48055@syzkaller.appspotmail.com
    Signed-off-by: Willem de Bruijn
    Acked-by: Jason Wang
    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Willem de Bruijn
     
  • [ Upstream commit 128bb975dc3c25d00de04e503e2fe0a780d04459 ]

    Commit b05229f44228 ("gre6: Cleanup GREv6 transmit path,
    call common GRE functions") moved dev->mtu initialization
    from ip6gre_tunnel_setup() to ip6gre_tunnel_init(), as a
    result, the previously set values, before ndo_init(), are
    reset in the following cases:

    * rtnl_create_link() can update dev->mtu from IFLA_MTU
    parameter.

    * ip6gre_tnl_link_config() is invoked before ndo_init() in
    netlink and ioctl setup, so ndo_init() can reset MTU
    adjustments with the lower device MTU as well, dev->mtu
    and dev->hard_header_len.

    Not applicable for ip6gretap because it has one more call
    to ip6gre_tnl_link_config(tunnel, 1) in ip6gre_tap_init().

    Fix the first case by updating dev->mtu with 'tb[IFLA_MTU]'
    parameter if a user sets it manually on a device creation,
    and fix the second one by moving ip6gre_tnl_link_config()
    call after register_netdevice().

    Fixes: b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
    Fixes: db2ec95d1ba4 ("ip6_gre: Fix MTU setting")
    Signed-off-by: Alexey Kodanev
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Alexey Kodanev
     
  • [ Upstream commit 59b36613e85fb16ebf9feaf914570879cd5c2a21 ]

    When tipc_node_find_by_name() fails, the nlmsg is not
    freed.

    While at it, switch to a goto label to properly
    free it.

    Fixes: be9c086715c ("tipc: narrow down exposure of struct tipc_node")
    Reported-by: Dmitry Vyukov
    Cc: Jon Maloy
    Cc: Ying Xue
    Signed-off-by: Cong Wang
    Acked-by: Ying Xue
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Cong Wang
     
  • [ Upstream commit a0ff660058b88d12625a783ce9e5c1371c87951f ]

    After commit cea0cc80a677 ("sctp: use the right sk after waking up from
    wait_buf sleep"), it may change to lock another sk if the asoc has been
    peeled off in sctp_wait_for_sndbuf.

    However, the asoc's new sk could be already closed elsewhere, as it's in
    the sendmsg context of the old sk that can't avoid the new sk's closing.
    If the sk's last refcnt is held by this asoc, then later, after putting
    this asoc, the new sk will be freed while under its own lock.

    This patch is to revert that commit, but fix the old issue by returning
    error under the old sk's lock.

    Fixes: cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep")
    Reported-by: syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com
    Signed-off-by: Xin Long
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • [ Upstream commit c5006b8aa74599ce19104b31d322d2ea9ff887cc ]

    The check in sctp_sockaddr_af is not robust enough to forbid binding a
    v4mapped v6 addr on a v4 socket.

    Worse, the v4 socket's bind_verify would not convert this v4mapped v6
    addr to a v4 addr. syzbot even reported a crash as the v4 socket bound
    a v6 addr.

    This patch is to fix it by doing the common sa.sa_family check first,
    then AF_INET check for v4mapped v6 addrs.

    Fixes: 7dab83de50c7 ("sctp: Support ipv6only AF_INET6 sockets.")
    Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com
    Acked-by: Neil Horman
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Xin Long
     
  • [ Upstream commit 30be8f8dba1bd2aff73e8447d59228471233a3d4 ]

    sendfile() calls can hang endlessly when using Kernel TLS if a socket
    error occurs. Socket error codes must be inverted by Kernel TLS before
    returning because they are stored with positive sign. If returned
    non-inverted, they are interpreted as the number of bytes sent, causing
    endless looping of the splice mechanism behind sendfile().
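
    A minimal userspace sketch of the sign problem, assuming a hypothetical
    struct fake_sock whose sk_err field holds a positive errno value, and two
    hypothetical send helpers that differ only in the sign they return:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket with a pending, positive error code. */
struct fake_sock {
    int sk_err;
};

/* Buggy variant: hands the stored error back with its positive sign. */
long send_chunk_buggy(const struct fake_sock *sk, size_t len)
{
    if (sk->sk_err)
        return sk->sk_err;      /* caller reads this as bytes sent */
    return (long)len;
}

/* Fixed variant: inverts the sign so the caller sees a negative errno. */
long send_chunk_fixed(const struct fake_sock *sk, size_t len)
{
    if (sk->sk_err)
        return -sk->sk_err;
    return (long)len;
}
```

    A caller that treats any non-negative return as progress cannot tell the
    buggy return value apart from a short write, so it keeps retrying.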

    Signed-off-by: Robert Hering
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    r.hering@avm.de
     
  • [ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]

    When a tcp socket is closed, if it detects that its net namespace is
    exiting, close immediately and do not wait for FIN sequence.

    For normal sockets, a reference is taken to their net namespace, so it will
    never exit while the socket is open. However, kernel sockets do not take a
    reference to their net namespace, so it may begin exiting while the kernel
    socket is still open. In this case if the kernel socket is a tcp socket,
    it will stay open trying to complete its close sequence. The sock's dst(s)
    hold a reference to their interface, which are all transferred to the
    namespace's loopback interface when the real interfaces are taken down.
    When the namespace tries to take down its loopback interface, it hangs
    waiting for all references to the loopback interface to release, which
    results in messages like:

    unregister_netdevice: waiting for lo to become free. Usage count = 1

    These messages continue until the socket finally times out and closes.
    Since the net namespace cleanup holds the net_mutex while calling its
    registered pernet callbacks, any new net namespace initialization is
    blocked until the current net namespace finishes exiting.

    After this change, the tcp socket notices the exiting net namespace, and
    closes immediately, releasing its dst(s) and their reference to the
    loopback interface, which lets the net namespace continue exiting.

    Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
    Signed-off-by: Dan Streetman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Dan Streetman
     
  • [ Upstream commit 7c68d1a6b4db9012790af7ac0f0fdc0d2083422a ]

    Without proper validation of DODGY packets, we might very well
    feed qdisc_pkt_len_init() with invalid GSO packets.

    tcp_hdrlen() might access out-of-bound data, so let's use
    skb_header_pointer() and proper checks.
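
    The idea of a bounds-checked header read can be sketched in userspace.
    header_pointer() below is a hypothetical, simplified analogue that only
    handles a flat buffer, and tcp_hdrlen_checked() with its minimum-length
    test is an assumption of this sketch, not the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

#define FAKE_TCP_MIN_HDR 20u

/*
 * Hypothetical analogue of a checked header accessor: returns NULL unless
 * the range [off, off + len) lies fully inside the buffer.
 */
const uint8_t *header_pointer(const uint8_t *buf, size_t buflen,
                              size_t off, size_t len)
{
    if (off > buflen || len > buflen - off)
        return NULL;
    return buf + off;
}

/* Returns the TCP header length in bytes, or 0 if it cannot be trusted. */
unsigned int tcp_hdrlen_checked(const uint8_t *pkt, size_t pktlen,
                                size_t thoff)
{
    const uint8_t *th = header_pointer(pkt, pktlen, thoff, FAKE_TCP_MIN_HDR);
    unsigned int hdrlen;

    if (!th)
        return 0;
    /* The data offset is the high nibble of byte 12, in 32-bit words. */
    hdrlen = (unsigned int)(th[12] >> 4) * 4;
    return hdrlen >= FAKE_TCP_MIN_HDR ? hdrlen : 0;
}
```

    A transport offset that points past the valid data yields 0 instead of an
    out-of-bounds read.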

    Whole story is described in commit d0c081b49137 ("flow_dissector:
    properly cap thoff field")

    We have the goal of validating DODGY packets earlier in the stack,
    so we might very well revert this fix in the future.

    Signed-off-by: Eric Dumazet
    Cc: Willem de Bruijn
    Cc: Jason Wang
    Reported-by: syzbot+9da69ebac7dddd804552@syzkaller.appspotmail.com
    Acked-by: Jason Wang
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • [ Upstream commit ad23b750933ea7bf962678972a286c78a8fa36aa ]

    Commit "net: igmp: Use correct source address on IGMPv3 reports"
    introduced a check to validate the source address of locally generated
    IGMPv3 packets.
    Instead of checking the local interface address directly, it uses
    inet_ifa_match(fl4->saddr, ifa), which checks if the address is on the
    local subnet (or equal to the point-to-point address if used).

    This breaks for point-to-point interfaces, so check against
    ifa->ifa_local directly.
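
    The difference between the two checks can be sketched in userspace. The
    struct fake_ifaddr and both helpers are hypothetical; addresses are plain
    host-order integers, and the point-to-point case is modelled by storing
    the peer address in ifa_address with a full-length mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced interface address record. */
struct fake_ifaddr {
    uint32_t ifa_local;    /* our own address */
    uint32_t ifa_address;  /* subnet address, or the peer on point-to-point */
    uint32_t ifa_mask;
};

/* Subnet-style match: compares against ifa_address under the mask. */
bool subnet_match(uint32_t addr, const struct fake_ifaddr *ifa)
{
    return ((addr ^ ifa->ifa_address) & ifa->ifa_mask) == 0;
}

/* Direct match against the local address. */
bool local_match(uint32_t addr, const struct fake_ifaddr *ifa)
{
    return addr == ifa->ifa_local;
}
```

    On a point-to-point link the subnet-style match rejects the local address
    and accepts the peer's, while the direct comparison does the opposite.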

    Cc: Kevin Cernekee
    Fixes: a46182b00290 ("net: igmp: Use correct source address on IGMPv3 reports")
    Reported-by: Sebastian Gottschall
    Signed-off-by: Felix Fietkau
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Felix Fietkau
     
  • [ Upstream commit 95ef498d977bf44ac094778fd448b98af158a3e6 ]

    In my last patch, I missed the fact that cork.base.dst was not initialized
    in ip6_make_skb() :

    If ip6_setup_cork() returns an error, we might attempt a dst_release()
    on some random pointer.

    Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet