16 Oct, 2019

6 commits

  • When an application sends many packets with the same txtime, they may
    be transmitted out of order (different from the order in which they
    were enqueued).

    This happens because, when two packets have the same txtime, the new
    packet is inserted on the left side of the tree, which causes the
    reordering. The only effect of this change should be that packets with
    the same txtime will be transmitted in the order they are enqueued.

    The application in question (the AVTP GStreamer plugin, still in
    development) is sending video traffic, in which each video frame has
    a single presentation time. The problem is that when packetizing,
    multiple packets end up with the same txtime.

    The receiving side was rejecting packets because they were being
    received out of order.

    Fixes: 25db26a91364 ("net/sched: Introduce the ETF Qdisc")
    Reported-by: Ederson de Souza
    Signed-off-by: Vinicius Costa Gomes
    Signed-off-by: David S. Miller

    Vinicius Costa Gomes
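
The tie-breaking rule described above can be illustrated outside the kernel. The following is a minimal user-space sketch, not the ETF qdisc code: it uses a plain unbalanced binary search tree instead of the kernel's rbtree, and struct pkt, pkt_insert() and pkt_walk() are hypothetical names. The point it shows is that sending equal keys to the right keeps an in-order walk in enqueue order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued packet; not the kernel's sk_buff. */
struct pkt {
    long long txtime;           /* requested transmit time */
    int seq;                    /* enqueue order, for demonstration only */
    struct pkt *left, *right;
};

/* Insert into a binary search tree ordered by txtime.
 * Equal keys descend to the right, so an in-order walk visits packets
 * with the same txtime in the order they were inserted. */
void pkt_insert(struct pkt **root, struct pkt *p)
{
    struct pkt **link = root;

    assert(root != NULL && p != NULL);
    p->left = p->right = NULL;
    while (*link) {
        if (p->txtime >= (*link)->txtime)
            link = &(*link)->right;     /* ties go right: FIFO among equals */
        else
            link = &(*link)->left;
    }
    *link = p;
}

/* In-order walk: stores seq numbers in out[] starting at pos and
 * returns the position after the last stored element. */
int pkt_walk(const struct pkt *n, int *out, int pos)
{
    if (!n)
        return pos;
    pos = pkt_walk(n->left, out, pos);
    out[pos++] = n->seq;
    return pkt_walk(n->right, out, pos);
}
```

If the comparison were a strict "greater than", a packet with an equal txtime would go left and be dequeued before the packets queued earlier.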
     
  • tc_ctl_action() has the ability to loop forever if tcf_action_add()
    returns -EAGAIN.

    This special case exists in case a module needs to be loaded,
    but it turns out that tcf_add_notify() could also return -EAGAIN
    if the socket sk_rcvbuf limit is hit.

    We need to separate the two cases, and only loop for the module
    loading case.

    While we are at it, add a limit of 10 attempts since unbounded
    loops are always scary.

    syzbot repro was something like :

    socket(PF_NETLINK, SOCK_RAW|SOCK_NONBLOCK, NETLINK_ROUTE) = 3
    write(3, ..., 38) = 38
    setsockopt(3, SOL_SOCKET, SO_RCVBUF, [0], 4) = 0
    sendmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{..., 388}], msg_controllen=0, msg_flags=0x10}, ...)

    NMI backtrace for cpu 0
    CPU: 0 PID: 1054 Comm: khungtaskd Not tainted 5.4.0-rc1+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x172/0x1f0 lib/dump_stack.c:113
    nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101
    nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62
    arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
    trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
    check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
    watchdog+0x9d0/0xef0 kernel/hung_task.c:289
    kthread+0x361/0x430 kernel/kthread.c:255
    ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
    Sending NMI from CPU 0 to CPUs 1:
    NMI backtrace for cpu 1
    CPU: 1 PID: 8859 Comm: syz-executor910 Not tainted 5.4.0-rc1+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:arch_local_save_flags arch/x86/include/asm/paravirt.h:751 [inline]
    RIP: 0010:lockdep_hardirqs_off+0x1df/0x2e0 kernel/locking/lockdep.c:3453
    Code: 5c 08 00 00 5b 41 5c 41 5d 5d c3 48 c7 c0 58 1d f3 88 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 d3 00 00 00 83 3d 21 9e 99 07 00 0f 84 b9 00 00 00 9c 58 0f 1f 44 00 00 f6
    RSP: 0018:ffff8880a6f3f1b8 EFLAGS: 00000046
    RAX: 1ffffffff11e63ab RBX: ffff88808c9c6080 RCX: 0000000000000000
    RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff88808c9c6914
    RBP: ffff8880a6f3f1d0 R08: ffff88808c9c6080 R09: fffffbfff16be5d1
    R10: fffffbfff16be5d0 R11: 0000000000000003 R12: ffffffff8746591f
    R13: ffff88808c9c6080 R14: ffffffff8746591f R15: 0000000000000003
    FS: 00000000011e4880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffffff600400 CR3: 00000000a8920000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    trace_hardirqs_off+0x62/0x240 kernel/trace/trace_preemptirq.c:45
    __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
    _raw_spin_lock_irqsave+0x6f/0xcd kernel/locking/spinlock.c:159
    __wake_up_common_lock+0xc8/0x150 kernel/sched/wait.c:122
    __wake_up+0xe/0x10 kernel/sched/wait.c:142
    netlink_unlock_table net/netlink/af_netlink.c:466 [inline]
    netlink_unlock_table net/netlink/af_netlink.c:463 [inline]
    netlink_broadcast_filtered+0x705/0xb80 net/netlink/af_netlink.c:1514
    netlink_broadcast+0x3a/0x50 net/netlink/af_netlink.c:1534
    rtnetlink_send+0xdd/0x110 net/core/rtnetlink.c:714
    tcf_add_notify net/sched/act_api.c:1343 [inline]
    tcf_action_add+0x243/0x370 net/sched/act_api.c:1362
    tc_ctl_action+0x3b5/0x4bc net/sched/act_api.c:1410
    rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5386
    netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
    rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5404
    netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
    netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
    netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917
    sock_sendmsg_nosec net/socket.c:637 [inline]
    sock_sendmsg+0xd7/0x130 net/socket.c:657
    ___sys_sendmsg+0x803/0x920 net/socket.c:2311
    __sys_sendmsg+0x105/0x1d0 net/socket.c:2356
    __do_sys_sendmsg net/socket.c:2365 [inline]
    __se_sys_sendmsg net/socket.c:2363 [inline]
    __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2363
    do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x440939

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot+cf0adbb9c28c8866c788@syzkaller.appspotmail.com
    Signed-off-by: David S. Miller

    Eric Dumazet
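
A bounded retry loop of the kind described above might look like the following user-space sketch. It is not the tc_ctl_action() code: struct attempt, bounded_replay() and the two demo callbacks are hypothetical, and the flag that distinguishes a module-load -EAGAIN from other causes is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_REPLAY_ATTEMPTS 10  /* hypothetical name for the limit of 10 */

/* Hypothetical result of one attempt: an errno-style code plus a flag
 * saying whether -EAGAIN came from a module load (worth retrying). */
struct attempt {
    int err;
    int module_loaded;
};

typedef struct attempt (*attempt_fn)(void *ctx);

/* Retry only when -EAGAIN was caused by module loading, and never more
 * than MAX_REPLAY_ATTEMPTS times. Any other result, including -EAGAIN
 * from a full receive buffer, is returned to the caller at once. */
int bounded_replay(attempt_fn try_once, void *ctx)
{
    struct attempt r = { 0, 0 };
    int attempts;

    assert(try_once != NULL);
    for (attempts = 0; attempts < MAX_REPLAY_ATTEMPTS; attempts++) {
        r = try_once(ctx);
        if (r.err != -EAGAIN || !r.module_loaded)
            break;
    }
    return r.err;
}

/* Demo callback: -EAGAIN because the receive buffer is full. */
struct attempt demo_rcvbuf_full(void *ctx)
{
    ++*(int *)ctx;              /* count the calls */
    return (struct attempt){ -EAGAIN, 0 };
}

/* Demo callback: -EAGAIN because a module was just loaded, every time. */
struct attempt demo_module_loaded(void *ctx)
{
    ++*(int *)ctx;
    return (struct attempt){ -EAGAIN, 1 };
}
```

With demo_rcvbuf_full() the loop runs once and reports the error; with demo_module_loaded() it stops after ten attempts instead of spinning forever.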
     
  • syzbot found that if __inet_inherit_port() returns an error,
    we call tcp_done() after inet_csk_prepare_forced_close(),
    meaning the socket lock is no longer held.

    We might fix this in a different way in net-next, but
    for 5.4 it seems safer to relax the lockdep check.

    Fixes: d983ea6f16b8 ("tcp: add rcu protection around tp->fastopen_rsk")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • the following script:

    # tc qdisc add dev eth0 clsact
    # tc filter add dev eth0 egress protocol ip matchall \
    > action mpls push protocol mpls_uc label 0x355aa bos 1

    causes corruption of all IP packets transmitted by eth0. On TC egress, we
    can't rely on the value of skb->mac_len, because it's 0 and an MPLS 'push'
    operation will result in an overwrite of the first 4 octets in the packet
    L2 header (e.g. the Destination Address if eth0 is an Ethernet); the same
    error pattern is present also in the MPLS 'pop' operation. Fix this error
    in act_mpls data plane, computing 'mac_len' as the difference between the
    network header and the mac header (when not at TC ingress), and use it in
    MPLS 'push'/'pop' core functions.

    v2: unbreak 'make htmldocs' because of missing documentation of 'mac_len'
    in skb_mpls_pop(), reported by kbuild test robot

    CC: Lorenzo Bianconi
    Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC")
    Reviewed-by: Simon Horman
    Acked-by: John Hurley
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller

    Davide Caratti
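
To see why the L2 header length matters, here is a minimal sketch of an MPLS 'push' on a flat byte buffer. It is not the skb_mpls_push() implementation: mpls_lse(), mpls_push() and the offset parameters are hypothetical, the buffer is assumed to have 4 spare bytes at the end, and updating the Ethernet type is left out. The header length is computed as the difference between the network header offset and the mac header offset, as the commit message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MPLS_HLEN 4

/* Encode one MPLS label stack entry: label(20) | TC(3) | S(1) | TTL(8). */
uint32_t mpls_lse(uint32_t label, uint32_t tc, uint32_t bos, uint32_t ttl)
{
    return ((label & 0xfffffu) << 12) | ((tc & 0x7u) << 9) |
           ((bos & 0x1u) << 8) | (ttl & 0xffu);
}

/* Insert an LSE between the L2 header and the network header.
 * frame holds len bytes and must have room for MPLS_HLEN more.
 * Returns the new frame length. */
size_t mpls_push(uint8_t *frame, size_t len, size_t mac_off, size_t net_off,
                 uint32_t lse)
{
    size_t mac_len;

    assert(frame != NULL && mac_off <= net_off && net_off <= len);
    /* Derive the L2 header length from the offsets, not a cached field. */
    mac_len = net_off - mac_off;

    /* Shift everything after the L2 header to make room for the LSE. */
    memmove(frame + mac_off + mac_len + MPLS_HLEN,
            frame + mac_off + mac_len, len - (mac_off + mac_len));

    /* Store the LSE in network byte order. */
    frame[mac_off + mac_len + 0] = (uint8_t)(lse >> 24);
    frame[mac_off + mac_len + 1] = (uint8_t)(lse >> 16);
    frame[mac_off + mac_len + 2] = (uint8_t)(lse >> 8);
    frame[mac_off + mac_len + 3] = (uint8_t)lse;
    return len + MPLS_HLEN;
}
```

With a mac_len of 0 the same code would write the 4 octets at the very start of the frame, which is the corruption the commit fixes. The TTL is not given in the script above; any value used with mpls_lse() is an assumption.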
     
  • the following script:

    # tc qdisc add dev eth0 clsact
    # tc filter add dev eth0 egress matchall action mpls pop

    implicitly makes the kernel drop all packets transmitted by eth0, if they
    don't have an MPLS header. This behavior is uncommon: other encapsulations
    (like VLAN) just let the packet pass unmodified. Since the result of MPLS
    'pop' operation would be the same regardless of the presence / absence of
    MPLS header(s) in the original packet, we can let skb_mpls_pop() return 0
    when dealing with non-MPLS packets.

    For the OVS use-case, this is acceptable because __ovs_nla_copy_actions()
    already ensures that MPLS 'pop' operation only occurs with packets having
    an MPLS Ethernet type (and there are no other callers in current code, so
    the semantic change should be ok).

    v2: better documentation of use-cases for skb_mpls_pop(), thanks to Simon
    Horman

    Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC")
    Reviewed-by: Simon Horman
    Acked-by: John Hurley
    Signed-off-by: Davide Caratti
    Signed-off-by: David S. Miller

    Davide Caratti
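
The semantic change can be summarized in a few lines. This is only a sketch under stated assumptions: mpls_pop_sketch() and is_mpls_ethertype() are hypothetical names, the Ethernet type is passed in host byte order, and the actual header removal is reduced to a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_MPLS_UC 0x8847    /* MPLS unicast Ethernet type */
#define ETH_P_MPLS_MC 0x8848    /* MPLS multicast Ethernet type */

int is_mpls_ethertype(uint16_t proto)
{
    return proto == ETH_P_MPLS_UC || proto == ETH_P_MPLS_MC;
}

/* Returns 0 on success. *popped tells whether a label was removed.
 * A non-MPLS packet is not an error: it passes through unmodified. */
int mpls_pop_sketch(uint16_t ethertype, int *popped)
{
    assert(popped != NULL);
    if (!is_mpls_ethertype(ethertype)) {
        *popped = 0;
        return 0;
    }
    *popped = 1;    /* real code would strip the 4-byte LSE here */
    return 0;
}
```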
     
  • While invalidating the dst, we assign blackhole_netdev instead of
    the loopback device. However, this device does not have an idev
    pointer and hence no ip6_ptr, even if IPv6 is enabled. This may have
    triggered the crash reported by syzbot.

    The syzbot report does not have a reproducer; however, this is the
    only device that doesn't have a matching idev created.

    Crash instruction is :

    static inline bool ip6_ignore_linkdown(const struct net_device *dev)
    {
    const struct inet6_dev *idev = __in6_dev_get(dev);

    return !!idev->cnf.ignore_routes_with_linkdown;

    Mahesh Bandewar
     

14 Oct, 2019

9 commits

  • For the sake of tcp_poll(), there are a few places where we fetch
    sk->sk_wmem_queued while this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make sure write
    sides use corresponding WRITE_ONCE() to avoid store-tearing.

    sk_wmem_queued_add() helper is added so that we can in
    the future convert to ADD_ONCE() or equivalent if/when
    available.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
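
For readers unfamiliar with these annotations, the sketch below shows the general shape of the pattern in user space. The macros are simplified stand-ins for the kernel's READ_ONCE() and WRITE_ONCE() (volatile accesses through a cast, relying on the GCC/Clang __typeof__ extension), and struct sock_sketch and the two helpers are hypothetical names, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros: force exactly one access
 * of the full object, so the compiler cannot tear, fuse or repeat it. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical miniature of a socket with one lockless-read field. */
struct sock_sketch {
    int wmem_queued;
};

/* Write side: a single helper, so every update is annotated and could
 * later be switched to another primitive in one place. */
void wmem_queued_add(struct sock_sketch *sk, int val)
{
    assert(sk != NULL);
    WRITE_ONCE(sk->wmem_queued, sk->wmem_queued + val);
}

/* Read side: a lockless reader such as a poll() path. */
int wmem_queued_read(struct sock_sketch *sk)
{
    assert(sk != NULL);
    return READ_ONCE(sk->wmem_queued);
}
```

The annotations do not make the read-modify-write atomic; they only guarantee that each individual load and store is performed once and in full.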
     
  • For the sake of tcp_poll(), there are a few places where we fetch
    sk->sk_sndbuf while this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make sure write
    sides use corresponding WRITE_ONCE() to avoid store-tearing.

    Note that other transports probably need similar fixes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • For the sake of tcp_poll(), there are a few places where we fetch
    sk->sk_rcvbuf while this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make sure write
    sides use corresponding WRITE_ONCE() to avoid store-tearing.

    Note that other transports probably need similar fixes.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • There are two places where we fetch tp->urg_seq while
    this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make
    sure the write side uses corresponding WRITE_ONCE() to avoid
    store-tearing.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • There are a few places where we fetch tp->snd_nxt while
    this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make
    sure write sides use corresponding WRITE_ONCE() to avoid
    store-tearing.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • There are a few places where we fetch tp->write_seq while
    this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make
    sure write sides use corresponding WRITE_ONCE() to avoid
    store-tearing.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • There are a few places where we fetch tp->copied_seq while
    this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make
    sure write sides use corresponding WRITE_ONCE() to avoid
    store-tearing.

    Note that tcp_inq_hint() was already using READ_ONCE(tp->copied_seq)

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • There are a few places where we fetch tp->rcv_nxt while
    this field can change from IRQ or other cpu.

    We need to add READ_ONCE() annotations, and also make
    sure write sides use corresponding WRITE_ONCE() to avoid
    store-tearing.

    Note that tcp_inq_hint() was already using READ_ONCE(tp->rcv_nxt)

    syzbot reported :

    BUG: KCSAN: data-race in tcp_poll / tcp_queue_rcv

    write to 0xffff888120425770 of 4 bytes by interrupt on cpu 0:
    tcp_rcv_nxt_update net/ipv4/tcp_input.c:3365 [inline]
    tcp_queue_rcv+0x180/0x380 net/ipv4/tcp_input.c:4638
    tcp_rcv_established+0xbf1/0xf50 net/ipv4/tcp_input.c:5616
    tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
    tcp_v4_rcv+0x1a03/0x1bf0 net/ipv4/tcp_ipv4.c:1923
    ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
    ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
    dst_input include/net/dst.h:442 [inline]
    ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
    __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
    __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
    netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
    napi_skb_finish net/core/dev.c:5671 [inline]
    napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
    receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061

    read to 0xffff888120425770 of 4 bytes by task 7254 on cpu 1:
    tcp_stream_is_readable net/ipv4/tcp.c:480 [inline]
    tcp_poll+0x204/0x6b0 net/ipv4/tcp.c:554
    sock_poll+0xed/0x250 net/socket.c:1256
    vfs_poll include/linux/poll.h:90 [inline]
    ep_item_poll.isra.0+0x90/0x190 fs/eventpoll.c:892
    ep_send_events_proc+0x113/0x5c0 fs/eventpoll.c:1749
    ep_scan_ready_list.constprop.0+0x189/0x500 fs/eventpoll.c:704
    ep_send_events fs/eventpoll.c:1793 [inline]
    ep_poll+0xe3/0x900 fs/eventpoll.c:1930
    do_epoll_wait+0x162/0x180 fs/eventpoll.c:2294
    __do_sys_epoll_pwait fs/eventpoll.c:2325 [inline]
    __se_sys_epoll_pwait fs/eventpoll.c:2311 [inline]
    __x64_sys_epoll_pwait+0xcd/0x170 fs/eventpoll.c:2311
    do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 7254 Comm: syz-fuzzer Not tainted 5.3.0+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Both tcp_v4_err() and tcp_v6_err() do the following operations
    while they do not own the socket lock :

    fastopen = tp->fastopen_rsk;
    snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una;

    The problem is that without appropriate barrier, the compiler
    might reload tp->fastopen_rsk and trigger a NULL deref.

    Request sockets are protected by RCU, so we can simply add
    the missing annotations and barriers to solve the issue.

    Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
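
The reload hazard can be shown with a small user-space sketch. It is a simplification: the commit talks about RCU annotations, while this example only uses a single annotated load into a local variable. READ_ONCE() is a simplified stand-in for the kernel macro, and struct req_sketch, struct tp_sketch and pick_snd_una() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel macro (GCC/Clang __typeof__). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct req_sketch {
    unsigned int snt_isn;
};

struct tp_sketch {
    struct req_sketch *fastopen_rsk;    /* may be cleared concurrently */
    unsigned int snd_una;
};

/* Load the pointer exactly once into a local, then test and
 * dereference that local. Without the annotation the compiler may
 * read tp->fastopen_rsk again after the NULL test. */
unsigned int pick_snd_una(struct tp_sketch *tp)
{
    struct req_sketch *fastopen;

    assert(tp != NULL);
    fastopen = READ_ONCE(tp->fastopen_rsk);
    return fastopen ? fastopen->snt_isn : tp->snd_una;
}
```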
     

13 Oct, 2019

1 commit


12 Oct, 2019

1 commit

  • If an ICMP packet comes in on the UDP socket backing an AF_RXRPC socket as
    the UDP socket is being shut down, rxrpc_error_report() may get called to
    deal with it after sk_user_data on the UDP socket has been cleared, leading
    to a NULL pointer access when this local endpoint record gets accessed.

    Fix this by just returning immediately if sk_user_data was NULL.

    The oops looks like the following:

    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    ...
    RIP: 0010:rxrpc_error_report+0x1bd/0x6a9
    ...
    Call Trace:
    ? sock_queue_err_skb+0xbd/0xde
    ? __udp4_lib_err+0x313/0x34d
    __udp4_lib_err+0x313/0x34d
    icmp_unreach+0x1ee/0x207
    icmp_rcv+0x25b/0x28f
    ip_protocol_deliver_rcu+0x95/0x10e
    ip_local_deliver+0xe9/0x148
    __netif_receive_skb_one_core+0x52/0x6e
    process_backlog+0xdc/0x177
    net_rx_action+0xf9/0x270
    __do_softirq+0x1b6/0x39a
    ? smpboot_register_percpu_thread+0xce/0xce
    run_ksoftirqd+0x1d/0x42
    smpboot_thread_fn+0x19e/0x1b3
    kthread+0xf1/0xf6
    ? kthread_delayed_work_timer_fn+0x83/0x83
    ret_from_fork+0x24/0x30

    Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
    Reported-by: syzbot+611164843bd48cc2190c@syzkaller.appspotmail.com
    Signed-off-by: David Howells
    Signed-off-by: David S. Miller

    David Howells
     

11 Oct, 2019

3 commits

  • smc_rx_recvmsg() first checks if data is available, and then if
    RCV_SHUTDOWN is set. There is a race when smc_cdc_msg_recv_action() runs
    in between these 2 checks, receives data and sets RCV_SHUTDOWN.
    In that case smc_rx_recvmsg() would return from receive without
    processing the available data.
    Fix that with a final check for data available if RCV_SHUTDOWN is set.
    Move the check for data into a function and call it twice.
    And use the existing helper smc_rx_data_available().

    Fixes: 952310ccf2d8 ("smc: receive data from RMBE")
    Reviewed-by: Ursula Braun
    Signed-off-by: Karsten Graul
    Signed-off-by: Jakub Kicinski

    Karsten Graul
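
The fix pattern, a final data check once shutdown is seen, can be sketched in a single-threaded model. Everything here is hypothetical: struct conn_sketch, racing_receiver() and recv_step() are illustrative names, and the race is simulated by calling the receiver between the two checks rather than by running a second thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a connection's receive state. */
struct conn_sketch {
    int bytes_avail;    /* data ready to be read */
    int rcv_shutdown;   /* peer has shut down its send side */
    int pending_bytes;  /* data the racing receiver will deliver */
};

/* Simulates the racing receive action: deliver data, then set shutdown. */
void racing_receiver(struct conn_sketch *c)
{
    c->bytes_avail += c->pending_bytes;
    c->pending_bytes = 0;
    c->rcv_shutdown = 1;
}

/* Returns the number of bytes to read, 0 for end of stream,
 * or -1 when the caller would have to wait. */
int recv_step(struct conn_sketch *c, int simulate_race)
{
    assert(c != NULL);
    if (c->bytes_avail > 0)
        return c->bytes_avail;
    if (simulate_race)
        racing_receiver(c);     /* window between the two checks */
    if (c->rcv_shutdown) {
        /* Final check: data may have arrived just before shutdown. */
        if (c->bytes_avail > 0)
            return c->bytes_avail;
        return 0;
    }
    return -1;
}
```

Without the final check, the simulated race would end the stream with 42 bytes still unread in the first test case below the function, which is the symptom the commit describes.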
     
  • smc_cdc_rxed_any_close_or_senddone() is used as an end condition for the
    receive loop. This conflicts with smc_cdc_msg_recv_action() which could
    run in parallel and set the bits checked by
    smc_cdc_rxed_any_close_or_senddone() before the receive is processed.
    In that case we could return from receive with no data, although data is
    available. The same applies to smc_rx_wait().
    Fix this by checking for RCV_SHUTDOWN only, which is set in
    smc_cdc_msg_recv_action() after the receive was actually processed.

    Fixes: 952310ccf2d8 ("smc: receive data from RMBE")
    Reviewed-by: Ursula Braun
    Signed-off-by: Karsten Graul
    Signed-off-by: Jakub Kicinski

    Karsten Graul
     
  • If creation of an SMCD link group with VLAN id fails, the initial
    smc_ism_get_vlan() step has to be reverted as well.

    Fixes: c6ba7c9ba43d ("net/smc: add base infrastructure for SMC-D and ISM")
    Signed-off-by: Ursula Braun
    Signed-off-by: Karsten Graul
    Signed-off-by: Jakub Kicinski

    Ursula Braun
     

10 Oct, 2019

11 commits

  • sk->sk_backlog.len can be written by BH handlers, and read
    from process contexts in a lockless way.

    Note the write side should also use WRITE_ONCE() or a variant.
    We need some agreement about the best way to do this.

    syzbot reported :

    BUG: KCSAN: data-race in tcp_add_backlog / tcp_grow_window.isra.0

    write to 0xffff88812665f32c of 4 bytes by interrupt on cpu 1:
    sk_add_backlog include/net/sock.h:934 [inline]
    tcp_add_backlog+0x4a0/0xcc0 net/ipv4/tcp_ipv4.c:1737
    tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925
    ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
    ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
    dst_input include/net/dst.h:442 [inline]
    ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
    __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
    __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
    netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
    napi_skb_finish net/core/dev.c:5671 [inline]
    napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
    receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
    virtnet_receive drivers/net/virtio_net.c:1323 [inline]
    virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
    napi_poll net/core/dev.c:6352 [inline]
    net_rx_action+0x3ae/0xa50 net/core/dev.c:6418

    read to 0xffff88812665f32c of 4 bytes by task 7292 on cpu 0:
    tcp_space include/net/tcp.h:1373 [inline]
    tcp_grow_window.isra.0+0x6b/0x480 net/ipv4/tcp_input.c:413
    tcp_event_data_recv+0x68f/0x990 net/ipv4/tcp_input.c:717
    tcp_rcv_established+0xbfe/0xf50 net/ipv4/tcp_input.c:5618
    tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542
    sk_backlog_rcv include/net/sock.h:945 [inline]
    __release_sock+0x135/0x1e0 net/core/sock.c:2427
    release_sock+0x61/0x160 net/core/sock.c:2943
    tcp_recvmsg+0x63b/0x1a30 net/ipv4/tcp.c:2181
    inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
    sock_recvmsg_nosec net/socket.c:871 [inline]
    sock_recvmsg net/socket.c:889 [inline]
    sock_recvmsg+0x92/0xb0 net/socket.c:885
    sock_read_iter+0x15f/0x1e0 net/socket.c:967
    call_read_iter include/linux/fs.h:1864 [inline]
    new_sync_read+0x389/0x4f0 fs/read_write.c:414
    __vfs_read+0xb1/0xc0 fs/read_write.c:427
    vfs_read fs/read_write.c:461 [inline]
    vfs_read+0x143/0x2c0 fs/read_write.c:446

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 7292 Comm: syz-fuzzer Not tainted 5.3.0+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: Jakub Kicinski

    Eric Dumazet
     
  • sock_rcvlowat() or int_sk_rcvlowat() might be called without the socket
    lock, for example from tcp_poll().

    Use READ_ONCE() to document the fact that other cpus might change
    sk->sk_rcvlowat under us and avoid KCSAN splats.

    Use WRITE_ONCE() on write sides too.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Jakub Kicinski

    Eric Dumazet
     
  • sk_add_backlog() callers usually read sk->sk_rcvbuf without
    owning the socket lock. This means sk_rcvbuf value can
    be changed by other cpus, and KCSAN complains.

    Add READ_ONCE() annotations to document the lockless nature
    of these reads.

    Note that writes over sk_rcvbuf should also use WRITE_ONCE(),
    but this will be done in separate patches to ease stable
    backports (if we decide this is relevant for stable trees).

    BUG: KCSAN: data-race in tcp_add_backlog / tcp_recvmsg

    write to 0xffff88812ab369f8 of 8 bytes by interrupt on cpu 1:
    __sk_add_backlog include/net/sock.h:902 [inline]
    sk_add_backlog include/net/sock.h:933 [inline]
    tcp_add_backlog+0x45a/0xcc0 net/ipv4/tcp_ipv4.c:1737
    tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925
    ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204
    ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
    dst_input include/net/dst.h:442 [inline]
    ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
    __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
    __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
    netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
    napi_skb_finish net/core/dev.c:5671 [inline]
    napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
    receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
    virtnet_receive drivers/net/virtio_net.c:1323 [inline]
    virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
    napi_poll net/core/dev.c:6352 [inline]
    net_rx_action+0x3ae/0xa50 net/core/dev.c:6418

    read to 0xffff88812ab369f8 of 8 bytes by task 7271 on cpu 0:
    tcp_recvmsg+0x470/0x1a30 net/ipv4/tcp.c:2047
    inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
    sock_recvmsg_nosec net/socket.c:871 [inline]
    sock_recvmsg net/socket.c:889 [inline]
    sock_recvmsg+0x92/0xb0 net/socket.c:885
    sock_read_iter+0x15f/0x1e0 net/socket.c:967
    call_read_iter include/linux/fs.h:1864 [inline]
    new_sync_read+0x389/0x4f0 fs/read_write.c:414
    __vfs_read+0xb1/0xc0 fs/read_write.c:427
    vfs_read fs/read_write.c:461 [inline]
    vfs_read+0x143/0x2c0 fs/read_write.c:446
    ksys_read+0xd5/0x1b0 fs/read_write.c:587
    __do_sys_read fs/read_write.c:597 [inline]
    __se_sys_read fs/read_write.c:595 [inline]
    __x64_sys_read+0x4c/0x60 fs/read_write.c:595
    do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 7271 Comm: syz-fuzzer Not tainted 5.3.0+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: Jakub Kicinski

    Eric Dumazet
     
  • tcp_memory_pressure is read without holding any lock,
    and its value could be changed on other cpus.

    Use READ_ONCE() to annotate these lockless reads.

    The write side is already using atomic ops.

    Fixes: b8da51ebb1aa ("tcp: introduce tcp_under_memory_pressure()")
    Signed-off-by: Eric Dumazet
    Signed-off-by: Jakub Kicinski

    Eric Dumazet
     
  • reqsk_queue_empty() is called from inet_csk_listen_poll() while
    other cpus might write ->rskq_accept_head value.

    Use {READ|WRITE}_ONCE() to avoid compiler tricks
    and potential KCSAN splats.

    Fixes: fff1f3001cc5 ("tcp: add a spinlock to protect struct request_sock_queue")
    Signed-off-by: Eric Dumazet
    Signed-off-by: Jakub Kicinski

    Eric Dumazet
     
  • As mentioned in https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-may-improve-performance
    a C compiler can legally transform :

    if (memory_pressure && *memory_pressure)
    *memory_pressure = 0;

    to :

    if (memory_pressure)
    *memory_pressure = 0;

    Fixes: 0604475119de ("tcp: add TCPMemoryPressuresChrono counter")
    Fixes: 180d8cd942ce ("foundations of per-cgroup memory pressure controlling.")
    Fixes: 3ab224be6d69 ("[NET] CORE: Introducing new memory accounting interface.")
    Signed-off-by: Eric Dumazet
    Signed-off-by: Jakub Kicinski

    Eric Dumazet
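
One way to rule out the transformation shown above is to annotate both accesses, so the load must happen and the store stays conditional on it. The sketch below is a user-space illustration, not the kernel function: the macros are simplified stand-ins for READ_ONCE() and WRITE_ONCE() (GCC/Clang __typeof__), and struct proto_sketch and leave_memory_pressure() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical protocol descriptor with an optional pressure flag. */
struct proto_sketch {
    unsigned long *memory_pressure;     /* may be NULL */
};

/* Clear the flag only if it is set. The volatile load cannot be
 * dropped, so the compiler cannot turn this into an unconditional
 * store that dirties a shared cache line on every call. */
void leave_memory_pressure(struct proto_sketch *prot)
{
    unsigned long *memory_pressure;

    assert(prot != NULL);
    memory_pressure = prot->memory_pressure;
    if (memory_pressure && READ_ONCE(*memory_pressure))
        WRITE_ONCE(*memory_pressure, 0);
}
```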
     
  • As hinted by KCSAN, we need at least one READ_ONCE()
    to prevent a compiler optimization.

    More details on :
    https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-may-improve-performance

    syzbot report :
    BUG: KCSAN: data-race in __nf_ct_refresh_acct / __nf_ct_refresh_acct

    read to 0xffff888123eb4f08 of 4 bytes by interrupt on cpu 0:
    __nf_ct_refresh_acct+0xd4/0x1b0 net/netfilter/nf_conntrack_core.c:1796
    nf_ct_refresh_acct include/net/netfilter/nf_conntrack.h:201 [inline]
    nf_conntrack_tcp_packet+0xd40/0x3390 net/netfilter/nf_conntrack_proto_tcp.c:1161
    nf_conntrack_handle_packet net/netfilter/nf_conntrack_core.c:1633 [inline]
    nf_conntrack_in+0x410/0xaa0 net/netfilter/nf_conntrack_core.c:1727
    ipv4_conntrack_in+0x27/0x40 net/netfilter/nf_conntrack_proto.c:178
    nf_hook_entry_hookfn include/linux/netfilter.h:135 [inline]
    nf_hook_slow+0x83/0x160 net/netfilter/core.c:512
    nf_hook include/linux/netfilter.h:260 [inline]
    NF_HOOK include/linux/netfilter.h:303 [inline]
    ip_rcv+0x12f/0x1a0 net/ipv4/ip_input.c:523
    __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004
    __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118
    netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208
    napi_skb_finish net/core/dev.c:5671 [inline]
    napi_gro_receive+0x28f/0x330 net/core/dev.c:5704
    receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
    virtnet_receive drivers/net/virtio_net.c:1323 [inline]
    virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
    napi_poll net/core/dev.c:6352 [inline]
    net_rx_action+0x3ae/0xa50 net/core/dev.c:6418
    __do_softirq+0x115/0x33f kernel/softirq.c:292

    write to 0xffff888123eb4f08 of 4 bytes by task 7191 on cpu 1:
    __nf_ct_refresh_acct+0xfb/0x1b0 net/netfilter/nf_conntrack_core.c:1797
    nf_ct_refresh_acct include/net/netfilter/nf_conntrack.h:201 [inline]
    nf_conntrack_tcp_packet+0xd40/0x3390 net/netfilter/nf_conntrack_proto_tcp.c:1161
    nf_conntrack_handle_packet net/netfilter/nf_conntrack_core.c:1633 [inline]
    nf_conntrack_in+0x410/0xaa0 net/netfilter/nf_conntrack_core.c:1727
    ipv4_conntrack_local+0xbe/0x130 net/netfilter/nf_conntrack_proto.c:200
    nf_hook_entry_hookfn include/linux/netfilter.h:135 [inline]
    nf_hook_slow+0x83/0x160 net/netfilter/core.c:512
    nf_hook include/linux/netfilter.h:260 [inline]
    __ip_local_out+0x1f7/0x2b0 net/ipv4/ip_output.c:114
    ip_local_out+0x31/0x90 net/ipv4/ip_output.c:123
    __ip_queue_xmit+0x3a8/0xa40 net/ipv4/ip_output.c:532
    ip_queue_xmit+0x45/0x60 include/net/ip.h:236
    __tcp_transmit_skb+0xdeb/0x1cd0 net/ipv4/tcp_output.c:1158
    __tcp_send_ack+0x246/0x300 net/ipv4/tcp_output.c:3685
    tcp_send_ack+0x34/0x40 net/ipv4/tcp_output.c:3691
    tcp_cleanup_rbuf+0x130/0x360 net/ipv4/tcp.c:1575

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 7191 Comm: syz-fuzzer Not tainted 5.3.0+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

    Fixes: cc16921351d8 ("netfilter: conntrack: avoid same-timeout update")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Jozsef Kadlecsik
    Cc: Florian Westphal
    Acked-by: Pablo Neira Ayuso
    Signed-off-by: Jakub Kicinski

    Eric Dumazet
     
  • The NLM_F_ECHO flag asks the kernel to send back to the user the
    message notified to all listeners.
    This was not the case with the RTM_NEWNSID command; let's fix this.

    Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids")
    Reported-by: Guillaume Nault
    Signed-off-by: Nicolas Dichtel
    Acked-by: Guillaume Nault
    Tested-by: Guillaume Nault
    Signed-off-by: Jakub Kicinski

    Nicolas Dichtel
     
  • If tcf_register_action() fails, mirred_device_notifier
    should be unregistered.

    Fixes: 3b87956ea645 ("net sched: fix race in mirred device removal")
    Signed-off-by: YueHaibing
    Signed-off-by: Jakub Kicinski

    YueHaibing
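
The unwinding rule is the usual one for multi-step initialization: if a later step fails, undo the earlier ones. The sketch below models it in user space; struct init_env and all function names are hypothetical, and the registration steps are reduced to flags so the behavior can be checked.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical environment recording what is currently registered. */
struct init_env {
    int notifier_registered;
    int action_registered;
    int fail_action;    /* force the second step to fail */
};

int register_notifier_sketch(struct init_env *e)
{
    e->notifier_registered = 1;
    return 0;
}

void unregister_notifier_sketch(struct init_env *e)
{
    e->notifier_registered = 0;
}

int register_action_sketch(struct init_env *e)
{
    if (e->fail_action)
        return -ENOMEM;
    e->action_registered = 1;
    return 0;
}

/* Two-step init: a failure in the second step undoes the first. */
int module_init_sketch(struct init_env *e)
{
    int err;

    assert(e != NULL);
    err = register_notifier_sketch(e);
    if (err)
        return err;
    err = register_action_sketch(e);
    if (err)
        unregister_notifier_sketch(e);  /* do not leave it dangling */
    return err;
}
```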
     
  • When configuring a taprio instance, if "flags" is not specified (or
    it's zero), taprio currently replies with an "Invalid argument" error.

    So, set the return value to zero after we are done with all the
    checks.

    Fixes: 9c66d1564676 ("taprio: Add support for hardware offloading")
    Signed-off-by: Vinicius Costa Gomes
    Acked-by: Vladimir Oltean
    Signed-off-by: Jakub Kicinski

    Vinicius Costa Gomes
     
  • This patch is to fix a NULL-ptr deref in selinux_socket_connect_helper:

    [...] kasan: GPF could be caused by NULL-ptr deref or user memory access
    [...] RIP: 0010:selinux_socket_connect_helper+0x94/0x460
    [...] Call Trace:
    [...] selinux_sctp_bind_connect+0x16a/0x1d0
    [...] security_sctp_bind_connect+0x58/0x90
    [...] sctp_process_asconf+0xa52/0xfd0 [sctp]
    [...] sctp_sf_do_asconf+0x785/0x980 [sctp]
    [...] sctp_do_sm+0x175/0x5a0 [sctp]
    [...] sctp_assoc_bh_rcv+0x285/0x5b0 [sctp]
    [...] sctp_backlog_rcv+0x482/0x910 [sctp]
    [...] __release_sock+0x11e/0x310
    [...] release_sock+0x4f/0x180
    [...] sctp_accept+0x3f9/0x5a0 [sctp]
    [...] inet_accept+0xe7/0x720

    It was caused by the 'newsk' sk_socket not being set before going to the
    security sctp hook when processing an asconf chunk with SCTP_PARAM_ADD_IP
    or SCTP_PARAM_SET_PRIMARY:

    inet_accept()->
    sctp_accept():
    lock_sock():
    lock listening 'sk'
    do_softirq():
    sctp_rcv(): sk_socket can be NULL when the sock is closed, so SOCK_DEAD
    flag is also needed to check in sctp_newsk_ready().

    Thanks to Ondrej for reviewing the code.

    Fixes: d452930fd3b9 ("selinux: Add SCTP support")
    Reported-by: Ying Xu
    Suggested-by: Marcelo Ricardo Leitner
    Signed-off-by: Xin Long
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: Jakub Kicinski

    Xin Long
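
The readiness test mentioned above can be modeled in userspace C. This is a sketch under assumptions: `struct toy_sock`, `toy_newsk_ready()` and `toy_backlog_rcv()` are hypothetical, and deferring the chunk when the sock is not ready is an assumption about how such a predicate would be used, not a description of the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_socket { int unused; };

struct toy_sock {
    struct toy_socket *sk_socket; /* NULL until accept() has attached it */
    bool dead;                    /* models the SOCK_DEAD flag */
};

/*
 * A sock counts as ready when its socket pointer is set, or when it is
 * already closed (dead), in which case sk_socket stays NULL for good
 * and waiting for it would never end.
 */
static bool toy_newsk_ready(const struct toy_sock *sk)
{
    assert(sk != NULL);
    return sk->dead || sk->sk_socket != NULL;
}

/* Returns true if the chunk was processed, false if it was deferred. */
static bool toy_backlog_rcv(const struct toy_sock *sk, int *deferred)
{
    if (!toy_newsk_ready(sk)) {
        (*deferred)++; /* keep the chunk queued instead of using sk_socket */
        return false;
    }
    return true;
}
```

The dead-sock case matters because a closed sock never gets a socket pointer; without that check a deferred chunk would be requeued indefinitely.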
     

09 Oct, 2019

8 commits

  • The ip6erspan driver calls ether_setup(). After commit 61e84623ace3
    ("net: centralize net_device min/max MTU checking"), the MTU range
    is [min_mtu, max_mtu], which is [68, 1500] by default.

    As a result, the MTU of the erspan device cannot be set above 1500,
    a limit that is not correct for the ip6erspan tap device.

    Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking")
    Signed-off-by: Haishuang Yan
    Acked-by: William Tu
    Signed-off-by: Jakub Kicinski

    Haishuang Yan
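
A centralized range check of the kind described above can be sketched like this. `struct toy_netdev`, `toy_ether_setup()` and `toy_set_mtu()` are hypothetical userspace stand-ins; only the default range [68, 1500] comes from the message.

```c
#include <assert.h>
#include <errno.h>

struct toy_netdev {
    unsigned int mtu;
    unsigned int min_mtu;
    unsigned int max_mtu;
};

/* Applies the Ethernet defaults quoted in the message above. */
static void toy_ether_setup(struct toy_netdev *dev)
{
    dev->mtu = 1500;
    dev->min_mtu = 68;
    dev->max_mtu = 1500;
}

/* Centralized check: reject any MTU outside [min_mtu, max_mtu]. */
static int toy_set_mtu(struct toy_netdev *dev, unsigned int new_mtu)
{
    assert(dev->min_mtu <= dev->max_mtu);

    if (new_mtu < dev->min_mtu || new_mtu > dev->max_mtu)
        return -EINVAL;

    dev->mtu = new_mtu;
    return 0;
}
```

With the defaults left in place, a request for 1501 is rejected. A driver that supports larger frames has to widen `max_mtu` after the setup call; the right upper bound for a given tunnel type is not stated here and would have to come from the driver.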
     
  • …kernel/git/jberg/mac80211

    Johannes Berg says:

    ====================
    A number of fixes:
    * allow scanning when operating on radar channels in
    ETSI regdomains
    * accept deauth frames in IBSS - we have code to parse
    and handle them, but were dropping them early
    * fix an allocation failure path in hwsim
    * fix a failure path memory leak in nl80211 FTM code
    * fix RCU handling & locking in multi-BSSID parsing
    * reject malformed SSID in mac80211 (this shouldn't
    really be able to happen, but defense in depth)
    * avoid userspace buffer overrun in ancient wext code
    if SSID was too long
    ====================

    Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

    Jakub Kicinski
     
  • For TCA_ACT_KIND, we have to keep the backward compatibility too,
    and rely on nla_strlcpy() to check and terminate the string with
    a NUL.

    Note for TC actions, nla_strcmp() is already used to compare kind
    strings, so we don't need to fix other places.

    Fixes: 199ce850ce11 ("net_sched: add policy validation for action attributes")
    Reported-by: Marcelo Ricardo Leitner
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: Jakub Kicinski

    Cong Wang
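
The copy-and-terminate approach can be illustrated with a userspace model. `toy_attr_strlcpy()` is a hypothetical helper written for this sketch; it imitates strlcpy-style semantics for a payload that may or may not carry a trailing NUL, and it should not be read as the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy an attribute payload of 'srclen' bytes, which may or may not end
 * in a NUL, into 'dst' of 'dstsize' bytes. When dstsize is nonzero the
 * result is always NUL-terminated. Returns the length of the source
 * string, so a return value >= dstsize means the name was truncated.
 */
static size_t toy_attr_strlcpy(char *dst, const char *src, size_t srclen,
                               size_t dstsize)
{
    assert(dst != NULL && src != NULL);

    /* Ignore a single trailing NUL sent by well-behaved senders. */
    if (srclen > 0 && src[srclen - 1] == '\0')
        srclen--;

    if (dstsize > 0) {
        size_t len = srclen >= dstsize ? dstsize - 1 : srclen;

        memset(dst, 0, dstsize);
        memcpy(dst, src, len);
    }

    return srclen;
}
```

Because the destination is terminated locally, old senders that omit the NUL keep working, and the caller can still detect an over-long name from the return value.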
     
  • Marcelo noticed a backward compatibility issue with TCA_KIND
    after we moved from NLA_STRING to NLA_NUL_STRING, so it is probably
    too late to change it.

    Instead, to make everyone happy, we can just insert a NUL to
    terminate the string with nla_strlcpy() like we do for TC actions.

    Fixes: 62794fc4fbf5 ("net_sched: add max len check for TCA_KIND")
    Reported-by: Marcelo Ricardo Leitner
    Cc: Jamal Hadi Salim
    Cc: Jiri Pirko
    Signed-off-by: Cong Wang
    Reviewed-by: Marcelo Ricardo Leitner
    Signed-off-by: Jakub Kicinski

    Cong Wang
     
  • If llc_conn_state_process() sees that llc_conn_service() put the skb on
    a list, it will drop one fewer reference to it. This is wrong because
    the current behavior is that llc_conn_service() never consumes a
    reference to the skb.

    The code also makes the number of skb references being dropped
    conditional on which of ind_prim and cfm_prim are nonzero, yet neither
    of these affects how many references are *acquired*. So there is extra
    code that tries to fix this up by sometimes taking another reference.

    Remove the unnecessary/broken refcounting logic and instead just add an
    skb_get() before the only two places where an extra reference is
    actually consumed.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Biggers
    Signed-off-by: Jakub Kicinski

    Eric Biggers
     
  • All callers of llc_conn_state_process() except llc_build_and_send_pkt()
    (via llc_ui_sendmsg() -> llc_ui_send_data()) assume that it always
    consumes a reference to the skb. Fix this caller to do the same.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Biggers
    Signed-off-by: Jakub Kicinski

    Eric Biggers
     
  • syzbot reported:

    BUG: memory leak
    unreferenced object 0xffff88811eb3de00 (size 224):
    comm "syz-executor559", pid 7315, jiffies 4294943019 (age 10.300s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 a0 38 24 81 88 ff ff 00 c0 f2 15 81 88 ff ff ..8$............
    backtrace:
    kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
    slab_post_alloc_hook mm/slab.h:439 [inline]
    slab_alloc_node mm/slab.c:3269 [inline]
    kmem_cache_alloc_node+0x153/0x2a0 mm/slab.c:3579
    __alloc_skb+0x6e/0x210 net/core/skbuff.c:198
    alloc_skb include/linux/skbuff.h:1058 [inline]
    llc_alloc_frame+0x66/0x110 net/llc/llc_sap.c:54
    llc_conn_ac_send_sabme_cmd_p_set_x+0x2f/0x140 net/llc/llc_c_ac.c:777
    llc_exec_conn_trans_actions net/llc/llc_conn.c:475 [inline]
    llc_conn_service net/llc/llc_conn.c:400 [inline]
    llc_conn_state_process+0x1ac/0x640 net/llc/llc_conn.c:75
    llc_establish_connection+0x110/0x170 net/llc/llc_if.c:109
    llc_ui_connect+0x10e/0x370 net/llc/af_llc.c:477
    __sys_connect+0x11d/0x170 net/socket.c:1840
    [...]

    The bug is that most callers of llc_conn_send_pdu() assume it consumes a
    reference to the skb, when actually due to commit b85ab56c3f81 ("llc:
    properly handle dev_queue_xmit() return value") it doesn't.

    Revert most of that commit, and instead make the few places that need
    llc_conn_send_pdu() to *not* consume a reference call skb_get() before.

    Fixes: b85ab56c3f81 ("llc: properly handle dev_queue_xmit() return value")
    Reported-by: syzbot+6b825a6494a04cc0e3f7@syzkaller.appspotmail.com
    Signed-off-by: Eric Biggers
    Signed-off-by: Jakub Kicinski

    Eric Biggers
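
The ownership convention described above, in which the send function always consumes one reference and the few callers that still need the buffer take their own first, can be sketched with a toy refcounted buffer. All `toy_` names are hypothetical; `toy_get()` and `toy_put()` only play the roles that skb_get() and kfree_skb() play in the message.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_buf { int refs; };

static int toy_live; /* number of buffers not yet freed */

static struct toy_buf *toy_alloc(void)
{
    struct toy_buf *b = malloc(sizeof(*b));

    if (b) {
        b->refs = 1;
        toy_live++;
    }
    return b;
}

static void toy_get(struct toy_buf *b)
{
    assert(b->refs > 0);
    b->refs++;
}

static void toy_put(struct toy_buf *b)
{
    assert(b->refs > 0);
    if (--b->refs == 0) {
        free(b);
        toy_live--;
    }
}

/* Convention: always consumes exactly one reference to 'b'. */
static void toy_send_pdu(struct toy_buf *b)
{
    /* ... hand the buffer to the lower layer ... */
    toy_put(b);
}

/* Common case: the caller is done with the buffer after sending. */
static void toy_send_and_forget(struct toy_buf *b)
{
    toy_send_pdu(b);
}

/* Rare case: the caller still needs the buffer, so it takes a reference. */
static void toy_send_and_keep(struct toy_buf *b)
{
    toy_get(b);
    toy_send_pdu(b);
    /* 'b' is still valid here; the caller owns the remaining reference. */
}
```

A leak of the kind reported appears when callers are written for the consuming convention but the callee does not drop the reference; the reverse mismatch would produce a use-after-free instead.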
     
  • syzbot reported:

    BUG: memory leak
    unreferenced object 0xffff888116270800 (size 224):
    comm "syz-executor641", pid 7047, jiffies 4294947360 (age 13.860s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 20 e1 2a 81 88 ff ff 00 40 3d 2a 81 88 ff ff . .*.....@=*....
    backtrace:
    kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
    slab_post_alloc_hook mm/slab.h:439 [inline]
    slab_alloc_node mm/slab.c:3269 [inline]
    kmem_cache_alloc_node+0x153/0x2a0 mm/slab.c:3579
    __alloc_skb+0x6e/0x210 net/core/skbuff.c:198
    alloc_skb include/linux/skbuff.h:1058 [inline]
    alloc_skb_with_frags+0x5f/0x250 net/core/skbuff.c:5327
    sock_alloc_send_pskb+0x269/0x2a0 net/core/sock.c:2225
    sock_alloc_send_skb+0x32/0x40 net/core/sock.c:2242
    llc_ui_sendmsg+0x10a/0x540 net/llc/af_llc.c:933
    sock_sendmsg_nosec net/socket.c:652 [inline]
    sock_sendmsg+0x54/0x70 net/socket.c:671
    __sys_sendto+0x148/0x1f0 net/socket.c:1964
    [...]

    The bug is that llc_sap_state_process() always takes an extra reference
    to the skb, but sometimes neither llc_sap_next_state() nor
    llc_sap_state_process() itself drops this reference.

    Fix it by changing llc_sap_next_state() to never consume a reference to
    the skb, rather than sometimes do so and sometimes not. Then remove the
    extra skb_get() and kfree_skb() from llc_sap_state_process().

    Reported-by: syzbot+6bf095f9becf5efef645@syzkaller.appspotmail.com
    Reported-by: syzbot+31c16aa4202dace3812e@syzkaller.appspotmail.com
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Eric Biggers
    Signed-off-by: Jakub Kicinski

    Eric Biggers
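
The opposite convention, in which the callee never consumes a reference on any path and the caller releases the buffer exactly once, can be sketched as follows. The `toy_` names and the event numbering are hypothetical, and the real state machine is far more involved than this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_frame { int refs; };

static int toy_frames_live; /* number of frames not yet freed */

static struct toy_frame *toy_frame_alloc(void)
{
    struct toy_frame *f = malloc(sizeof(*f));

    if (f) {
        f->refs = 1;
        toy_frames_live++;
    }
    return f;
}

static void toy_frame_put(struct toy_frame *f)
{
    assert(f->refs > 0);
    if (--f->refs == 0) {
        free(f);
        toy_frames_live--;
    }
}

/*
 * Never consumes a reference, whichever branch runs. In this sketch
 * event 0 stands for "no matching transition".
 */
static bool toy_next_state(const struct toy_frame *f, int event)
{
    assert(f->refs > 0);
    return event != 0;
}

/* The caller keeps ownership throughout and drops its reference once. */
static bool toy_state_process(struct toy_frame *f, int event)
{
    bool handled = toy_next_state(f, event);

    toy_frame_put(f);
    return handled;
}
```

Since the callee behaves the same on every path, the caller no longer needs an extra get/put pair to guard against a callee that only sometimes frees the buffer.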
     

08 Oct, 2019

1 commit

  • In non-ETSI regulatory domains, scanning is blocked when the operating
    channel is a DFS channel. For ETSI, however, once a DFS channel is
    marked as available after the CAC, it remains available (for some
    time) even after the device leaves it.

    Therefore a scan can be done without any impact on the availability
    of the DFS channel as no new CAC is required after the scan.

    Enable scan in mac80211 in these cases.

    Signed-off-by: Aaron Komisar
    Link: https://lore.kernel.org/r/1570024728-17284-1-git-send-email-aaron.komisar@tandemg.com
    Signed-off-by: Johannes Berg

    Aaron Komisar
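
The decision described above can be written as a small predicate. The enum, the struct and `toy_scan_allowed()` are hypothetical names for this sketch, and the timed expiry of availability ("for some time") is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_dfs_region {
    TOY_DFS_UNSET,
    TOY_DFS_FCC,
    TOY_DFS_ETSI,
    TOY_DFS_JP
};

struct toy_channel {
    bool requires_dfs; /* radar (DFS) channel */
    bool cac_done;     /* marked available after the CAC */
};

/* Decide whether a scan may start while operating on 'oper'. */
static bool toy_scan_allowed(enum toy_dfs_region region,
                             const struct toy_channel *oper)
{
    assert(oper != NULL);

    if (!oper->requires_dfs)
        return true; /* a non-DFS operating channel never blocks scanning */

    /*
     * Under ETSI rules the channel stays available after the device
     * leaves it, so returning from the scan needs no new CAC.
     */
    return region == TOY_DFS_ETSI && oper->cac_done;
}
```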