13 Feb, 2019

8 commits

  • Field idiag_ext in struct inet_diag_req_v2, used as a bitmap of requested
    extensions, has only 8 bits. Thus extensions starting from DCTCPINFO
    cannot be requested directly. Some of them are included in the response
    unconditionally or hook into some of the lower 8 bits.

    Extension INET_DIAG_CLASS_ID has had no way to be requested from the
    beginning.

    This patch bundles it with INET_DIAG_TCLASS (ipv6 tos), fixes space
    reservation, and documents behavior for other extensions.

    Also this patch adds a fallback to reporting socket priority. This field
    is more widely used for traffic classification because ipv4 sockets
    automatically map TOS to priority and the default qdisc pfifo_fast knows
    about that. But priority could be changed via setsockopt SO_PRIORITY, so
    INET_DIAG_TOS isn't enough for predicting class.

    Also cgroup2 obsoletes net_cls classid (it is always zero), but we cannot
    reuse this field for reporting cgroup2 id because it is 64-bit (ino+gen).

    So, after this patch INET_DIAG_CLASS_ID will report socket priority
    for the most common setup, when net_cls isn't set and/or cgroup2 is in use.
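
The bitmap limit and the fallback described in this message can be sketched in userspace Python. This is only a model, not kernel code: the extension numbers are assumed to match include/uapi/linux/inet_diag.h, and class_id_requested() and report_class_id() are hypothetical helpers that mirror the behavior described here.

```python
# Userspace model of the idiag_ext request bitmap (not kernel code).
# Extension numbers are assumed to match include/uapi/linux/inet_diag.h.
INET_DIAG_TOS = 5
INET_DIAG_TCLASS = 6
INET_DIAG_DCTCPINFO = 9
INET_DIAG_CLASS_ID = 17

IDIAG_EXT_BITS = 8  # idiag_ext is an 8-bit field


def ext_bit(ext):
    """Bit used to request an extension: 1 << (ext - 1)."""
    return 1 << (ext - 1)


def requestable(ext):
    """True if the request bit fits into the 8-bit idiag_ext field."""
    return ext_bit(ext) < (1 << IDIAG_EXT_BITS)


def class_id_requested(idiag_ext):
    """Hypothetical: CLASS_ID is bundled with the TCLASS request bit."""
    return bool(idiag_ext & ext_bit(INET_DIAG_TCLASS))


def report_class_id(net_cls_classid, sk_priority):
    """Hypothetical: fall back to socket priority when classid is zero."""
    return net_cls_classid if net_cls_classid else sk_priority
```

With these assumed values, the TCLASS bit fits into the field, while the bits for DCTCPINFO and CLASS_ID fall outside it.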

    Fixes: 0888e372c37f ("net: inet: diag: expose sockets cgroup classid")
    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: David S. Miller

    Konstantin Khlebnikov
     
  • KMSAN reported batadv_interface_tx() was possibly using a
    garbage value [1]

    batadv_get_vid() does have a pskb_may_pull() call
    but batadv_interface_tx() does not actually make sure
    this did not fail.

    [1]
    BUG: KMSAN: uninit-value in batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
    CPU: 0 PID: 10006 Comm: syz-executor469 Not tainted 4.20.0-rc7+ #5
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x173/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
    __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
    batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
    __netdev_start_xmit include/linux/netdevice.h:4356 [inline]
    netdev_start_xmit include/linux/netdevice.h:4365 [inline]
    xmit_one net/core/dev.c:3257 [inline]
    dev_hard_start_xmit+0x607/0xc40 net/core/dev.c:3273
    __dev_queue_xmit+0x2e42/0x3bc0 net/core/dev.c:3843
    dev_queue_xmit+0x4b/0x60 net/core/dev.c:3876
    packet_snd net/packet/af_packet.c:2928 [inline]
    packet_sendmsg+0x8306/0x8f30 net/packet/af_packet.c:2953
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    __sys_sendto+0x8c4/0xac0 net/socket.c:1788
    __do_sys_sendto net/socket.c:1800 [inline]
    __se_sys_sendto+0x107/0x130 net/socket.c:1796
    __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
    do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7
    RIP: 0033:0x441889
    Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 bb 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007ffdda6fd468 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
    RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000441889
    RDX: 000000000000000e RSI: 00000000200000c0 RDI: 0000000000000003
    RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000216 R12: 00007ffdda6fd4c0
    R13: 00007ffdda6fd4b0 R14: 0000000000000000 R15: 0000000000000000

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
    kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
    kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
    kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
    slab_post_alloc_hook mm/slab.h:446 [inline]
    slab_alloc_node mm/slub.c:2759 [inline]
    __kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
    __kmalloc_reserve net/core/skbuff.c:137 [inline]
    __alloc_skb+0x309/0xa20 net/core/skbuff.c:205
    alloc_skb include/linux/skbuff.h:998 [inline]
    alloc_skb_with_frags+0x1c7/0xac0 net/core/skbuff.c:5220
    sock_alloc_send_pskb+0xafd/0x10e0 net/core/sock.c:2083
    packet_alloc_skb net/packet/af_packet.c:2781 [inline]
    packet_snd net/packet/af_packet.c:2872 [inline]
    packet_sendmsg+0x661a/0x8f30 net/packet/af_packet.c:2953
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg net/socket.c:631 [inline]
    __sys_sendto+0x8c4/0xac0 net/socket.c:1788
    __do_sys_sendto net/socket.c:1800 [inline]
    __se_sys_sendto+0x107/0x130 net/socket.c:1796
    __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
    do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7

    Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Marek Lindner
    Cc: Simon Wunderlich
    Cc: Antonio Quartulli
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • genlmsg_reply can fail, so propagate its return code

    Fixes: 915d7e5e593 ("ipv6: sr: add code base for control plane support of SR-IPv6")
    Signed-off-by: Li RongQing
    Signed-off-by: David S. Miller

    Li RongQing
     
  • When an ethernet frame is padded to meet the minimum ethernet frame
    size, the padding octets are not covered by the hardware checksum.
    Fortunately the padding octets are usually zeros, which don't affect
    the checksum. However, this is not guaranteed. For example, switches
    might choose to make other use of these octets.
    This repeatedly causes kernel hardware checksum faults.

    Prior to the cited commit below, skb checksum was forced to be
    CHECKSUM_NONE when padding is detected. After it, we need to keep
    skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires
    verifying and parsing IP headers; it is not worth the effort, as the
    packets are so small that CHECKSUM_COMPLETE has no significant advantage.

    Future work: when reporting checksum complete is not an option for
    IP non-TCP/UDP packets, we can actually fall back to reporting checksum
    unnecessary, by looking at the cqe IPOK bit.
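
Why zero padding is harmless while other padding is not can be shown with a small userspace sketch of the 16-bit ones' complement sum used by Internet checksums. The packet bytes and the 0xab padding value are made up for illustration; csum16() is a hypothetical helper, not a kernel function.

```python
def csum16(data: bytes) -> int:
    """16-bit ones' complement sum over data (not inverted)."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


# A made-up 28-byte packet, padded to a 46-byte minimum payload.
packet = b"\x45\x00\x00\x1c" + bytes(24)
zero_padded = packet + bytes(18)
junk_padded = packet + b"\xab" * 18
```

Here csum16(zero_padded) equals csum16(packet), but csum16(junk_padded) does not, so a sum that skips the padding no longer matches a sum over the whole frame.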

    Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
    Cc: Eric Dumazet
    Signed-off-by: Saeed Mahameed
    Signed-off-by: Tariq Toukan
    Reviewed-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Saeed Mahameed
     
  • During testing on Armada 388 platforms, it was found with a certain
    module configuration that it was possible to trigger a kernel oops
    during the module load process, caused by the phylink resolver being
    triggered for a currently disabled interface.

    This problem was introduced by changing the way the SFP registration
    works, which now can result in the sfp link down notification being
    called during phylink_create().

    Fixes: b5bfc21af5cb ("net: sfp: do not probe SFP module before we're attached")
    Signed-off-by: Russell King
    Signed-off-by: David S. Miller

    Russell King
     
  • Due to the depends on NET_UDP_TUNNEL, at the moment it is impossible to
    compile GENEVE if no other protocol depending on NET_UDP_TUNNEL is
    selected.

    Fix this by changing the depends to a select, and drop NET_IP_TUNNEL from
    the select list, as it already depends on NET_UDP_TUNNEL.

    Signed-off-by: Matteo Croce
    Reviewed-and-tested-by: Andrea Claudi
    Tested-by: Davide Caratti
    Signed-off-by: David S. Miller

    Matteo Croce
     
  • The bitmap of found partitions in efx_ef10_mtd_probe was not
    initialised, causing partitions to be suppressed based off whatever
    value was in the bitmap at the start.

    Fixes: 3366463513f5 ("sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe")
    Signed-off-by: Bert Kenward
    Signed-off-by: David S. Miller

    Bert Kenward
     
  • …kernel/git/jberg/mac80211

    Johannes Berg says:

    ====================
    Just a few fixes:
    * aggregation session teardown with internal TXQs was
    continuing to send some frames marked as aggregation,
    fix from Ilan
    * IBSS join was missed during firmware restart, should
    such a thing happen
    * speculative execution based on the return value of
    cfg80211_classify8021d() - which is controlled by the
    sender of the packet - could be problematic in some
    code using it, prevent it
    * a few peer measurement fixes
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     

12 Feb, 2019

9 commits

  • When a link endpoint is re-created (e.g. after a node reboot or
    interface reset), the link session number is chosen at random, and the
    peer endpoint will be synced with this new session number before the
    link is re-established.

    However, there is a shortcoming in this mechanism that can lead to the
    link never being re-established, or failing soon afterwards. It happens
    when the peer endpoint is ready in ESTABLISHING state, the
    'peer_session' as well as the 'in_session' flag have been set, but
    suddenly this link endpoint leaves. When it comes back with a random
    session number, there are two situations possible:

    1/ If the random session number is larger than (or equal to) the
    previous one, the peer endpoint will be updated with this new session
    upon receipt of a RESET_MSG from this endpoint, and the link can be re-
    established as normal. Otherwise, all the RESET_MSGs from this endpoint
    will be rejected by the peer. In turn, when this link endpoint receives
    one ACTIVATE_MSG from the peer, it will move to ESTABLISHED and start
    to send STATE_MSGs, but again these messages will be dropped by the
    peer due to wrong session.
    The peer link endpoint can still become ESTABLISHED after receiving a
    traffic message from this endpoint (e.g. a BCAST_PROTOCOL or
    NAME_DISTRIBUTOR), but since all the STATE_MSGs are invalid, the link
    will be forced down sooner or later!

    Even in case the random session number is larger than the previous one,
    it can be that the ACTIVATE_MSG from the peer arrives first, and this
    link endpoint moves quickly to ESTABLISHED without sending out any
    RESET_MSG yet. Consequently, the peer link will not be updated with the
    new session number, and the same link failure scenario as above will
    happen.

    2/ Another situation is that the peer link endpoint was reset for
    some reason in the meantime: its link state was set to RESET from
    ESTABLISHING but it is still in session, i.e. the 'in_session' flag is
    not reset...
    Now, if the random session number from this endpoint is less than the
    previous one, all the RESET_MSGs from this endpoint will be rejected by
    the peer. In the other direction, when this link endpoint receives a
    RESET_MSG from the peer, it moves to ESTABLISHING and starts to send
    ACTIVATE_MSGs, but all these messages will be rejected by the peer too.
    As a result, the link cannot be re-established but gets stuck with this
    link endpoint in state ESTABLISHING and the peer in RESET!

    Solution:
    ===========

    This link endpoint should not go directly to ESTABLISHED when getting
    ACTIVATE_MSG from the peer which may belong to the old session if the
    link was re-created. To ensure the session to be correct before the
    link is re-established, the peer endpoint in ESTABLISHING state will
    send back the last session number in ACTIVATE_MSG for a verification at
    this endpoint. Then, if needed, a new and more appropriate session
    number will be regenerated to force a re-synch first.

    In addition, when a link in ESTABLISHING state is reset, its state will
    move to RESET according to the link FSM, and the 'in_session' flag (and
    the other data) will be reset as in a normal link reset. It will also
    be deleted if requested.

    The solution is backward compatible.

    Acked-by: Jon Maloy
    Acked-by: Ying Xue
    Signed-off-by: Tuong Lien
    Signed-off-by: David S. Miller

    Tuong Lien
     
  • Follow these steps:
    # ip addr add 2001:123::1/32 dev eth0
    # ip addr add 2001:123:456::2/64 dev eth0
    # ip addr del 2001:123::1/32 dev eth0
    # ip addr del 2001:123:456::2/64 dev eth0
    and then the prefix route of 2001:123::1/32 will still exist.

    This is because ipv6_prefix_equal in check_cleanup_prefix_route
    func does not check whether two IPv6 addresses have the same
    prefix length. If the prefix of one address starts with another
    shorter address prefix, even though their prefix lengths are
    different, the return value of ipv6_prefix_equal is true.

    Here I add a check of whether two addresses have the same prefix
    length to decide whether their prefixes are equal.
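
The comparison problem can be reproduced in userspace with Python's ipaddress module. This is a sketch, not the kernel code: prefix_equal() only models the "compare the first N bits" behavior described above, and same_prefix() is a hypothetical name for the check with the added length comparison.

```python
import ipaddress


def prefix_equal(a: str, b: str, prefixlen: int) -> bool:
    """Compare only the first prefixlen bits of two IPv6 addresses."""
    shift = 128 - prefixlen
    return (int(ipaddress.IPv6Address(a)) >> shift) == (
        int(ipaddress.IPv6Address(b)) >> shift
    )


def same_prefix(a: str, a_len: int, b: str, b_len: int) -> bool:
    """Hypothetical fixed check: the prefix lengths must match as well."""
    return a_len == b_len and prefix_equal(a, b, a_len)
```

For the addresses in the steps above, prefix_equal("2001:123::1", "2001:123:456::2", 32) is true because both start with 2001:0123, while same_prefix("2001:123::1", 32, "2001:123:456::2", 64) is false.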

    Fixes: 5b84efecb7d9 ("ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE")
    Signed-off-by: Zhiqiang Liu
    Reported-by: Wenhao Zhang
    Signed-off-by: David S. Miller

    Zhiqiang Liu
     
  • When we free the skb in tipc_data_input, we return 'false'. The freed
    skb is then passed on to tipc_link_input by tipc_link_rcv:

    int tipc_link_rcv:
    ...
    if (!tipc_data_input(l, skb, l->inputq))
    rc |= tipc_link_input(l, skb, l->inputq);

    Fix it by simply returning 'true' when the skb is freed, so that
    tipc_link_rcv skips the call to tipc_link_input in the condition above.

    Acked-by: Ying Xue
    Acked-by: Jon Maloy
    Signed-off-by: Hoang Le
    Signed-off-by: David S. Miller

    Hoang Le
     
  • Due to quadratic behavior of x25_new_lci(), syzbot was able
    to trigger an rcu stall.

    Fix this by not blocking BH for the whole duration of
    the function, and inserting a reschedule point when possible.

    If we care enough, using a bitmap could get rid of the quadratic
    behavior.
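
The difference between the two allocation strategies can be sketched in userspace Python. This is a simplified model and not the kernel code: the 1..4095 LCI range is an assumption, sockets are reduced to a list of their LCIs, and a plain list of booleans stands in for a real packed bitmap.

```python
MAX_LCI = 4095  # assumption: LCIs are 12-bit values in the range 1..4095


def new_lci_scan(socket_lcis):
    """Model of the scan: every candidate LCI walks the whole socket list."""
    lci = 1
    while any(s == lci for s in socket_lcis):  # O(n) walk per candidate
        lci += 1
        if lci > MAX_LCI:
            return 0  # no free LCI
    return lci


def new_lci_table(used):
    """Alternative: a single pass over a table with one flag per LCI."""
    for lci in range(1, MAX_LCI + 1):
        if not used[lci]:
            return lci
    return 0  # no free LCI
```

With n sockets in use, the scan does up to n list walks of n entries each, while the table lookup is a single pass over at most 4095 flags.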

    syzbot report:

    rcu: INFO: rcu_preempt self-detected stall on CPU
    rcu: 0-...!: (10500 ticks this GP) idle=4fa/1/0x4000000000000002 softirq=283376/283376 fqs=0
    rcu: (t=10501 jiffies g=383105 q=136)
    rcu: rcu_preempt kthread starved for 10502 jiffies! g383105 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
    rcu: RCU grace-period kthread stack dump:
    rcu_preempt I28928 10 2 0x80000000
    Call Trace:
    context_switch kernel/sched/core.c:2844 [inline]
    __schedule+0x817/0x1cc0 kernel/sched/core.c:3485
    schedule+0x92/0x180 kernel/sched/core.c:3529
    schedule_timeout+0x4db/0xfd0 kernel/time/timer.c:1803
    rcu_gp_fqs_loop kernel/rcu/tree.c:1948 [inline]
    rcu_gp_kthread+0x956/0x17a0 kernel/rcu/tree.c:2105
    kthread+0x357/0x430 kernel/kthread.c:246
    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
    NMI backtrace for cpu 0
    CPU: 0 PID: 8759 Comm: syz-executor2 Not tainted 5.0.0-rc4+ #51
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:

    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x172/0x1f0 lib/dump_stack.c:113
    nmi_cpu_backtrace.cold+0x63/0xa4 lib/nmi_backtrace.c:101
    nmi_trigger_cpumask_backtrace+0x1be/0x236 lib/nmi_backtrace.c:62
    arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
    trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
    rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1211
    print_cpu_stall kernel/rcu/tree.c:1348 [inline]
    check_cpu_stall kernel/rcu/tree.c:1422 [inline]
    rcu_pending kernel/rcu/tree.c:3018 [inline]
    rcu_check_callbacks.cold+0x500/0xa4a kernel/rcu/tree.c:2521
    update_process_times+0x32/0x80 kernel/time/timer.c:1635
    tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:161
    tick_sched_timer+0x47/0x130 kernel/time/tick-sched.c:1271
    __run_hrtimer kernel/time/hrtimer.c:1389 [inline]
    __hrtimer_run_queues+0x33e/0xde0 kernel/time/hrtimer.c:1451
    hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
    local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1035 [inline]
    smp_apic_timer_interrupt+0x120/0x570 arch/x86/kernel/apic/apic.c:1060
    apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807

    RIP: 0010:__read_once_size include/linux/compiler.h:193 [inline]
    RIP: 0010:queued_write_lock_slowpath+0x13e/0x290 kernel/locking/qrwlock.c:86
    Code: 00 00 fc ff df 4c 8d 2c 01 41 83 c7 03 41 0f b6 45 00 41 38 c7 7c 08 84 c0 0f 85 0c 01 00 00 8b 03 3d 00 01 00 00 74 1a f3 90 0f b6 55 00 41 38 d7 7c eb 84 d2 74 e7 48 89 df e8 6c 0f 4f 00
    RSP: 0018:ffff88805f117bd8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
    RAX: 0000000000000300 RBX: ffffffff89413ba0 RCX: 1ffffffff1282774
    RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff89413ba0
    RBP: ffff88805f117c70 R08: 1ffffffff1282774 R09: fffffbfff1282775
    R10: fffffbfff1282774 R11: ffffffff89413ba3 R12: 00000000000000ff
    R13: fffffbfff1282774 R14: 1ffff1100be22f7d R15: 0000000000000003
    queued_write_lock include/asm-generic/qrwlock.h:104 [inline]
    do_raw_write_lock+0x1d6/0x290 kernel/locking/spinlock_debug.c:203
    __raw_write_lock_bh include/linux/rwlock_api_smp.h:204 [inline]
    _raw_write_lock_bh+0x3b/0x50 kernel/locking/spinlock.c:312
    x25_insert_socket+0x21/0xe0 net/x25/af_x25.c:267
    x25_bind+0x273/0x340 net/x25/af_x25.c:705
    __sys_bind+0x23f/0x290 net/socket.c:1505
    __do_sys_bind net/socket.c:1516 [inline]
    __se_sys_bind net/socket.c:1514 [inline]
    __x64_sys_bind+0x73/0xb0 net/socket.c:1514
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x457e39
    Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fafccd0dc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457e39
    RDX: 0000000000000012 RSI: 0000000020000240 RDI: 0000000000000004
    RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007fafccd0e6d4
    R13: 00000000004bdf8b R14: 00000000004ce4b8 R15: 00000000ffffffff
    Sending NMI from CPU 0 to CPUs 1:
    NMI backtrace for cpu 1
    CPU: 1 PID: 8752 Comm: syz-executor4 Not tainted 5.0.0-rc4+ #51
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:__x25_find_socket+0x78/0x120 net/x25/af_x25.c:328
    Code: 89 f8 48 c1 e8 03 80 3c 18 00 0f 85 a6 00 00 00 4d 8b 64 24 68 4d 85 e4 74 7f e8 03 97 3d fb 49 83 ec 68 74 74 e8 f8 96 3d fb 8d bc 24 88 04 00 00 48 89 f8 48 c1 e8 03 0f b6 04 18 84 c0 74
    RSP: 0018:ffff8880639efc58 EFLAGS: 00000246
    RAX: 0000000000040000 RBX: dffffc0000000000 RCX: ffffc9000e677000
    RDX: 0000000000040000 RSI: ffffffff863244b8 RDI: ffff88806a764628
    RBP: ffff8880639efc80 R08: ffff8880a80d05c0 R09: fffffbfff1282775
    R10: fffffbfff1282774 R11: ffffffff89413ba3 R12: ffff88806a7645c0
    R13: 0000000000000001 R14: ffff88809f29ac00 R15: 0000000000000000
    FS: 00007fe8d0c58700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000001b32823000 CR3: 00000000672eb000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    x25_new_lci net/x25/af_x25.c:357 [inline]
    x25_connect+0x374/0xdf0 net/x25/af_x25.c:786
    __sys_connect+0x266/0x330 net/socket.c:1686
    __do_sys_connect net/socket.c:1697 [inline]
    __se_sys_connect net/socket.c:1694 [inline]
    __x64_sys_connect+0x73/0xb0 net/socket.c:1694
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x457e39
    Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fe8d0c57c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457e39
    RDX: 0000000000000012 RSI: 0000000020000200 RDI: 0000000000000004
    RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe8d0c586d4
    R13: 00000000004be378 R14: 00000000004ceb00 R15: 00000000ffffffff

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Andrew Hendry
    Cc: linux-x25@vger.kernel.org
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • netif_rx() must be called under a strict contract.

    At device dismantle phase, core networking clears IFF_UP
    and flush_all_backlogs() is called after rcu grace period
    to make sure no incoming packet might be in a cpu backlog
    and still referencing the device.

    Most drivers call netif_rx() from their interrupt handler,
    and since the interrupts are disabled at device dismantle,
    netif_rx() does not have to check dev->flags & IFF_UP

    Virtual drivers do not have this guarantee, and must
    therefore make the check themselves.

    Otherwise we risk use-after-free and/or crashes.

    Note this patch also fixes a small issue that came
    with commit ce6502a8f957 ("vxlan: fix a use after free
    in vxlan_encap_bypass"), since the dev->stats.rx_dropped
    change was done on the wrong device.

    Fixes: d342894c5d2f ("vxlan: virtual extensible lan")
    Fixes: ce6502a8f957 ("vxlan: fix a use after free in vxlan_encap_bypass")
    Signed-off-by: Eric Dumazet
    Cc: Petr Machata
    Cc: Ido Schimmel
    Cc: Roopa Prabhu
    Cc: Stefano Brivio
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Netlink has moved from bitmasks to group numbers long ago.

    Signed-off-by: Jouke Witteveen
    Signed-off-by: David S. Miller

    Jouke Witteveen
     
  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for net:

    1) Out-of-bound access to packet data from the snmp nat helper,
    from Jann Horn.

    2) ICMP(v6) error packets are set as related traffic by conntrack,
    update protocol number before calling nf_nat_ipv4_manip_pkt()
    to use ICMP(v6) rather than the original protocol number,
    from Florian Westphal.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Sander Eikelenboom bisected a NAT related regression down
    to the l4proto->manip_pkt indirection removal.

    I forgot that ICMP(v6) errors (e.g. PKTTOOBIG) can be set as related
    to the existing conntrack entry.

    Therefore, when passing the skb to nf_nat_ipv4/6_manip_pkt(), that
    ended up calling the wrong l4 manip function, as tuple->dst.protonum
    is the original flows l4 protocol (TCP, UDP, etc).

    Set the dst protocol field to ICMP(v6), we already have a private copy
    of the tuple due to the inversion of src/dst.

    Reported-by: Sander Eikelenboom
    Tested-by: Sander Eikelenboom
    Fixes: faec18dbb0405 ("netfilter: nat: remove l4proto->manip_pkt")
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
     
  • The generic ASN.1 decoder infrastructure doesn't guarantee that callbacks
    will get as much data as they expect; callbacks have to check the `datalen`
    parameter before looking at `data`. Make sure that snmp_version() and
    snmp_helper() don't read/write beyond the end of the packet data.

    (Also move the assignment to `pdata` down below the check to make it clear
    that it isn't necessarily a pointer we can use before the `datalen` check.)
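
The "check `datalen` before touching `data`" rule can be sketched in userspace Python. The function names, the expected lengths (1 byte for a version, 4 bytes for an IPv4 address) and the exceptions are assumptions for illustration; they are not the signatures of the kernel callbacks.

```python
def read_version(data: bytes, datalen: int) -> int:
    """Hypothetical callback: validate the length, then read one byte."""
    if datalen != 1 or len(data) < 1:
        raise ValueError("unexpected version length")
    return data[0]


def rewrite_ipv4(packet: bytearray, offset: int, datalen: int,
                 new_addr: bytes) -> None:
    """Hypothetical callback: overwrite 4 address bytes only if they exist."""
    if datalen != 4 or len(new_addr) != 4:
        raise ValueError("unexpected address length")
    if offset < 0 or offset + 4 > len(packet):
        raise ValueError("address would extend past the packet")
    # Only compute the slice after every length check has passed.
    packet[offset:offset + 4] = new_addr
```

In both helpers the data is only dereferenced after the length checks, which mirrors the reason for moving the `pdata` assignment below the check.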

    Fixes: cc2d58634e0f ("netfilter: nf_nat_snmp_basic: use asn1 decoder library")
    Signed-off-by: Jann Horn
    Signed-off-by: Pablo Neira Ayuso

    Jann Horn
     

11 Feb, 2019

8 commits

  • When mac80211 requests the low level driver to stop an ongoing
    Tx aggregation, the low level driver is expected to call
    ieee80211_stop_tx_ba_cb_irqsafe() to indicate that it is ready
    to stop the session. The callback in turn schedules a worker
    to complete the session tear down, which in turn also handles
    the relevant state for the intermediate Tx queue.

    However, as this flow is asynchronous, the intermediate queue
    should be stopped and not continue servicing frames, as in
    such a case frames that are dequeued would be marked as part
    of an aggregation, although the aggregation has already been
    stopped.

    Fix this by stopping the intermediate Tx queue, before
    calling the low level driver to stop the Tx aggregation.

    Signed-off-by: Ilan Peer
    Signed-off-by: Luca Coelho
    Signed-off-by: Johannes Berg

    Ilan Peer
     
  • It's possible that the caller of cfg80211_classify8021d() uses the
    value to index an array, like mac80211 in ieee80211_downgrade_queue().
    Prevent speculation on the return value.

    Signed-off-by: Johannes Berg
    Signed-off-by: Luca Coelho
    Signed-off-by: Johannes Berg

    Johannes Berg
     
  • Without recording the netlink port ID, we cannot return the
    results or complete messages to userspace, nor will we be
    able to abort if the socket is closed, so clearly we need
    to fill the value.

    Signed-off-by: Johannes Berg
    Signed-off-by: Luca Coelho
    Signed-off-by: Johannes Berg

    Johannes Berg
     
  • Fix FTM per burst maximum value from 15 to 31
    (the number is represented by 5 bits in the frame,
    hence a maximal value of 31)
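
The arithmetic behind the new limit, as a one-line check (the constant names are made up for illustration):

```python
FTMS_PER_BURST_BITS = 5  # width of the field in the frame, per this message

# Largest value an unsigned field of that width can hold.
MAX_FTMS_PER_BURST = (1 << FTMS_PER_BURST_BITS) - 1
```

The old limit of 15 corresponds to a 4-bit field, (1 << 4) - 1.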

    Signed-off-by: Aviya Erenfeld
    Signed-off-by: Luca Coelho
    Signed-off-by: Johannes Berg

    Aviya Erenfeld
     
  • If a driver does any significant activity in its ibss_join method,
    then it will very well expect that to be called during restart,
    before any stations are added. Do that.

    Signed-off-by: Johannes Berg
    Signed-off-by: Luca Coelho
    Signed-off-by: Johannes Berg

    Johannes Berg
     
  • Heiner Kallweit says:

    ====================
    r8169: revert two commits due to a regression

    Sander reported a regression (kernel panic, see[1]), therefore let's
    revert these commits. Removal of the barriers doesn't seem to
    contribute to the issue, the patch just overlaps with the problematic
    one and only reverting both patches was tested.

    [1] https://marc.info/?t=154965066400001&r=1&w=2

    v2:
    - improve commit message
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • This reverts commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356.

    Sander reported a regression causing a kernel panic[1],
    therefore let's revert this commit.

    [1] https://marc.info/?t=154965066400001&r=1&w=2

    Signed-off-by: Heiner Kallweit
    Signed-off-by: David S. Miller

    Heiner Kallweit
     
  • This reverts commit bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3.

    There doesn't seem to be anything wrong with this patch,
    it's just reverted to get a stable baseline again.

    Signed-off-by: Heiner Kallweit
    Signed-off-by: David S. Miller

    Heiner Kallweit
     

09 Feb, 2019

12 commits

  • The recent change in the rx_curs_confirmed assignment disregards
    byte order, which causes problems on little endian architectures.
    This patch fixes it.
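
The class of bug can be illustrated in userspace with Python's struct module. The cursor layout used here (a 16-bit wrap counter followed by a 32-bit count, in network byte order) is an assumption for illustration and is not taken from the patch.

```python
import struct

# Assumed wire layout: 16-bit wrap, 32-bit count, network (big-endian) order.
wire = struct.pack(">HI", 1, 0x12345678)

# Correct: convert from network byte order when reading.
wrap_ok, count_ok = struct.unpack(">HI", wire)

# Bug model: the same bytes read as if they were little-endian host order.
wrap_bad, count_bad = struct.unpack("<HI", wire)
```

On a big-endian host the two interpretations coincide, which is why a missing conversion only shows up on little-endian architectures.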

    Fixes: b8649efad879 ("net/smc: fix sender_free computation") (net-tree)
    Signed-off-by: Ursula Braun
    Signed-off-by: David S. Miller

    Ursula Braun
     
  • In the unlikely event that the kmalloc call in vmci_transport_socket_init()
    fails, we end up calling vmci_transport_destruct() with a NULL vmci_trans()
    and oopsing.

    This change addresses the above by explicitly checking for a NULL
    vmci_trans() at destruction time.

    Reported-by: Xiumei Mu
    Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
    Signed-off-by: Paolo Abeni
    Reviewed-by: Stefano Garzarella
    Reviewed-by: Jorgen Hansen
    Signed-off-by: David S. Miller

    Paolo Abeni
     
  • According to the algorithm described in the comment block at the
    beginning of ip_rt_send_redirect, the host should try to send
    'ip_rt_redirect_number' ICMP redirect packets with an exponential
    backoff and then stop sending them at all assuming that the destination
    ignores redirects.
    If the device has previously sent some ICMP error packets that are
    rate-limited (e.g TTL expired) and continues to receive traffic,
    the redirect packets will never be transmitted. This happens since
    peer->rate_tokens will be typically greater than 'ip_rt_redirect_number'
    and so it will never be reset even if the redirect silence timeout
    (ip_rt_redirect_silence) has elapsed without receiving any packet
    requiring redirects.

    Fix it by using a dedicated counter for the number of ICMP redirect
    packets that have been sent by the host.
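
The effect of sharing the counter can be modelled in userspace Python. This is a simplified sketch, not the kernel code: the constants are assumed values in arbitrary ticks, the silence-timeout reset is left out, and the field name n_redirects is hypothetical.

```python
REDIRECT_NUMBER = 9  # assumed value of ip_rt_redirect_number
REDIRECT_LOAD = 20   # assumed backoff base, in arbitrary ticks


class Peer:
    def __init__(self):
        self.rate_tokens = 0  # also used by the ICMP error rate limiter
        self.n_redirects = 0  # hypothetical dedicated redirect counter
        self.rate_last = 0


def may_send_redirect(peer, now, dedicated):
    """Decide whether a redirect may be sent, with exponential backoff."""
    count = peer.n_redirects if dedicated else peer.rate_tokens
    if count >= REDIRECT_NUMBER:
        peer.rate_last = now  # assume the destination ignores redirects
        return False
    if count == 0 or now > peer.rate_last + (REDIRECT_LOAD << count):
        peer.rate_last = now
        if dedicated:
            peer.n_redirects += 1
        else:
            peer.rate_tokens += 1
        return True
    return False
```

If the ICMP error limiter has left a large value in rate_tokens, the shared-counter variant refuses every redirect, while the dedicated counter starts from zero and backs off as intended.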

    I have not been able to identify a given commit that introduced the
    issue since ip_rt_send_redirect implements the same rate-limiting
    algorithm from commit 1da177e4c3f4 ("Linux-2.6.12-rc2")

    Signed-off-by: Lorenzo Bianconi
    Signed-off-by: David S. Miller

    Lorenzo Bianconi
     
  • When we probe a SFP module, we expect to be able to call the upstream
    device's module_insert() function so that the upstream link can be
    configured. However, when the upstream device is delayed, we currently
    may end up probing the module before the upstream device is available,
    and lose the module_insert() call.

    Avoid this by holding off probing the module until the SFP bus is
    properly connected to both the SFP socket driver and the upstream
    driver.

    Signed-off-by: Russell King
    Signed-off-by: David S. Miller

    Russell King
     
  • Pull networking fixes from David Miller:
    "This pull request is dedicated to the upcoming snowpocalypse parts 2
    and 3 in the Pacific Northwest:

    1) Drop profiles are broken because some drivers use dev_kfree_skb*
    instead of dev_consume_skb*, from Yang Wei.

    2) Fix IWLWIFI kconfig deps, from Luca Coelho.

    3) Fix percpu maps updating in bpftool, from Paolo Abeni.

    4) Missing station release in batman-adv, from Felix Fietkau.

    5) Fix some networking compat ioctl bugs, from Johannes Berg.

    6) ucc_geth must reset the BQL queue state when stopping the device,
    from Mathias Thore.

    7) Several XDP bug fixes in virtio_net from Toshiaki Makita.

    8) TSO packets must be sent always on queue 0 in stmmac, from Jose
    Abreu.

    9) Fix socket refcounting bug in RDS, from Eric Dumazet.

    10) Handle sparse cpu allocations in bpf selftests, from Martynas
    Pumputis.

    11) Make sure mgmt frames have enough tailroom in mac80211, from Felix
    Fietkau.

    12) Use safe list walking in sctp_sendmsg() asoc list traversal, from
    Greg Kroah-Hartman.

    13) Make DCCP's ccid_hc_[rt]x_parse_options always check for NULL
    ccid, from Eric Dumazet.

    14) Need to reload WoL password into bcmsysport device after deep
    sleeps, from Florian Fainelli.

    15) Remove filter from mask before freeing in cls_flower, from Petr
    Machata.

    16) Missing release and use after free in error paths of s390 qeth
    code, from Julian Wiedmann.

    17) Fix lockdep false positive in dsa code, from Marc Zyngier.

    18) Fix counting of ATU violations in mv88e6xxx, from Andrew Lunn.

    19) Fix EQ firmware assert in qed driver, from Manish Chopra.

    20) Don't default Cavium PTP to Y in kconfig, from Bjorn Helgaas"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
    net: dsa: b53: Fix for failure when irq is not defined in dt
    sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
    geneve: should not call rt6_lookup() when ipv6 was disabled
    net: Don't default Cavium PTP driver to 'y'
    net: broadcom: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: via-velocity: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: tehuti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: sun: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: fsl_ucc_hdlc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: fec_mpc52xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: smsc: epic100: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: dscc4: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: tulip: de2104x: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net: defxx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
    net/mlx5e: Don't overwrite pedit action when multiple pedit used
    net/mlx5e: Update hw flows when encap source mac changed
    qed*: Advance drivers version to 8.37.0.20
    qed: Change verbosity for coalescing message.
    qede: Fix system crash on configuring channels.
    qed: Consider TX tcs while deriving the max num_queues for PF.
    ...

    Linus Torvalds
     
  • Pull char/misc fixes from Greg KH:
    "Here are some small char and misc driver fixes for 5.0-rc6.

    Nothing huge here, some more binderfs fixups found as people use it,
    and there is a "large" selftest added to validate the binderfs code,
    which makes up the majority of this pull request.

    There's also some small mei and mic fixes to resolve some reported
    issues.

    All of these have been in linux-next for over a week with no reported
    issues"

    * tag 'char-misc-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
    mic: vop: Fix crash on remove
    mic: vop: Fix use-after-free on remove
    binderfs: remove separate device_initcall()
    fpga: stratix10-soc: fix wrong of_node_put() in init function
    mic: vop: Fix broken virtqueues
    mei: free read cb on ctrl_wr list flush
    samples: mei: use /dev/mei0 instead of /dev/mei
    mei: me: add ice lake point device id.
    binderfs: respect limit on binder control creation
    binder: fix CONFIG_ANDROID_BINDER_DEVICES
    selftests: add binderfs selftests

    Linus Torvalds
     
  • Pull driver core fixes from Greg KH:
    "Here are some driver core fixes for 5.0-rc6.

    Well, not so much "driver core" as "debugfs". There's a lot of
    outstanding debugfs cleanup patches coming in through different
    subsystem trees, and in that process it was found that the debugfs
    core really should return errors when something bad happens, to
    prevent random files from showing up in the root of debugfs
    afterward. So debugfs was fixed up to handle this properly, and then
    two fixes for the relay and blk-mq code were needed as that code was
    making invalid assumptions about debugfs return values.

    There's also a cacheinfo fix in here that resolves a tiny issue.

    All of these have been in linux-next for over a week with no reported
    problems"

    * tag 'driver-core-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    blk-mq: protect debugfs_create_files() from failures
    relay: check return of create_buf_file() properly
    debugfs: debugfs_lookup() should return NULL if not found
    debugfs: return error values, not NULL
    debugfs: fix debugfs_rename parameter checking
    cacheinfo: Keep the old value if of_property_read_u32 fails

    Linus Torvalds
     
  • Pull staging/IIO driver fixes from Greg KH:
    "Here are some small iio and staging driver fixes for 5.0-rc6.

    Nothing big, just resolve some reported IIO driver issues, and one
    staging driver bug. One staging driver patch was added and then
    reverted as well.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'staging-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
    Revert "staging: erofs: keep corrupted fs from crashing kernel in erofs_namei()"
    staging: erofs: keep corrupted fs from crashing kernel in erofs_namei()
    staging: octeon: fix broken phylib usage
    iio: ti-ads8688: Update buffer allocation for timestamps
    tools: iio: iio_generic_buffer: make num_loops signed
    iio: adc: axp288: Fix TS-pin handling
    iio: chemical: atlas-ph-sensor: correct IIO_TEMP values to millicelsius

    Linus Torvalds
     
  • Pull tty/serial fixes from Greg KH:
    "Here are some small tty and serial fixes for 5.0-rc6.

    Nothing huge, just a few small fixes for reported issues. The speakup
    fix is in here as it is a tty operation issue.

    All of these have been in linux-next for a while with no reported
    problems"

    * tag 'tty-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
    serial: fix race between flush_to_ldisc and tty_open
    staging: speakup: fix tty-operation NULL derefs
    serial: sh-sci: Do not free irqs that have already been freed
    serial: 8250_pci: Make PCI class test non fatal
    tty: serial: 8250_mtk: Fix potential NULL pointer dereference

    Linus Torvalds
     
  • Pull USB fixes from Greg KH:
    "Here are some small USB fixes for 5.0-rc6.

    Nothing huge, the normal amount of USB gadget fixes as well as some
    USB phy fixes. There's also a typec fix. Full details are in
    the shortlog.

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'usb-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
    usb: typec: tcpm: Correct the PPS out_volt calculation
    usb: gadget: musb: fix short isoc packets with inventra dma
    usb: phy: am335x: fix race condition in _probe
    usb: dwc3: exynos: Fix error handling of clk_prepare_enable
    usb: phy: fix link errors
    usb: gadget: udc: net2272: Fix bitwise and boolean operations
    usb: dwc3: gadget: Handle 0 xfer length for OUT EP

    Linus Torvalds
     
  • Pull xfs fixes from Darrick Wong:
    "Here are a handful of XFS fixes to fix a data corruption problem, a
    crasher bug, and a deadlock.

    Summary:

    - Fix cache coherency problem with writeback mappings

    - Fix buffer deadlock when shutting fs down

    - Fix a null pointer dereference when running online repair"

    * tag 'xfs-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
    xfs: set buffer ops when repair probes for btree type
    xfs: end sync buffer I/O properly on shutdown error
    xfs: eof trim writeback mapping as soon as it is cached

    Linus Torvalds
     
  • Pull drm fixes from Dave Airlie:
    "Missed fixes last week as had nothing until amdgpu showed up on
    Saturday. Other stuff has since rolled in along with some more amdgpu
    fixes, so we have two weeks of those, and some i915, vmwgfx, sun4i,
    rockchip and omap fixes.

    amdgpu/radeon:
    - fix crash on passthrough for SI
    - fencing fix for shared buffers
    - APU hwmon fix
    - API powerplay fix
    - eDP freesync fix
    - PASID mgr locking fix
    - KFD warning fix
    - DC/powerplay fix
    - raven revision ids fix
    - vega20 doorbell fix

    i915:
    - SNB display fix
    - SKL srckey mask fix
    - ICL DDI clock selection fix

    vmwgfx:
    - DMA API fix
    - IOMMU detection fix
    - display fixes

    sun4i:
    - tcon clock fix

    rockchip:
    - SPDX identifier fix

    omap:
    - DSI fixes"

    * tag 'drm-fixes-2019-02-08' of git://anongit.freedesktop.org/drm/drm: (28 commits)
    drm/omap: dsi: Hack-fix DSI bus flags
    drm/omap: dsi: Fix OF platform depopulate
    drm/omap: dsi: Fix crash in DSI debug dumps
    drm/i915: Try to sanitize bogus DPLL state left over by broken SNB BIOSen
    drm/amd/display: Attach VRR properties for eDP connectors
    drm/amdkfd: Fix if preprocessor statement above kfd_fill_iolink_info_for_cpu
    drm/amdgpu: use spin_lock_irqsave to protect vm_manager.pasid_idr
    drm/i915: always return something on DDI clock selection
    drm/i915: Fix skl srckey mask bits
    drm/vmwgfx: Improve on IOMMU detection
    drm/vmwgfx: Fix setting of dma masks
    drm/vmwgfx: Also check for crtc status while checking for DU active
    drm/vmwgfx: Fix an uninitialized fence handle value
    drm/vmwgfx: Return error code from vmw_execbuf_copy_fence_user
    drm/sun4i: tcon: Prepare and enable TCON channel 0 clock at init
    drm/amdgpu: fix the incorrect external id for raven series
    drm/amdgpu: Implement doorbell self-ring for NBIO 7.4
    drm/amd/display: Fix fclk idle state
    drm/amdgpu: Transfer fences to dmabuf importer
    drm/amd/powerplay: Fix missing break in switch
    ...

    Linus Torvalds
     

08 Feb, 2019

3 commits