22 Jun, 2018

1 commit

  • Pull NFS client bugfixes from Trond Myklebust:
    "Highlights include:

    - fix an rcu deadlock in nfs_delegation_find_inode()

    - fix NFSv4 deadlocks due to not freeing the session slot in
    layoutget

    - don't send layoutreturn if the layout is already invalid

    - prevent duplicate XID allocation

    - flexfiles: Don't tie up all the rpciod threads in resends"

    * tag 'nfs-for-4.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    pNFS/flexfiles: Process writeback resends from nfsiod context as well
    pNFS/flexfiles: Don't tie up all the rpciod threads in resends
    sunrpc: Prevent duplicate XID allocation
    pNFS: Don't send layoutreturn if the layout is already invalid
    pNFS: Always free the session slot on error in nfs4_layoutget_handle_exception
    NFS: Fix an rcu deadlock in nfs_delegation_find_inode()

    Linus Torvalds
     

20 Jun, 2018

11 commits

  • The ipcm(6)_cookie field gso_size is set only in the udp path. The ip
    layer copies this to cork only if sk_type is SOCK_DGRAM. This check
    proved too permissive. Ping and l2tp sockets have the same type.

    Limit to sockets of type SOCK_DGRAM and protocol IPPROTO_UDP to
    exclude ping sockets.

    v1 -> v2
    - remove irrelevant whitespace changes

    Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT")
    Reported-by: Maciej Żenczykowski
    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
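
    The narrowed check described in this entry can be sketched as below. This
    is a minimal model, not the kernel code: the function names are
    hypothetical, and Python's socket constants stand in for the kernel's
    sk_type and sk_protocol values.

```python
import socket


def copies_gso_size_old(sk_type, sk_protocol):
    """Pre-fix rule as described above: only the socket type is checked."""
    return sk_type == socket.SOCK_DGRAM


def copies_gso_size_fixed(sk_type, sk_protocol):
    """Post-fix rule: the socket must be SOCK_DGRAM *and* IPPROTO_UDP."""
    return sk_type == socket.SOCK_DGRAM and sk_protocol == socket.IPPROTO_UDP


# (type, protocol) pairs; a ping socket shares the datagram type with UDP.
UDP_SOCK = (socket.SOCK_DGRAM, socket.IPPROTO_UDP)
PING_SOCK = (socket.SOCK_DGRAM, socket.IPPROTO_ICMP)

if __name__ == "__main__":
    for name, sock in (("udp", UDP_SOCK), ("ping", PING_SOCK)):
        print(name, copies_gso_size_old(*sock), copies_gso_size_fixed(*sock))
```

    Under the old rule the ping socket passes the check; under the fixed rule
    only the UDP socket does.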
     
  • net/bpfilter/bpfilter_umh is a binary file generated when bpfilter is
    enabled. Add it to .gitignore to avoid committing it.

    Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module")
    Signed-off-by: Matteo Croce
    Signed-off-by: David S. Miller

    Matteo Croce
     
  • The bpfilter Makefile assumes that the system locale is en_US, so the
    parsing of objdump output fails under other locales.
    Set LC_ALL=C and, while at it, rewrite the objdump parsing so it spawns
    only 2 processes instead of 7.

    Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module")
    Signed-off-by: Matteo Croce
    Signed-off-by: David S. Miller

    Matteo Croce
     
  • In the following script

    # tc actions add action ife encode allow prio pass index 42
    # tc actions replace action ife encode allow tcindex drop index 42

    the action control should remain equal to 'pass' if the kernel fails
    to replace the TC action. Postpone the assignment of the action control
    to ensure it is not overwritten in the error path of tcf_ife_init().

    Fixes: ef6980b6becb ("introduce IFE action")
    Signed-off-by: Davide Caratti
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Davide Caratti
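
    The idea of postponing the assignment can be illustrated with a small
    model. Everything here is hypothetical (the IfeAction class, the
    load_metadata callback standing in for whichever step of the replace
    fails); it only shows why assigning after the fallible step keeps the old
    value on error.

```python
class ReplaceError(Exception):
    """Stands in for the kernel failing to replace the action."""


class IfeAction:
    def __init__(self, control):
        self.control = control


def replace_early(action, new_control, load_metadata):
    # Control is overwritten before we know whether the replace will succeed.
    action.control = new_control
    load_metadata()


def replace_postponed(action, new_control, load_metadata):
    # The fallible step runs first; control only changes on success.
    load_metadata()
    action.control = new_control


def failing_load():
    raise ReplaceError("metadata module not available")


def try_replace(replace, action, new_control):
    try:
        replace(action, new_control, failing_load)
    except ReplaceError:
        pass
    return action.control


if __name__ == "__main__":
    print(try_replace(replace_early, IfeAction("pass"), "drop"))
    print(try_replace(replace_postponed, IfeAction("pass"), "drop"))
```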
     
  • A recursive lock warning [1] can be observed with the following script:

    # $TC actions add action ife encode allow prio pass index 42
    IFE type 0xED3E
    # $TC actions replace action ife encode allow tcindex pass index 42

    in case the kernel was unable to run the last command (e.g. because
    'act_meta_skbtcindex' could not be loaded). For a similar reason,
    the kernel can leak idr in the error path of tcf_ife_init(), because
    tcf_idr_release() is not called after successful idr reservation:

    # $TC actions add action ife encode allow tcindex index 47
    IFE type 0xED3E
    RTNETLINK answers: No such file or directory
    We have an error talking to the kernel
    # $TC actions add action ife encode allow tcindex index 47
    IFE type 0xED3E
    RTNETLINK answers: No space left on device
    We have an error talking to the kernel
    # $TC actions add action ife encode use mark 7 type 0xfefe pass index 47
    IFE type 0xFEFE
    RTNETLINK answers: No space left on device
    We have an error talking to the kernel

    Since tcfa_lock is already taken when the action is being edited, a call
    to tcf_idr_release() wrongly makes tcf_idr_cleanup() take the same lock
    again. On the other hand, tcf_idr_release() needs to be called in the
    error path of tcf_ife_init(), to undo the last tcf_idr_create() invocation.
    Fix both problems in tcf_ife_init().
    Since the cleanup() routine can now be called when ife->params is NULL,
    also add a NULL pointer check to avoid calling kfree_rcu(NULL, rcu).

    [1]
    ============================================
    WARNING: possible recursive locking detected
    4.17.0-rc4.kasan+ #417 Tainted: G E
    --------------------------------------------
    tc/3932 is trying to acquire lock:
    000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_cleanup+0x19/0x80 [act_ife]

    but task is already holding lock:
    000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife]

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(&(&p->tcfa_lock)->rlock);
    lock(&(&p->tcfa_lock)->rlock);

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    2 locks held by tc/3932:
    #0: 000000007ca8e990 (rtnl_mutex){+.+.}, at: tcf_ife_init+0xf61/0x13c0 [act_ife]
    #1: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife]

    stack backtrace:
    CPU: 3 PID: 3932 Comm: tc Tainted: G E 4.17.0-rc4.kasan+ #417
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    Call Trace:
    dump_stack+0x9a/0xeb
    __lock_acquire+0xf43/0x34a0
    ? debug_check_no_locks_freed+0x2b0/0x2b0
    ? debug_check_no_locks_freed+0x2b0/0x2b0
    ? debug_check_no_locks_freed+0x2b0/0x2b0
    ? __mutex_lock+0x62f/0x1240
    ? kvm_sched_clock_read+0x1a/0x30
    ? sched_clock+0x5/0x10
    ? sched_clock_cpu+0x18/0x170
    ? find_held_lock+0x39/0x1d0
    ? lock_acquire+0x10b/0x330
    lock_acquire+0x10b/0x330
    ? tcf_ife_cleanup+0x19/0x80 [act_ife]
    _raw_spin_lock_bh+0x38/0x70
    ? tcf_ife_cleanup+0x19/0x80 [act_ife]
    tcf_ife_cleanup+0x19/0x80 [act_ife]
    __tcf_idr_release+0xff/0x350
    tcf_ife_init+0xdde/0x13c0 [act_ife]
    ? ife_exit_net+0x290/0x290 [act_ife]
    ? __lock_is_held+0xb4/0x140
    tcf_action_init_1+0x67b/0xad0
    ? tcf_action_dump_old+0xa0/0xa0
    ? sched_clock+0x5/0x10
    ? sched_clock_cpu+0x18/0x170
    ? kvm_sched_clock_read+0x1a/0x30
    ? sched_clock+0x5/0x10
    ? sched_clock_cpu+0x18/0x170
    ? memset+0x1f/0x40
    tcf_action_init+0x30f/0x590
    ? tcf_action_init_1+0xad0/0xad0
    ? memset+0x1f/0x40
    tc_ctl_action+0x48e/0x5e0
    ? mutex_lock_io_nested+0x1160/0x1160
    ? tca_action_gd+0x990/0x990
    ? sched_clock+0x5/0x10
    ? find_held_lock+0x39/0x1d0
    rtnetlink_rcv_msg+0x4da/0x990
    ? validate_linkmsg+0x680/0x680
    ? sched_clock_cpu+0x18/0x170
    ? find_held_lock+0x39/0x1d0
    netlink_rcv_skb+0x127/0x350
    ? validate_linkmsg+0x680/0x680
    ? netlink_ack+0x970/0x970
    ? __kmalloc_node_track_caller+0x304/0x3a0
    netlink_unicast+0x40f/0x5d0
    ? netlink_attachskb+0x580/0x580
    ? _copy_from_iter_full+0x187/0x760
    ? import_iovec+0x90/0x390
    netlink_sendmsg+0x67f/0xb50
    ? netlink_unicast+0x5d0/0x5d0
    ? copy_msghdr_from_user+0x206/0x340
    ? netlink_unicast+0x5d0/0x5d0
    sock_sendmsg+0xb3/0xf0
    ___sys_sendmsg+0x60a/0x8b0
    ? copy_msghdr_from_user+0x340/0x340
    ? lock_downgrade+0x5e0/0x5e0
    ? tty_write_lock+0x18/0x50
    ? kvm_sched_clock_read+0x1a/0x30
    ? sched_clock+0x5/0x10
    ? sched_clock_cpu+0x18/0x170
    ? find_held_lock+0x39/0x1d0
    ? lock_downgrade+0x5e0/0x5e0
    ? lock_acquire+0x10b/0x330
    ? __audit_syscall_entry+0x316/0x690
    ? current_kernel_time64+0x6b/0xd0
    ? __fget_light+0x55/0x1f0
    ? __sys_sendmsg+0xd2/0x170
    __sys_sendmsg+0xd2/0x170
    ? __ia32_sys_shutdown+0x70/0x70
    ? syscall_trace_enter+0x57a/0xd60
    ? rcu_read_lock_sched_held+0xdc/0x110
    ? __bpf_trace_sys_enter+0x10/0x10
    ? do_syscall_64+0x22/0x480
    do_syscall_64+0xa5/0x480
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7fd646988ba0
    RSP: 002b:00007fffc9fab3c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
    RAX: ffffffffffffffda RBX: 00007fffc9fab4f0 RCX: 00007fd646988ba0
    RDX: 0000000000000000 RSI: 00007fffc9fab440 RDI: 0000000000000003
    RBP: 000000005b28c8b3 R08: 0000000000000002 R09: 0000000000000000
    R10: 00007fffc9faae20 R11: 0000000000000246 R12: 0000000000000000
    R13: 00007fffc9fab504 R14: 0000000000000001 R15: 000000000066c100

    Fixes: 4e8c86155010 ("net sched: net sched: ife action fix late binding")
    Fixes: ef6980b6becb ("introduce IFE action")
    Signed-off-by: Davide Caratti
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Davide Caratti
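
    The locking problem reported in [1] can be reproduced in miniature with a
    non-reentrant lock. This is only a sketch: the Action class and both init
    functions are hypothetical, a try-lock is used so the example reports the
    recursion instead of hanging, and releasing after the lock is dropped is
    just one possible way out, not necessarily what the patch does.

```python
import threading


class Action:
    def __init__(self):
        self.lock = threading.Lock()  # non-reentrant, like a spinlock
        self.params = None            # may still be unset when cleanup runs
        self.released = False

    def cleanup(self):
        # Try-lock so the sketch raises instead of deadlocking.
        if not self.lock.acquire(blocking=False):
            raise RuntimeError("recursive locking detected")
        try:
            if self.params is not None:  # mirrors the added NULL check
                self.params = None
            self.released = True
        finally:
            self.lock.release()


def init_release_under_lock(action):
    with action.lock:
        # Error path taken while the lock is held: cleanup re-takes the lock.
        action.cleanup()


def init_release_after_unlock(action):
    with action.lock:
        failed = True  # hypothetical failure detected while editing
    if failed:
        action.cleanup()  # lock already dropped, so cleanup can take it


if __name__ == "__main__":
    try:
        init_release_under_lock(Action())
    except RuntimeError as exc:
        print(exc)
    ok = Action()
    init_release_after_unlock(ok)
    print(ok.released)
```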
     
  • If dev_get_valid_name() fails, propagate its return code.

    Also remove the assignment of ENODEV to err; err is set to
    0 again before dev_change_net_namespace() exits.

    Signed-off-by: Li RongQing
    Signed-off-by: David S. Miller

    Li RongQing
     
  • Similar to 69678bcd4d2d ("udp: fix SO_BINDTODEVICE"), TCP socket lookups
    need to fail if dev_match is not true. Currently, a packet to a given port
    can match a socket bound to a device when it should not. In the VRF case,
    this causes the lookup to hit a VRF socket and not a global socket
    resulting in a response trying to go through the VRF when it should not.

    Fixes: 3fa6f616a7a4d ("net: ipv4: add second dif to inet socket lookups")
    Fixes: 4297a0ef08572 ("net: ipv6: add second dif to inet6 socket lookups")
    Reported-by: Lou Berger
    Diagnosed-by: Renato Westphal
    Tested-by: Renato Westphal
    Signed-off-by: David Ahern
    Signed-off-by: David S. Miller

    David Ahern
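
    A simplified model of the lookup rule may help. The scoring values, the
    function names and the shape of the permissive variant are assumptions
    made for illustration; only the rule itself (a socket bound to a device
    must not match a packet from another device) comes from the entry above.
    Here dif is the incoming interface index and sdif the index of its VRF
    master, or 0 if it has none.

```python
def score_permissive(bound_dev_if, dif, sdif):
    """A device mismatch lowers the score but still yields a candidate."""
    score = 1
    if bound_dev_if and bound_dev_if in (dif, sdif):
        score += 4
    return score


def score_strict(bound_dev_if, dif, sdif):
    """A socket bound to a device fails the lookup unless the device matches."""
    score = 1
    if bound_dev_if:
        if bound_dev_if not in (dif, sdif):
            return -1  # lookup must fail for this socket
        score += 4
    return score


VRF_IFINDEX = 10   # hypothetical VRF device
ETH0_IFINDEX = 2   # hypothetical interface outside the VRF
ETH1_IFINDEX = 3   # hypothetical interface enslaved to the VRF

if __name__ == "__main__":
    # Packet arrives on eth0, which is not part of the VRF.
    print(score_permissive(VRF_IFINDEX, ETH0_IFINDEX, 0))
    print(score_strict(VRF_IFINDEX, ETH0_IFINDEX, 0))
```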
     
  • syzbot reported a use-after-free that is caused by fib6_info being
    freed without a proper RCU grace period.

    CPU: 0 PID: 1407 Comm: udevd Not tainted 4.17.0+ #39
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:

    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x1b9/0x294 lib/dump_stack.c:113
    print_address_description+0x6c/0x20b mm/kasan/report.c:256
    kasan_report_error mm/kasan/report.c:354 [inline]
    kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
    __read_once_size include/linux/compiler.h:188 [inline]
    find_rr_leaf net/ipv6/route.c:705 [inline]
    rt6_select net/ipv6/route.c:761 [inline]
    fib6_table_lookup+0x12b7/0x14d0 net/ipv6/route.c:1823
    ip6_pol_route+0x1c2/0x1020 net/ipv6/route.c:1856
    ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2082
    fib6_rule_lookup+0x211/0x6d0 net/ipv6/fib6_rules.c:122
    ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2110
    ip6_route_output include/net/ip6_route.h:82 [inline]
    icmpv6_xrlim_allow net/ipv6/icmp.c:211 [inline]
    icmp6_send+0x147c/0x2da0 net/ipv6/icmp.c:535
    icmpv6_send+0x17a/0x300 net/ipv6/ip6_icmp.c:43
    ip6_link_failure+0xa5/0x790 net/ipv6/route.c:2244
    dst_link_failure include/net/dst.h:427 [inline]
    ndisc_error_report+0xd1/0x1c0 net/ipv6/ndisc.c:695
    neigh_invalidate+0x246/0x550 net/core/neighbour.c:892
    neigh_timer_handler+0xaf9/0xde0 net/core/neighbour.c:978
    call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
    expire_timers kernel/time/timer.c:1363 [inline]
    __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
    run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
    __do_softirq+0x2e0/0xaf5 kernel/softirq.c:284
    invoke_softirq kernel/softirq.c:364 [inline]
    irq_exit+0x1d1/0x200 kernel/softirq.c:404
    exiting_irq arch/x86/include/asm/apic.h:527 [inline]
    smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
    apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863

    RIP: 0010:strlen+0x5e/0xa0 lib/string.c:482
    Code: 24 00 74 3b 48 bb 00 00 00 00 00 fc ff df 4c 89 e0 48 83 c0 01 48 89 c2 48 89 c1 48 c1 ea 03 83 e1 07 0f b6 14 1a 38 ca 7f 04 d2 75 23 80 38 00 75 de 48 83 c4 08 4c 29 e0 5b 41 5c 5d c3 48
    RSP: 0018:ffff8801af117850 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
    RAX: ffff880197f53bd0 RBX: dffffc0000000000 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: ffffffff81c5b06c RDI: ffff880197f53bc0
    RBP: ffff8801af117868 R08: ffff88019a976540 R09: 0000000000000000
    R10: ffff88019a976540 R11: 0000000000000000 R12: ffff880197f53bc0
    R13: ffff880197f53bc0 R14: ffffffff899e4e90 R15: ffff8801d91c6a00
    strlen include/linux/string.h:267 [inline]
    getname_kernel+0x24/0x370 fs/namei.c:218
    open_exec+0x17/0x70 fs/exec.c:882
    load_elf_binary+0x968/0x5610 fs/binfmt_elf.c:780
    search_binary_handler+0x17d/0x570 fs/exec.c:1653
    exec_binprm fs/exec.c:1695 [inline]
    __do_execve_file.isra.35+0x16fe/0x2710 fs/exec.c:1819
    do_execveat_common fs/exec.c:1866 [inline]
    do_execve fs/exec.c:1883 [inline]
    __do_sys_execve fs/exec.c:1964 [inline]
    __se_sys_execve fs/exec.c:1959 [inline]
    __x64_sys_execve+0x8f/0xc0 fs/exec.c:1959
    do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f1576a46207
    Code: 77 19 f4 48 89 d7 44 89 c0 0f 05 48 3d 00 f0 ff ff 76 e0 f7 d8 64 41 89 01 eb d8 f7 d8 64 41 89 01 eb df b8 3b 00 00 00 0f 05 3d 00 f0 ff ff 77 02 f3 c3 48 8b 15 00 8c 2d 00 f7 d8 64 89 02
    RSP: 002b:00007ffff2784568 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
    RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f1576a46207
    RDX: 0000000001215b10 RSI: 00007ffff2784660 RDI: 00007ffff2785670
    RBP: 0000000000625500 R08: 000000000000589c R09: 000000000000589c
    R10: 0000000000000000 R11: 0000000000000202 R12: 0000000001215b10
    R13: 0000000000000007 R14: 0000000001204250 R15: 0000000000000005

    Allocated by task 12188:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
    kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
    kmalloc include/linux/slab.h:513 [inline]
    kzalloc include/linux/slab.h:706 [inline]
    fib6_info_alloc+0xbb/0x280 net/ipv6/ip6_fib.c:152
    ip6_route_info_create+0x782/0x2b50 net/ipv6/route.c:3013
    ip6_route_add+0x23/0xb0 net/ipv6/route.c:3154
    ipv6_route_ioctl+0x5a5/0x760 net/ipv6/route.c:3660
    inet6_ioctl+0x100/0x1f0 net/ipv6/af_inet6.c:546
    sock_do_ioctl+0xe4/0x3e0 net/socket.c:973
    sock_ioctl+0x30d/0x680 net/socket.c:1097
    vfs_ioctl fs/ioctl.c:46 [inline]
    file_ioctl fs/ioctl.c:500 [inline]
    do_vfs_ioctl+0x1cf/0x16f0 fs/ioctl.c:684
    ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
    __do_sys_ioctl fs/ioctl.c:708 [inline]
    __se_sys_ioctl fs/ioctl.c:706 [inline]
    __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
    do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 1402:
    save_stack+0x43/0xd0 mm/kasan/kasan.c:448
    set_track mm/kasan/kasan.c:460 [inline]
    __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
    kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
    __cache_free mm/slab.c:3498 [inline]
    kfree+0xd9/0x260 mm/slab.c:3813
    fib6_info_destroy+0x29b/0x350 net/ipv6/ip6_fib.c:207
    fib6_info_release include/net/ip6_fib.h:286 [inline]
    __ip6_del_rt_siblings net/ipv6/route.c:3235 [inline]
    ip6_route_del+0x11c4/0x13b0 net/ipv6/route.c:3316
    ipv6_route_ioctl+0x616/0x760 net/ipv6/route.c:3663
    inet6_ioctl+0x100/0x1f0 net/ipv6/af_inet6.c:546
    sock_do_ioctl+0xe4/0x3e0 net/socket.c:973
    sock_ioctl+0x30d/0x680 net/socket.c:1097
    vfs_ioctl fs/ioctl.c:46 [inline]
    file_ioctl fs/ioctl.c:500 [inline]
    do_vfs_ioctl+0x1cf/0x16f0 fs/ioctl.c:684
    ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
    __do_sys_ioctl fs/ioctl.c:708 [inline]
    __se_sys_ioctl fs/ioctl.c:706 [inline]
    __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
    do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    The buggy address belongs to the object at ffff8801b5df2580
    which belongs to the cache kmalloc-256 of size 256
    The buggy address is located 8 bytes inside of
    256-byte region [ffff8801b5df2580, ffff8801b5df2680)
    The buggy address belongs to the page:
    page:ffffea0006d77c80 count:1 mapcount:0 mapping:ffff8801da8007c0 index:0xffff8801b5df2e40
    flags: 0x2fffc0000000100(slab)
    raw: 02fffc0000000100 ffffea0006c5cc48 ffffea0007363308 ffff8801da8007c0
    raw: ffff8801b5df2e40 ffff8801b5df2080 0000000100000006 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
    ffff8801b5df2480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8801b5df2500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
    > ffff8801b5df2580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ^
    ffff8801b5df2600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8801b5df2680: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb

    Fixes: a64efe142f5e ("net/ipv6: introduce fib6_info struct and helpers")
    Signed-off-by: Eric Dumazet
    Cc: David Ahern
    Reported-by: syzbot+9e6d75e3edef427ee888@syzkaller.appspotmail.com
    Acked-by: David Ahern
    Tested-by: David Ahern
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • This moves all of the netdev_printk(KERN_DEBUG, ...) messages over to
    netdev_dbg.

    As Joe explains:

    > netdev_dbg is not included in object code unless
    > DEBUG is defined or CONFIG_DYNAMIC_DEBUG is set.
    > And then, it is not emitted into the log unless
    > DEBUG is set or this specific netdev_dbg is enabled
    > via the dynamic debug control file.

    Which is what we're after in this case.

    Acked-by: Samuel Mendoza-Jonas
    Signed-off-by: Joel Stanley
    Signed-off-by: David S. Miller

    Joel Stanley
     
  • This does not provide useful information. As the ncsi maintainer said:

    > either we get a channel or broadcom has gone out to lunch

    Acked-by: Samuel Mendoza-Jonas
    Signed-off-by: Joel Stanley
    Signed-off-by: David S. Miller

    Joel Stanley
     
  • In normal operation we see this series of messages as the host drives
    the network device:

    ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state down
    ftgmac100 1e660000.ethernet eth0: NCSI: suspending channel 0
    ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
    ftgmac100 1e660000.ethernet eth0: NCSI: channel 0 link down after config
    ftgmac100 1e660000.ethernet eth0: NCSI interface down
    ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state up
    ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
    ftgmac100 1e660000.ethernet eth0: NCSI interface up
    ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state down
    ftgmac100 1e660000.ethernet eth0: NCSI: suspending channel 0
    ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
    ftgmac100 1e660000.ethernet eth0: NCSI: channel 0 link down after config
    ftgmac100 1e660000.ethernet eth0: NCSI interface down
    ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state up
    ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
    ftgmac100 1e660000.ethernet eth0: NCSI interface up

    This makes all of these messages netdev_dbg. They are still useful for
    debugging e.g. misbehaving network device firmware, but we do not need
    them filling up the kernel logs in normal operation.

    Acked-by: Samuel Mendoza-Jonas
    Signed-off-by: Joel Stanley
    Signed-off-by: David S. Miller

    Joel Stanley
     

19 Jun, 2018

1 commit

  • Krzysztof Kozlowski reports that a heavy NFSv4
    WRITE workload against a slow NFS server causes his Raspberry Pi
    clients to stall. Krzysztof bisected it to commit 37ac86c3a76c
    ("SUNRPC: Initialize rpc_rqst outside of xprt->reserve_lock").

    I was able to reproduce similar behavior, and it appears that on rare
    occasions the RPC client layer re-allocates an XID for an RPC that it has
    already partially sent. This results in the client ignoring the
    subsequent reply, which carries the original XID.

    For various reasons, checking !req->rq_xmit_bytes_sent in
    xprt_prepare_transmit is not a 100% reliable mechanism for
    determining when a fresh XID is needed.

    Trond's preference is to allocate the XID at the time each rpc_rqst
    slot is initialized.

    This patch should also address a gcc 4.1.2 complaint reported by
    Geert Uytterhoeven.

    Reported-by: Krzysztof Kozlowski
    Fixes: 37ac86c3a76c ("SUNRPC: Initialize rpc_rqst outside of ... ")
    Signed-off-by: Chuck Lever
    Tested-by: Krzysztof Kozlowski
    Signed-off-by: Trond Myklebust

    Chuck Lever
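
    The difference between the two allocation points can be sketched as
    follows. The classes are hypothetical, and the situation where the
    sent-bytes counter reads zero for an already partially sent request is
    simulated directly, since the entry above does not spell out how it
    arises.

```python
import itertools


class Transport:
    def __init__(self, first_xid=1):
        self._xids = itertools.count(first_xid)

    def alloc_xid(self):
        return next(self._xids)


class RequestXidAtTransmit:
    """XID chosen in prepare_transmit when no bytes appear to have been sent."""

    def __init__(self, xprt):
        self.xid = None
        self.bytes_sent = 0

    def prepare_transmit(self, xprt):
        if not self.bytes_sent:
            self.xid = xprt.alloc_xid()
        return self.xid


class RequestXidAtInit:
    """XID chosen once, when the request slot is initialized."""

    def __init__(self, xprt):
        self.xid = xprt.alloc_xid()
        self.bytes_sent = 0

    def prepare_transmit(self, xprt):
        return self.xid


def xids_across_resend(request_cls):
    xprt = Transport()
    req = request_cls(xprt)
    first = req.prepare_transmit(xprt)
    # Simulated unreliable state: the counter still reads zero although the
    # first XID has already gone out on the wire.
    req.bytes_sent = 0
    second = req.prepare_transmit(xprt)
    return first, second


if __name__ == "__main__":
    print(xids_across_resend(RequestXidAtTransmit))
    print(xids_across_resend(RequestXidAtInit))
```

    With allocation at transmit time the resend carries a new XID, so a reply
    to the original one would be ignored; with allocation at initialization
    the XID stays the same.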
     

17 Jun, 2018

3 commits

  • When blackhole is used on top of a classful qdisc like hfsc, it breaks
    the qlen and backlog counters because packets disappear without notice.

    In HFSC, a non-zero qlen while all classes are inactive triggers a warning:
    WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc]
    and schedules watchdog work endlessly.

    This patch returns __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS;
    this flag tells the upper layer that the packet is gone and isn't queued.

    Signed-off-by: Konstantin Khlebnikov
    Signed-off-by: David S. Miller

    Konstantin Khlebnikov
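
    How the extra flag keeps the parent's counters honest can be modelled
    roughly like this. The numeric values and the ParentQdisc class are
    assumptions for illustration (the bypass flag is renamed
    XMIT_BYPASS_FLAG here), and the parent is reduced to a single qlen
    counter.

```python
NET_XMIT_SUCCESS = 0x00
XMIT_BYPASS_FLAG = 0x00020000  # assumed value for __NET_XMIT_BYPASS


def blackhole_enqueue_old(pkt):
    # Packet is discarded, but the caller is told it was queued.
    return NET_XMIT_SUCCESS


def blackhole_enqueue_new(pkt):
    # Packet is discarded, and the flag says it is gone and not queued.
    return NET_XMIT_SUCCESS | XMIT_BYPASS_FLAG


class ParentQdisc:
    """Very small stand-in for a classful parent such as hfsc."""

    def __init__(self, child_enqueue):
        self.child_enqueue = child_enqueue
        self.qlen = 0

    def enqueue(self, pkt):
        ret = self.child_enqueue(pkt)
        if ret != NET_XMIT_SUCCESS:
            return ret  # nothing was queued below us; leave counters alone
        self.qlen += 1
        return ret


def qlen_after(child_enqueue, packets=3):
    parent = ParentQdisc(child_enqueue)
    for pkt in range(packets):
        parent.enqueue(pkt)
    return parent.qlen


if __name__ == "__main__":
    print(qlen_after(blackhole_enqueue_old))
    print(qlen_after(blackhole_enqueue_new))
```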
     
  • ATM accounts for in-flight TX packets in sk_wmem_alloc of the VCC on
    which they are to be sent. But it doesn't take ownership of those
    packets from the sock (if any) which originally owned them. They should
    remain owned by their actual sender until they've left the box.

    There's a hack in pskb_expand_head() to avoid adjusting skb->truesize
    for certain skbs, precisely to avoid messing up sk_wmem_alloc
    accounting. Ideally that hack would cover the ATM use case too, but it
    doesn't — skbs which aren't owned by any sock, for example PPP control
    frames, still get their truesize adjusted when the low-level ATM driver
    adds headroom.

    This has always been an issue, it seems. The truesize of a packet
    increases, and sk_wmem_alloc on the VCC goes negative. But this wasn't
    for normal traffic, only for control frames. So I think we just got away
    with it, and we probably needed to send 2GiB of LCP echo frames before
    the misaccounting would ever have caused a problem and caused
    atm_may_send() to start refusing packets.

    Commit 14afee4b609 ("net: convert sock.sk_wmem_alloc from atomic_t to
    refcount_t") did exactly what it was intended to do, and turned this
    mostly-theoretical problem into a real one, causing PPPoATM to fail
    immediately as sk_wmem_alloc underflows and atm_may_send() *immediately*
    starts refusing to allow new packets.

    The least intrusive solution to this problem is to stash the value of
    skb->truesize that was accounted to the VCC, in a new member of the
    ATM_SKB(skb) structure. Then in atm_pop_raw() subtract precisely that
    value instead of the then-current value of skb->truesize.

    Fixes: 158f323b9868 ("net: adjust skb->truesize in pskb_expand_head()")
    Signed-off-by: David Woodhouse
    Tested-by: Kevin Darbyshire-Bryant
    Signed-off-by: David S. Miller

    David Woodhouse
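
    The accounting fix can be illustrated with a toy model. The class and
    field names (including acct_truesize for the stashed value) are
    hypothetical; the point is only that subtracting the value that was
    charged, rather than the current truesize, keeps the counter balanced
    when a driver adds headroom in between.

```python
class Vcc:
    def __init__(self):
        self.wmem_alloc = 0


class Skb:
    def __init__(self, truesize):
        self.truesize = truesize
        self.acct_truesize = 0  # value charged to the VCC at send time


def atm_send(vcc, skb):
    skb.acct_truesize = skb.truesize  # stash what we are about to charge
    vcc.wmem_alloc += skb.truesize


def driver_adds_headroom(skb, extra):
    skb.truesize += extra  # truesize grows after the charge was made


def pop_using_current_truesize(vcc, skb):
    vcc.wmem_alloc -= skb.truesize


def pop_using_stashed_truesize(vcc, skb):
    vcc.wmem_alloc -= skb.acct_truesize


def balance_after(pop):
    vcc, skb = Vcc(), Skb(512)
    atm_send(vcc, skb)
    driver_adds_headroom(skb, 64)
    pop(vcc, skb)
    return vcc.wmem_alloc


if __name__ == "__main__":
    print(balance_after(pop_using_current_truesize))
    print(balance_after(pop_using_stashed_truesize))
```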
     
  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2018-06-16

    The following pull-request contains BPF updates for your *net* tree.

    The main changes are:

    1) Fix a panic in devmap handling in generic XDP where return type
    of __devmap_lookup_elem() got changed recently but generic XDP
    code missed the related update, from Toshiaki.

    2) Fix a freeze when BPF progs are loaded that include BPF to BPF
    calls when JIT is enabled where we would later bail out via error
    path w/o dropping kallsyms, and another one to silence syzkaller
    splats from locking prog read-only, from Daniel.

    3) Fix a bug in test_offloads.py BPF selftest which must not assume
    that the underlying system has no BPF progs loaded prior to test,
    and one in bpftool to fix accuracy of program load time, from Jakub.

    4) Fix a bug in bpftool's probe for availability of the bpf(2)
    BPF_TASK_FD_QUERY subcommand, from Yonghong.

    5) Fix a regression in AF_XDP's XDP_SKB receive path where queue
    id check got erroneously removed, from Björn.

    6) Fix missing state cleanup in BPF's xfrm tunnel test, from William.

    7) Check tunnel type more accurately in BPF's tunnel collect metadata
    kselftest, from Jian.

    8) Fix missing Kconfig fragments for BPF kselftests, from Anders.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

16 Jun, 2018

9 commits

  • Pull networking fixes from David Miller:

    1) Various netfilter fixlets from Pablo and the netfilter team.

    2) Fix regression in IPVS caused by lack of PMTU exceptions on local
    routes in ipv6, from Julian Anastasov.

    3) Check pskb_trim_rcsum for failure in DSA, from Zhouyang Jia.

    4) Don't crash on poll in TLS, from Daniel Borkmann.

    5) Revert SO_REUSE{ADDR,PORT} change, it regresses various things
    including Avahi mDNS. From Bart Van Assche.

    6) Missing of_node_put in qcom/emac driver, from Yue Haibing.

    7) We lack checking of the TCP checksum in one special case during SYN
    receive, from Frank van der Linden.

    8) Fix module init error paths of mac80211 hwsim, from Johannes Berg.

    9) Handle 802.1ad properly in stmmac driver, from Elad Nachman.

    10) Must grab HW caps before doing quirk checks in stmmac driver, from
    Jose Abreu.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
    net: stmmac: Run HWIF Quirks after getting HW caps
    neighbour: skip NTF_EXT_LEARNED entries during forced gc
    net: cxgb3: add error handling for sysfs_create_group
    tls: fix waitall behavior in tls_sw_recvmsg
    tls: fix use-after-free in tls_push_record
    l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()
    l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
    mlxsw: spectrum_switchdev: Fix port_vlan refcounting
    mlxsw: spectrum_router: Align with new route replace logic
    mlxsw: spectrum_router: Allow appending to dev-only routes
    ipv6: Only emit append events for appended routes
    stmmac: added support for 802.1ad vlan stripping
    cfg80211: fix rcu in cfg80211_unregister_wdev
    mac80211: Move up init of TXQs
    mac80211_hwsim: fix module init error paths
    cfg80211: initialize sinfo in cfg80211_get_station
    nl80211: fix some kernel doc tag mistakes
    hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload
    rds: avoid unenecessary cong_update in loop transport
    l2tp: clean up stale tunnel or session in pppol2tp_connect's error path
    ...

    Linus Torvalds
     
  • Commit 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue") changed
    the return value type of __devmap_lookup_elem() from struct net_device *
    to struct bpf_dtab_netdev * but forgot to modify generic XDP code
    accordingly.

    Thus generic XDP incorrectly used struct bpf_dtab_netdev where struct
    net_device is expected, then skb->dev was set to invalid value.

    v2:
    - Fix compiler warning without CONFIG_BPF_SYSCALL.

    Fixes: 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue")
    Signed-off-by: Toshiaki Makita
    Acked-by: Yonghong Song
    Acked-by: Jesper Dangaard Brouer
    Signed-off-by: Daniel Borkmann

    Toshiaki Makita
     
  • Commit 9ce33e46531d ("neighbour: support for NTF_EXT_LEARNED flag")
    added support for NTF_EXT_LEARNED for neighbour entries.
    NTF_EXT_LEARNED entries are neigh entries managed by control
    plane (eg: Ethernet VPN implementation in FRR routing suite).
    Periodic gc already excludes these entries. This patch extends
    it to forced gc which the earlier patch missed.

    Fixes: 9ce33e46531d ("neighbour: support for NTF_EXT_LEARNED flag")
    Signed-off-by: Roopa Prabhu
    Signed-off-by: David S. Miller

    Roopa Prabhu
     
  • Current behavior in tls_sw_recvmsg() is to wait for incoming tls
    messages and copy up to exactly len bytes of data that the user
    provided. This is problematic in the sense that if no packet
    is currently queued in strparser, we keep waiting until one has been
    processed and pushed into the tls receive layer for tls_wait_data() to
    wake up and push the decrypted bits to user space. Given that after
    tls decryption we're back at streaming data, use the sock_rcvlowat()
    hint from the tcp socket instead. Retain current behavior with the
    MSG_WAITALL flag and otherwise use the hint target for breaking the
    loop and returning to the application. This is done if currently no
    ctx->recv_pkt is ready, otherwise continue to process it from our
    strparser backlog.

    Fixes: c46234ebb4d1 ("tls: RX path for ktls")
    Signed-off-by: Daniel Borkmann
    Acked-by: Dave Watson
    Signed-off-by: David S. Miller

    Daniel Borkmann
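
    The loop-breaking rule can be sketched as below. This is a rough model
    under stated assumptions: rcv_target() approximates what the low-water
    hint is understood to compute, records are represented only by their
    lengths, and tls_recv() returns whether the caller would have to block
    instead of actually waiting.

```python
def rcv_target(rcvlowat, length, waitall):
    """Bytes needed before returning: everything with MSG_WAITALL, else the hint."""
    if waitall:
        return length
    return max(1, min(rcvlowat, length))


def tls_recv(ready_records, length, rcvlowat=1, waitall=False):
    """Return (bytes_copied, would_block) for the records that are ready now."""
    target = rcv_target(rcvlowat, length, waitall)
    pending = list(ready_records)
    copied = 0
    while copied < length:
        if not pending:
            # Nothing ready: return if the target is met, otherwise we would
            # have to wait for another record.
            return copied, copied < target
        record_len = pending.pop(0)
        copied += min(record_len, length - copied)
    return copied, False


if __name__ == "__main__":
    print(tls_recv([100], 4096))                # returns with what is there
    print(tls_recv([100], 4096, waitall=True))  # keeps waiting for more
```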
     
  • syzkaller managed to trigger a use-after-free in tls like the
    following:

    BUG: KASAN: use-after-free in tls_push_record.constprop.15+0x6a2/0x810 [tls]
    Write of size 1 at addr ffff88037aa08000 by task a.out/2317

    CPU: 3 PID: 2317 Comm: a.out Not tainted 4.17.0+ #144
    Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
    Call Trace:
    dump_stack+0x71/0xab
    print_address_description+0x6a/0x280
    kasan_report+0x258/0x380
    ? tls_push_record.constprop.15+0x6a2/0x810 [tls]
    tls_push_record.constprop.15+0x6a2/0x810 [tls]
    tls_sw_push_pending_record+0x2e/0x40 [tls]
    tls_sk_proto_close+0x3fe/0x710 [tls]
    ? tcp_check_oom+0x4c0/0x4c0
    ? tls_write_space+0x260/0x260 [tls]
    ? kmem_cache_free+0x88/0x1f0
    inet_release+0xd6/0x1b0
    __sock_release+0xc0/0x240
    sock_close+0x11/0x20
    __fput+0x22d/0x660
    task_work_run+0x114/0x1a0
    do_exit+0x71a/0x2780
    ? mm_update_next_owner+0x650/0x650
    ? handle_mm_fault+0x2f5/0x5f0
    ? __do_page_fault+0x44f/0xa50
    ? mm_fault_error+0x2d0/0x2d0
    do_group_exit+0xde/0x300
    __x64_sys_exit_group+0x3a/0x50
    do_syscall_64+0x9a/0x300
    ? page_fault+0x8/0x30
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    This happened through fault injection where aead_req allocation in
    tls_do_encryption() eventually failed and we returned -ENOMEM from
    the function. Turns out that the use-after-free is triggered from
    tls_sw_sendmsg() in the second tls_push_record(). The error then
    triggers a jump to waiting for memory in sk_stream_wait_memory()
    resp. returning immediately in case of MSG_DONTWAIT. What follows is
    the trim_both_sgl(sk, orig_size), which drops elements from the sg
    list added via tls_sw_sendmsg(). Now the use-after-free gets triggered
    when the socket is being closed, where tls_sk_proto_close() callback
    is invoked. The tls_complete_pending_work() will figure that there's
    a pending closed tls record to be flushed and thus calls into the
    tls_push_pending_closed_record() from there. ctx->push_pending_record()
    is called from the latter, which is the tls_sw_push_pending_record()
    from sw path. This again calls into tls_push_record(). And here the
    tls_fill_prepend() will panic since the buffer address has been freed
    earlier via trim_both_sgl(). One way to fix it is to move the aead
    request allocation out of tls_do_encryption() early into tls_push_record().
    This means we don't prep the tls header and advance state to
    TLS_PENDING_CLOSED_RECORD before the allocation, which could
    potentially fail, has happened. That fixes the issue on my side.

    Fixes: 3c4d7559159b ("tls: kernel TLS support")
    Reported-by: syzbot+5c74af81c547738e1684@syzkaller.appspotmail.com
    Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
    Signed-off-by: Daniel Borkmann
    Acked-by: Dave Watson
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • pppol2tp_tunnel_ioctl() can act on an L2TPv3 tunnel, in which case
    'session' may be an Ethernet pseudo-wire.

    However, pppol2tp_session_ioctl() expects a PPP pseudo-wire, as it
    assumes l2tp_session_priv() points to a pppol2tp_session structure. For
    an Ethernet pseudo-wire l2tp_session_priv() points to an l2tp_eth_sess
    structure instead, making pppol2tp_session_ioctl() access invalid
    memory.

    Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
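
The class of bug described above, interpreting private data as the wrong structure, is usually avoided by checking a type tag before the cast. The following is a minimal userspace sketch of that pattern, not the actual l2tp patch: all `ex_` names are hypothetical, and the `-EINVAL` return value is an arbitrary choice for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical pseudo-wire type tag. */
enum ex_pwtype { EX_PWTYPE_ETH, EX_PWTYPE_PPP };

/* Two unrelated private layouts, as with PPP and Ethernet sessions. */
struct ex_ppp_priv { int sk_id; };
struct ex_eth_priv { char ifname[4]; };

struct ex_session {
	enum ex_pwtype pwtype;
	void *priv; /* layout depends on pwtype */
};

/* Only touch priv as ex_ppp_priv when the tag says it is one. */
static int ex_ppp_session_ioctl(const struct ex_session *s, int *sk_id)
{
	const struct ex_ppp_priv *ps;

	if (s->pwtype != EX_PWTYPE_PPP)
		return -EINVAL; /* assumed error code for the sketch */

	ps = s->priv;
	*sk_id = ps->sk_id;
	return 0;
}
```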
     
  • The /proc/net/pppol2tp handlers (pppol2tp_seq_*()) iterate over all
    L2TPv2 tunnels, and rightfully expect that only PPP sessions can be
    found there. However, l2tp_netlink accepts creating Ethernet sessions
    regardless of the underlying tunnel version.

    This confuses pppol2tp_seq_session_show(), which expects that
    l2tp_session_priv() returns a pppol2tp_session structure. When the
    session is an Ethernet pseudo-wire, a struct l2tp_eth_sess is returned
    instead. This leads to invalid memory access when
    pppol2tp_session_get_sock() later tries to dereference ps->sk.

    Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
     
  • Current code will emit an append event in the FIB notification chain for
    any route added with NLM_F_APPEND set, even if the route was not
    appended to any existing route.

    This is inconsistent with IPv4 where such an event is only emitted when
    the new route is appended after an existing one.

    Align IPv6 behavior with IPv4, thereby allowing listeners to more easily
    handle these events.

    Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
    Signed-off-by: Ido Schimmel
    Acked-by: Jiri Pirko
    Acked-by: David Ahern
    Signed-off-by: David S. Miller

    Ido Schimmel
     
  • …kernel/git/jberg/mac80211

    Johannes Berg says:

    ====================
    A handful of fixes:
    * missing RCU grace period enforcement led to drivers freeing
    data structures before a grace period had elapsed; fix from Dedy Lansky.
    * hwsim module init error paths were messed up; fixed it myself
    after a report from Colin King (who had sent a partial patch)
    * kernel-doc tag errors; fix from Luca Coelho
    * initialize the on-stack sinfo data structure when getting
    station information; fix from Sven Eckelmann
    * TXQ state dumping is now done from init, and when TXQs aren't
    initialized yet at that point, bad things happen, move the
    initialization; fix from Toke Høiland-Jørgensen.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

    David S. Miller
     

15 Jun, 2018

11 commits

  • Callers of cfg80211_unregister_wdev can free the wdev object
    immediately after this function returns. This may crash the kernel
    because this wdev object is still in use by other threads.
    Add synchronize_rcu() after list_del_rcu to make sure wdev object can
    be safely freed.

    Signed-off-by: Dedy Lansky
    Signed-off-by: Johannes Berg

    Dedy Lansky
     
  • On init, ieee80211_if_add() dumps the interface. Since that now includes a
    dump of the TXQ state, we need to initialise that before the dump happens.
    So move up the TXQ initialisation to before the call to
    ieee80211_if_add().

    Fixes: 52539ca89f36 ("cfg80211: Expose TXQ stats and parameters to userspace")
    Reported-by: Niklas Cassel
    Signed-off-by: Toke Høiland-Jørgensen
    Tested-by: Niklas Cassel
    Signed-off-by: Johannes Berg

    Toke Høiland-Jørgensen
     
  • Most of the implementations behind cfg80211_get_station will not initialize
    sinfo to zero before manipulating it. For example, the member "filled",
    which indicates the filled in parts of this struct, is often only modified
    by enabling certain bits in the bitfield while keeping the remaining bits
    in their original state. A caller without a preinitialized sinfo.filled can
    then no longer decide which parts of sinfo were filled in by
    cfg80211_get_station (or actually the underlying implementations).

    cfg80211_get_station must therefore take care that sinfo is initialized to
    zero. Otherwise, the caller may try to read information which was not
    filled in and which must therefore also be considered uninitialized. In
    batadv_v_elp_get_throughput's case, an invalid "random" expected throughput
    may be stored for this neighbor and thus the B.A.T.M.A.N V algorithm may
    switch to non-optimal neighbors for certain destinations.

    Fixes: 7406353d43c8 ("cfg80211: implement cfg80211_get_station cfg80211 API")
    Reported-by: Thomas Lauer
    Reported-by: Marcel Schmidt
    Cc: b.a.t.m.a.n@lists.open-mesh.org
    Signed-off-by: Sven Eckelmann
    Signed-off-by: Johannes Berg

    Sven Eckelmann
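
The problem described above can be reproduced outside the kernel: if an implementation only ORs bits into `filled`, stale bits in an uninitialized struct survive. The sketch below assumes hypothetical names (`demo_sinfo`, `demo_driver_get_station()`, `demo_get_station()`) and only illustrates why the wrapper zeroes the struct before calling the implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FILLED_SIGNAL     (1u << 0)
#define DEMO_FILLED_THROUGHPUT (1u << 1)

/* Hypothetical, much reduced station info structure. */
struct demo_sinfo {
	uint32_t filled;              /* bitmask of valid members */
	int signal;
	uint32_t expected_throughput;
};

/* Models a driver that only sets the bits it knows about. */
static int demo_driver_get_station(struct demo_sinfo *sinfo)
{
	sinfo->signal = -50;
	sinfo->filled |= DEMO_FILLED_SIGNAL; /* other bits left as they were */
	return 0;
}

/* Wrapper zeroes the struct so stale bits cannot leak to the caller. */
static int demo_get_station(struct demo_sinfo *sinfo)
{
	memset(sinfo, 0, sizeof(*sinfo));
	return demo_driver_get_station(sinfo);
}
```

A caller that passes on-stack garbage then sees only the bits the driver actually filled in, so it never treats a random `expected_throughput` as valid.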
     
  • For the loop transport, which is a self-loopback, the remote port
    congestion update isn't relevant. In fact, the xmit path already
    ignores it. The receive path needs to do the same.

    Reported-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com
    Reviewed-by: Sowmini Varadhan
    Signed-off-by: Santosh Shilimkar
    Signed-off-by: David S. Miller

    Santosh Shilimkar
     
  • pppol2tp_connect() may create a tunnel or a session. Remove them in
    case of error.

    Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
     
  • If 'fd' is negative, l2tp_tunnel_create() creates a tunnel socket using
    the configuration passed in 'tcfg'. Currently, pppol2tp_connect() sets
    the relevant fields to zero, tricking l2tp_tunnel_create() into setting
    up an unusable kernel socket.

    We can't set 'tcfg' with the required fields because there's no way to
    get them from the current connect() parameters. So let's restrict
    kernel sockets creation to the netlink API, which is the original use
    case.

    Fixes: 789a4a2c61d8 ("l2tp: Add support for static unmanaged L2TPv3 tunnels")
    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
     
  • l2tp_session_priv() returns a struct pppol2tp_session pointer only for
    PPPoL2TP sessions. In particular, if the session is an L2TP_PWTYPE_ETH
    pseudo-wire, l2tp_session_priv() returns a pointer to an l2tp_eth_sess
    structure, which is much smaller than struct pppol2tp_session. This
    leads to invalid memory dereference when trying to lock ps->sk_lock.

    Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
     
  • Define cfg.pw_type so that the new session is created with its .pwtype
    field properly set (L2TP_PWTYPE_PPP).

    Not setting the pseudo-wire type had several annoying effects:

    * Invalid value returned in the L2TP_ATTR_PW_TYPE attribute when
    dumping sessions with the netlink API.

    * Impossibility to delete the session using the netlink API (because
    l2tp_nl_cmd_session_delete() gets the deletion callback function
    from an array indexed by the session's pseudo-wire type).

    Also, there are several cases where we should check a session's
    pseudo-wire type. For example, pppol2tp_connect() should refuse to
    connect a session that is not PPPoL2TP, but that requires the session's
    .pwtype field to be properly set.

    Fixes: f7faffa3ff8e ("l2tp: Add L2TPv3 protocol support")
    Signed-off-by: Guillaume Nault
    Signed-off-by: David S. Miller

    Guillaume Nault
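
To see why an unset pseudo-wire type makes deletion impossible, consider a callback table indexed by that type. The sketch below is a simplified userspace model with hypothetical `tbl_` names and an assumed error code; it is not the l2tp netlink code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical pseudo-wire types; 0 models "never set". */
enum tbl_pwtype {
	TBL_PWTYPE_NONE = 0,
	TBL_PWTYPE_ETH,
	TBL_PWTYPE_PPP,
	TBL_PWTYPE_MAX
};

struct tbl_session {
	enum tbl_pwtype pwtype;
	int deleted;
};

typedef int (*tbl_delete_fn)(struct tbl_session *);

static int tbl_ppp_delete(struct tbl_session *s)
{
	s->deleted = 1;
	return 0;
}

/* Deletion callbacks indexed by pseudo-wire type. */
static const tbl_delete_fn tbl_delete_ops[TBL_PWTYPE_MAX] = {
	[TBL_PWTYPE_PPP] = tbl_ppp_delete,
};

static int tbl_session_delete(struct tbl_session *s)
{
	/* A session whose type was never set finds no callback. */
	if ((unsigned)s->pwtype >= (unsigned)TBL_PWTYPE_MAX ||
	    !tbl_delete_ops[s->pwtype])
		return -EPROTONOSUPPORT; /* assumed error code */

	return tbl_delete_ops[s->pwtype](s);
}
```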
     
  • commit 079096f103fa ("tcp/dccp: install syn_recv requests into ehash
    table") introduced an optimization for the handling of child sockets
    created for a new TCP connection.

    But this optimization passes any data associated with the last ACK of the
    connection handshake up the stack without verifying its checksum, because it
    calls tcp_child_process(), which in turn calls tcp_rcv_state_process()
    directly. These lower-level processing functions do not do any checksum
    verification.

    Insert a tcp_checksum_complete call in the TCP_NEW_SYN_RECEIVE path to
    fix this.

    Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
    Signed-off-by: Frank van der Linden
    Signed-off-by: Eric Dumazet
    Tested-by: Balbir Singh
    Reviewed-by: Balbir Singh
    Signed-off-by: David S. Miller

    Frank van der Linden
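
For readers unfamiliar with what the checksum verification protects against, here is a minimal sketch of the Internet checksum (the one's-complement sum described in RFC 1071) and a verification helper. It is a simplification under stated assumptions: the TCP pseudo-header is omitted, and `demo_csum()` and `demo_csum_ok()` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over big-endian 16-bit words. */
static uint16_t demo_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8; /* pad odd byte with zero */

	while (sum >> 16) /* fold carries back into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* A segment with its checksum field filled in sums to zero. */
static int demo_csum_ok(const uint8_t *seg, size_t len)
{
	return demo_csum(seg, len) == 0;
}
```

A receive path that skips this check would pass a corrupted payload up the stack, which is the situation the commit above closes for data carried on the final handshake ACK.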
     
  • Pull ceph updates from Ilya Dryomov:
    "The main piece is a set of libceph changes that revamps how OSD
    requests are aborted, improving CephFS ENOSPC handling and making
    "umount -f" actually work (Zheng and myself).

    The rest is mostly mount option handling cleanups from Chengguang and
    assorted fixes from Zheng, Luis and Dongsheng."

    * tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client: (31 commits)
    rbd: flush rbd_dev->watch_dwork after watch is unregistered
    ceph: update description of some mount options
    ceph: show ino32 if the value is different with default
    ceph: strengthen rsize/wsize/readdir_max_bytes validation
    ceph: fix alignment of rasize
    ceph: fix use-after-free in ceph_statfs()
    ceph: prevent i_version from going back
    ceph: fix wrong check for the case of updating link count
    libceph: allocate the locator string with GFP_NOFAIL
    libceph: make abort_on_full a per-osdc setting
    libceph: don't abort reads in ceph_osdc_abort_on_full()
    libceph: avoid a use-after-free during map check
    libceph: don't warn if req->r_abort_on_full is set
    libceph: use for_each_request() in ceph_osdc_abort_on_full()
    libceph: defer __complete_request() to a workqueue
    libceph: move more code into __complete_request()
    libceph: no need to call flush_workqueue() before destruction
    ceph: flush pending works before shutdown super
    ceph: abort osd requests on force umount
    libceph: introduce ceph_osdc_abort_requests()
    ...

    Linus Torvalds
     
  • Now sctp GSO uses skb_gro_receive() to append the data into head
    skb frag_list. However it actually only needs very little code from
    skb_gro_receive(). Besides, NAPI_GRO_CB has to be set while most
    of its members are not needed here.

    This patch is to add sctp_packet_gso_append() to build GSO frames
    instead of skb_gro_receive(), and it would avoid many unnecessary
    checks and make the code clearer.

    Note that sctp will use page frags instead of frag_list to build
    GSO frames in another patch. But it may take time, as sctp's GSO
    frames may have different sizes. skb_segment() can only split them
    into frags of the same size, which would break the boundaries
    of sctp chunks.

    Signed-off-by: Xin Long
    Reviewed-by: Marcelo Ricardo Leitner
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller

    Xin Long
     

14 Jun, 2018

1 commit

  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter patches for your net tree:

    1) Fix NULL pointer dereference from nf_nat_decode_session() if NAT is
    not loaded, from Prashant Bhole.

    2) Fix socket extension module autoload.

    3) Don't bogusly reject sets with the NFT_SET_EVAL flag set on from
    the dynset extension.

    4) Fix races with nf_tables module removal and netns exit path,
    patches from Florian Westphal.

    5) Don't hit BUG_ON if jumpstack goes too deep, instead hit
    WARN_ON_ONCE, from Taehee Yoo.

    6) Another NULL pointer dereference from ctnetlink, again if NAT is
    not loaded, from Florian Westphal.

    7) Fix x_tables match list corruption in xt_connmark module removal
    path, also from Florian.

    8) nf_conncount doesn't properly deal with conntrack zones, hence
    garbage collector may get rid of entries in a different zone.
    From Yi-Hung Wei.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     

13 Jun, 2018

3 commits

  • Pull more overflow updates from Kees Cook:
    "The rest of the overflow changes for v4.18-rc1.

    This includes the explicit overflow fixes from Silvio, further
    struct_size() conversions from Matthew, and a bug fix from Dan.

    But the bulk of it is the treewide conversions to use either the
    2-factor argument allocators (e.g. kmalloc(a * b, ...) into
    kmalloc_array(a, b, ...)) or the array_size() macros (e.g. vmalloc(a *
    b) into vmalloc(array_size(a, b))).

    Coccinelle was fighting me on several fronts, so I've done a bunch of
    manual whitespace updates in the patches as well.

    Summary:

    - Error path bug fix for overflow tests (Dan)

    - Additional struct_size() conversions (Matthew, Kees)

    - Explicitly reported overflow fixes (Silvio, Kees)

    - Add missing kvcalloc() function (Kees)

    - Treewide conversions of allocators to use either 2-factor argument
    variant when available, or array_size() and array3_size() as needed
    (Kees)"

    * tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
    treewide: Use array_size in f2fs_kvzalloc()
    treewide: Use array_size() in f2fs_kzalloc()
    treewide: Use array_size() in f2fs_kmalloc()
    treewide: Use array_size() in sock_kmalloc()
    treewide: Use array_size() in kvzalloc_node()
    treewide: Use array_size() in vzalloc_node()
    treewide: Use array_size() in vzalloc()
    treewide: Use array_size() in vmalloc()
    treewide: devm_kzalloc() -> devm_kcalloc()
    treewide: devm_kmalloc() -> devm_kmalloc_array()
    treewide: kvzalloc() -> kvcalloc()
    treewide: kvmalloc() -> kvmalloc_array()
    treewide: kzalloc_node() -> kcalloc_node()
    treewide: kzalloc() -> kcalloc()
    treewide: kmalloc() -> kmalloc_array()
    mm: Introduce kvcalloc()
    video: uvesafb: Fix integer overflow in allocation
    UBIFS: Fix potential integer overflow in allocation
    leds: Use struct_size() in allocation
    Convert intel uncore to struct_size
    ...

    Linus Torvalds
     
  • The vzalloc_node() function has no 2-factor argument form, so
    multiplication factors need to be wrapped in array_size(). This patch
    replaces cases of:

    vzalloc_node(a * b, node)

    with:

    vzalloc_node(array_size(a, b), node)

    as well as handling cases of:

    vzalloc_node(a * b * c, node)

    with:

    vzalloc_node(array3_size(a, b, c), node)

    This does, however, attempt to ignore constant size factors like:

    vzalloc_node(4 * 1024, node)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    vzalloc_node(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    vzalloc_node(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    vzalloc_node(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc_node(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc_node(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc_node(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc_node(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    vzalloc_node(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    vzalloc_node(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    vzalloc_node(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    vzalloc_node(
    - sizeof(TYPE) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE) * COUNT_ID
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING) * COUNT_ID
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    vzalloc_node(
    - SIZE * COUNT
    + array_size(COUNT, SIZE)
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    vzalloc_node(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    vzalloc_node(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vzalloc_node(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    vzalloc_node(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    vzalloc_node(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc_node(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc_node(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc_node(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc_node(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc_node(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc_node(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc_node(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    vzalloc_node(C1 * C2 * C3, ...)
    |
    vzalloc_node(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2-factor products when they're not all constants.
    @@
    expression E1, E2;
    constant C1, C2;
    @@

    (
    vzalloc_node(C1 * C2, ...)
    |
    vzalloc_node(
    - E1 * E2
    + array_size(E1, E2)
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
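
The reason for wrapping the factors is that a plain `a * b` can wrap around and yield a small allocation size. The following userspace sketch shows the saturating behaviour such helpers are meant to provide. The `demo_` functions are hypothetical stand-ins written for this illustration and are not the kernel's array_size() and array3_size() implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a * b does not fit in size_t, else stores the product. */
static int demo_mul_overflows(size_t a, size_t b, size_t *out)
{
	if (b != 0 && a > SIZE_MAX / b)
		return 1;
	*out = a * b;
	return 0;
}

/* 2-factor size: saturate to SIZE_MAX instead of wrapping. */
static size_t demo_array_size(size_t a, size_t b)
{
	size_t bytes;

	if (demo_mul_overflows(a, b, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* 3-factor size: any intermediate overflow saturates the result. */
static size_t demo_array3_size(size_t a, size_t b, size_t c)
{
	size_t bytes;

	if (demo_mul_overflows(a, b, &bytes))
		return SIZE_MAX;
	if (demo_mul_overflows(bytes, c, &bytes))
		return SIZE_MAX;
	return bytes;
}
```

Passing a saturated SIZE_MAX to an allocator is expected to make the allocation fail cleanly, whereas a wrapped product could succeed with a buffer that is far too small.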
     
  • The vzalloc() function has no 2-factor argument form, so multiplication
    factors need to be wrapped in array_size(). This patch replaces cases of:

    vzalloc(a * b)

    with:

    vzalloc(array_size(a, b))

    as well as handling cases of:

    vzalloc(a * b * c)

    with:

    vzalloc(array3_size(a, b, c))

    This does, however, attempt to ignore constant size factors like:

    vzalloc(4 * 1024)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    vzalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    vzalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    vzalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    vzalloc(
    - sizeof(TYPE) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * COUNT_ID
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * COUNT_ID
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    vzalloc(
    - SIZE * COUNT
    + array_size(COUNT, SIZE)
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    vzalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    vzalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vzalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vzalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    vzalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    vzalloc(C1 * C2 * C3, ...)
    |
    vzalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2-factor products when they're not all constants.
    @@
    expression E1, E2;
    constant C1, C2;
    @@

    (
    vzalloc(C1 * C2, ...)
    |
    vzalloc(
    - E1 * E2
    + array_size(E1, E2)
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook