18 Mar, 2019

1 commit

  • Pull 9p updates from Dominique Martinet:
    "Here is a 9p update for 5.1; there honestly hasn't been much.

    Two fixes (leak on invalid mount argument and possible deadlock on
    i_size update on 32bit smp) and a fall-through warning cleanup"

    * tag '9p-for-5.1' of git://github.com/martinetd/linux:
    9p/net: fix memory leak in p9_client_create
    9p: use inode->i_lock to protect i_size_write() under 32-bit
    9p: mark expected switch fall-through

    Linus Torvalds
     

17 Mar, 2019

1 commit

  • Pull NFS client bugfixes from Trond Myklebust:
    "Highlights include:

    Bugfixes:
    - Fix an Oops in SUNRPC back channel tracepoints
    - Fix a SUNRPC client regression when handling oversized replies
    - Fix the minimal size for SUNRPC reply buffer allocation
    - rpc_decode_header() must always return a non-zero value on error
    - Fix a typo in pnfs_update_layout()

    Cleanup:
    - Remove redundant check for the reply length in call_decode()"

    * tag 'nfs-for-5.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    SUNRPC: Remove redundant check for the reply length in call_decode()
    SUNRPC: Handle the SYSTEM_ERR rpc error
    SUNRPC: rpc_decode_header() must always return a non-zero value on error
    SUNRPC: Use the ENOTCONN error on socket disconnect
    SUNRPC: Fix the minimal size for reply buffer allocation
    SUNRPC: Fix a client regression when handling oversized replies
    pNFS: Fix a typo in pnfs_update_layout
    fix null pointer deref in tracepoints in back channel

    Linus Torvalds
     

16 Mar, 2019

6 commits


15 Mar, 2019

1 commit

  • Pull networking fixes from David Miller:
    "More fixes in the queue:

    1) Netfilter nat can erroneously register the device notifier twice,
    fix from Florian Westphal.

    2) Use after free in nf_tables, from Pablo Neira Ayuso.

    3) Parallel update of steering rule fix in mlx5 driver, from Eli
    Britstein.

    4) RX processing panic in lan743x, fix from Bryan Whitehead.

    5) Use before initialization of TCP_SKB_CB, fix from Christoph Paasch.

    6) Fix locking in SRIOV mode of mlx4 driver, from Jack Morgenstein.

    7) Fix TX stalls in lan743x due to mishandling of interrupt ACKing
    modes, from Bryan Whitehead.

    8) Fix infoleak in l2tp_ip6_recvmsg(), from Eric Dumazet"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
    pptp: dst_release sk_dst_cache in pptp_sock_destruct
    MAINTAINERS: GENET & SYSTEMPORT: Add internal Broadcom list
    l2tp: fix infoleak in l2tp_ip6_recvmsg()
    net/tls: Inform user space about send buffer availability
    net_sched: return correct value for *notify* functions
    lan743x: Fix TX Stall Issue
    net/mlx4_core: Fix qp mtt size calculation
    net/mlx4_core: Fix locking in SRIOV mode when switching between events and polling
    net/mlx4_core: Fix reset flow when in command polling mode
    mlxsw: minimal: Initialize base_mac
    mlxsw: core: Prevent duplication during QSFP module initialization
    net: dwmac-sun8i: fix a missing check of of_get_phy_mode
    net: sh_eth: fix a missing check of of_get_phy_mode
    net: 8390: fix potential NULL pointer dereferences
    net: fujitsu: fix a potential NULL pointer dereference
    net: qlogic: fix a potential NULL pointer dereference
    isdn: hfcpci: fix potential NULL pointer dereference
    Documentation: devicetree: add a new optional property for port mac address
    net: rocker: fix a potential NULL pointer dereference
    net: qlge: fix a potential NULL pointer dereference
    ...

    Linus Torvalds
     

14 Mar, 2019

3 commits

  • Back in 2013 Hannes took care of most of such leaks in commit
    bceaa90240b6 ("inet: prevent leakage of uninitialized memory to user in recv syscalls")

    But the bug in l2tp_ip6_recvmsg() has not been fixed.

    syzbot report :

    BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
    CPU: 1 PID: 10996 Comm: syz-executor362 Not tainted 5.0.0+ #11
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x173/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:600
    kmsan_internal_check_memory+0x9f4/0xb10 mm/kmsan/kmsan.c:694
    kmsan_copy_to_user+0xab/0xc0 mm/kmsan/kmsan_hooks.c:601
    _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
    copy_to_user include/linux/uaccess.h:174 [inline]
    move_addr_to_user+0x311/0x570 net/socket.c:227
    ___sys_recvmsg+0xb65/0x1310 net/socket.c:2283
    do_recvmmsg+0x646/0x10c0 net/socket.c:2390
    __sys_recvmmsg net/socket.c:2469 [inline]
    __do_sys_recvmmsg net/socket.c:2492 [inline]
    __se_sys_recvmmsg+0x1d1/0x350 net/socket.c:2485
    __x64_sys_recvmmsg+0x62/0x80 net/socket.c:2485
    do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7
    RIP: 0033:0x445819
    Code: e8 6c b6 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 2b 12 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f64453eddb8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
    RAX: ffffffffffffffda RBX: 00000000006dac28 RCX: 0000000000445819
    RDX: 0000000000000005 RSI: 0000000020002f80 RDI: 0000000000000003
    RBP: 00000000006dac20 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dac2c
    R13: 00007ffeba8f87af R14: 00007f64453ee9c0 R15: 20c49ba5e353f7cf

    Local variable description: ----addr@___sys_recvmsg
    Variable was created at:
    ___sys_recvmsg+0xf6/0x1310 net/socket.c:2244
    do_recvmmsg+0x646/0x10c0 net/socket.c:2390

    Bytes 0-31 of 32 are uninitialized
    Memory access of size 32 starts at ffff8880ae62fbb0
    Data copied to user address 0000000020000000

    Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
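The class of bug in this entry (stack memory reported as valid to user space without having been fully written) can be illustrated with a small user-space sketch. This is not the upstream fix: `struct demo_addr` and `fill_addr()` are hypothetical names, and the sketch assumes the simplest defence, clearing the whole structure before setting the fields that are known.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical address structure; the reserved bytes stand in for fields
 * and padding that a receive path might forget to write. */
struct demo_addr {
    unsigned short family;
    unsigned char reserved[30];
};

/* Fill 'out' and return how many bytes the caller may copy out.
 * Zeroing first means every byte reported as valid was really written. */
static size_t fill_addr(struct demo_addr *out, unsigned short family)
{
    assert(out != NULL);
    memset(out, 0, sizeof(*out));
    out->family = family;
    return sizeof(*out);
}
```

A caller that copies exactly the returned number of bytes then never exposes stale stack contents, whatever the buffer held before.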
     
  • A previous fix ("tls: Fix write space handling") assumed that
    user space application gets informed about the socket send buffer
    availability when tls_push_sg() gets called. Inside tls_push_sg(), in
    case do_tcp_sendpages() returns 0, the function returns without calling
    ctx->sk_write_space. Further, the new function tls_sw_write_space()
    did not invoke ctx->sk_write_space. This leads to a situation where the
    user space application locks up, waiting forever for the socket send
    buffer to become available.

    Rather than calling ctx->sk_write_space from tls_push_sg(), it should be
    called from tls_write_space. That way, whenever the TCP stack invokes
    sk->sk_write_space after freeing socket send buffer space, we always
    signal this to user space by invoking ctx->sk_write_space.

    Fixes: 7463d3a2db0ef ("tls: Fix write space handling")
    Signed-off-by: Vakul Garg
    Reviewed-by: Boris Pismenny
    Signed-off-by: David S. Miller

    Vakul Garg
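The callback chaining this entry relies on can be sketched in user space as follows. All names here (`demo_sock`, `demo_ctx`, `layered_write_space()`, `install_layer()`) are hypothetical and only model the idea: a layer that replaces a socket's write-space callback saves the original and always forwards to it, so the waiter is woken on every notification.

```c
#include <assert.h>
#include <stddef.h>

struct demo_sock;
typedef void (*write_space_fn)(struct demo_sock *sk);

/* Hypothetical socket: one callback slot plus a wakeup counter. */
struct demo_sock {
    write_space_fn write_space;
    void *ulp_ctx;
    int wakeups;
};

/* Hypothetical upper-layer context that remembers the original callback. */
struct demo_ctx {
    write_space_fn saved_write_space;
};

/* Original callback: stands in for waking the user space waiter. */
static void base_write_space(struct demo_sock *sk)
{
    sk->wakeups++;
}

/* Wrapper installed by the upper layer. Whatever it does with pending
 * data, it always forwards to the saved callback before returning. */
static void layered_write_space(struct demo_sock *sk)
{
    struct demo_ctx *ctx = sk->ulp_ctx;

    assert(ctx != NULL && ctx->saved_write_space != NULL);
    /* ... pushing pending records would go here ... */
    ctx->saved_write_space(sk);
}

/* Save the current callback, then put the wrapper in its place. */
static void install_layer(struct demo_sock *sk, struct demo_ctx *ctx)
{
    ctx->saved_write_space = sk->write_space;
    sk->ulp_ctx = ctx;
    sk->write_space = layered_write_space;
}
```

Forwarding from the wrapper itself, rather than from one of the send helpers, means the notification does not depend on which send path happened to run.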
     
    It is confusing to directly use the return value of netlink_send()/
    netlink_unicast() as the return value of *notify*, as it may not be an
    error at all.

    Example: in tc_del_tfilter(), after calling tfilter_del_notify(), it will
    goto errout if (err). However, netlink_send()/netlink_unicast() return a
    positive value even in the successful case. So it may not call
    tcf_chain_tp_remove() and so on to clean up the resource, and as a
    result the resource is leaked.

    It may be easier to only check the return value of tfilter_del_notify(),
    but it is cleaner to correct all related functions.

    Co-developed-by: Zengmo Gao
    Signed-off-by: Zhike Wang
    Acked-by: Cong Wang
    Signed-off-by: David S. Miller

    Zhike Wang
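The return-value mismatch in this entry can be reproduced in a few lines of user-space C. The functions below are hypothetical stand-ins (`demo_send()` plays the role of a netlink send that returns a byte count on success), and the normalization shown is one plausible shape for the correction, not a copy of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a netlink send: returns the number of bytes
 * delivered on success, or a negative errno on failure. */
static int demo_send(size_t len, int fail)
{
    return fail ? -EAGAIN : (int)len;
}

/* Notify helper with the normalized convention: 0 on success,
 * negative errno on failure, never a positive byte count. */
static int demo_notify(size_t len, int fail)
{
    int err = demo_send(len, fail);

    return err > 0 ? 0 : err;
}

/* Caller in the style described above: any non-zero err skips cleanup. */
static int demo_delete(size_t len, int fail, int *cleaned_up)
{
    int err;

    assert(cleaned_up != NULL);
    err = demo_notify(len, fail);
    if (err)
        return err;       /* error path: resources stay allocated */
    *cleaned_up = 1;      /* stands in for tcf_chain_tp_remove() etc. */
    return 0;
}
```

Without the normalization in `demo_notify()`, a successful send of 64 bytes would make `demo_delete()` return 64 and skip the cleanup step.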
     

13 Mar, 2019

7 commits

  • If msize is less than 4096, we should close and put trans, destroy
    tagpool, not just free client. This patch fixes that.

    Link: http://lkml.kernel.org/m/1552464097-142659-1-git-send-email-zhengbin13@huawei.com
    Cc: stable@vger.kernel.org
    Fixes: 574d356b7a02 ("9p/net: put a lower bound on msize")
    Reported-by: Hulk Robot
    Signed-off-by: zhengbin
    Signed-off-by: Dominique Martinet

    zhengbin
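The error-path rule in this entry (a late failure must release everything acquired so far, not only the top-level object) is commonly written with goto-based unwinding. The sketch below is a user-space model with hypothetical names (`demo_client`, `res_get()`, `res_put()`); it assumes two resources acquired before the size check and is not the 9p code itself.

```c
#include <assert.h>
#include <stdlib.h>

static int live_resources;   /* acquired-but-not-released resources */

static void *res_get(void)
{
    void *p = malloc(1);

    if (p)
        live_resources++;
    return p;
}

static void res_put(void *p)
{
    if (p) {
        free(p);
        live_resources--;
    }
}

struct demo_client {
    void *tagpool;
    void *trans;
    unsigned int msize;
};

static void demo_client_destroy(struct demo_client *c)
{
    assert(c != NULL);
    res_put(c->trans);
    res_put(c->tagpool);
    free(c);
}

static struct demo_client *demo_client_create(unsigned int msize)
{
    struct demo_client *c = calloc(1, sizeof(*c));

    if (!c)
        return NULL;
    c->tagpool = res_get();
    if (!c->tagpool)
        goto free_client;
    c->trans = res_get();
    if (!c->trans)
        goto put_tagpool;
    c->msize = msize;
    if (c->msize < 4096)
        goto put_trans;      /* late failure: unwind everything acquired */
    return c;

put_trans:
    res_put(c->trans);
put_tagpool:
    res_put(c->tagpool);
free_client:
    free(c);
    return NULL;
}
```

Jumping straight to `free_client` from the size check would reproduce the leak: the client is freed while both resources remain counted in `live_resources`.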
     
  • Pull NFS server updates from Bruce Fields:
    "Miscellaneous NFS server fixes.

    Probably the most visible bug is one that could artificially limit
    NFSv4.1 performance by limiting the number of outstanding rpcs from a
    single client.

    Neil Brown also gets a special mention for fixing a 14.5-year-old
    memory-corruption bug in the encoding of NFSv3 readdir responses"

    * tag 'nfsd-5.1' of git://linux-nfs.org/~bfields/linux:
    nfsd: allow nfsv3 readdir request to be larger.
    nfsd: fix wrong check in write_v4_end_grace()
    nfsd: fix memory corruption caused by readdir
    nfsd: fix performance-limiting session calculation
    svcrpc: fix UDP on servers with lots of threads
    svcrdma: Remove syslog warnings in work completion handlers
    svcrdma: Squelch compiler warning when SUNRPC_DEBUG is disabled
    svcrdma: Use struct_size() in kmalloc()
    svcrpc: fix unlikely races preventing queueing of sockets
    svcrpc: svc_xprt_has_something_to_do seems a little long
    SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()
    nfsd: fix an IS_ERR() vs NULL check

    Linus Torvalds
     
  • Pull ceph updates from Ilya Dryomov:
    "The highlights are:

    - rbd will now ignore discards that aren't aligned and big enough to
    actually free up some space (myself). This is controlled by the new
    alloc_size map option and can be disabled if needed.

    - support for rbd deep-flatten feature (myself). Deep-flatten allows
    "rbd flatten" to fully disconnect the clone image and its snapshots
    from the parent and make the parent snapshot removable.

    - a new round of cap handling improvements (Zheng Yan). The kernel
    client should now be much more prompt about releasing its caps and
    it is possible to put a limit on the number of caps held.

    - support for getting ceph.dir.pin extended attribute (Zheng Yan)"

    * tag 'ceph-for-5.1-rc1' of git://github.com/ceph/ceph-client: (26 commits)
    Documentation: modern versions of ceph are not backed by btrfs
    rbd: advertise support for RBD_FEATURE_DEEP_FLATTEN
    rbd: whole-object write and zeroout should copyup when snapshots exist
    rbd: copyup with an empty snapshot context (aka deep-copyup)
    rbd: introduce rbd_obj_issue_copyup_ops()
    rbd: stop copying num_osd_ops in rbd_obj_issue_copyup()
    rbd: factor out __rbd_osd_req_create()
    rbd: clear ->xferred on error from rbd_obj_issue_copyup()
    rbd: remove experimental designation from kernel layering
    ceph: add mount option to limit caps count
    ceph: periodically trim stale dentries
    ceph: delete stale dentry when last reference is dropped
    ceph: remove dentry_lru file from debugfs
    ceph: touch existing cap when handling reply
    ceph: pass inclusive lend parameter to filemap_write_and_wait_range()
    rbd: round off and ignore discards that are too small
    rbd: handle DISCARD and WRITE_ZEROES separately
    rbd: get rid of obj_req->obj_request_count
    libceph: use struct_size() for kmalloc() in crush_decode()
    ceph: send cap releases more aggressively
    ...

    Linus Torvalds
     
  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    Stable fixes:
    - Fixes for NFS I/O request leakages
    - Fix error handling paths in the NFS I/O recoalescing code
    - Reinitialise NFSv4.1 sequence results before retransmitting a
    request
    - Fix a soft lockup in the delegation recovery code
    - Bulk destroy of layouts needs to be safe w.r.t. umount
    - Prevent thundering herd issues when the SUNRPC socket is not
    connected
    - Respect RPC call timeouts when retrying transmission

    Features:
    - Convert rpc auth layer to use xdr_streams
    - Config option to disable insecure RPCSEC_GSS crypto types
    - Reduce size of RPC receive buffers
    - Readdirplus optimization by cache mechanism
    - Convert SUNRPC socket send code to use iov_iter()
    - SUNRPC micro-optimisations to avoid indirect calls
    - Add support for the pNFS LAYOUTERROR operation and use it with the
    pNFS/flexfiles driver
    - Add trace events to report non-zero NFS status codes
    - Various removals of unnecessary dprintks

    Bugfixes and cleanups:
    - Fix a number of sparse warnings and documentation format warnings
    - Fix nfs_parse_devname to not modify its argument
    - Fix potential corruption of page being written through pNFS/blocks
    - fix xfstest generic/099 failures on nfsv3
    - Avoid NFSv4.1 "false retries" when RPC calls are interrupted
    - Abort I/O early if the pNFS/flexfiles layout segment was
    invalidated
    - Avoid unnecessary pNFS/flexfiles layout invalidations"

    * tag 'nfs-for-5.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (90 commits)
    SUNRPC: Take the transport send lock before binding+connecting
    SUNRPC: Micro-optimise when the task is known not to be sleeping
    SUNRPC: Check whether the task was transmitted before rebind/reconnect
    SUNRPC: Remove redundant calls to RPC_IS_QUEUED()
    SUNRPC: Clean up
    SUNRPC: Respect RPC call timeouts when retrying transmission
    SUNRPC: Fix up RPC back channel transmission
    SUNRPC: Prevent thundering herd when the socket is not connected
    SUNRPC: Allow dynamic allocation of back channel slots
    NFSv4.1: Bump the default callback session slot count to 16
    SUNRPC: Convert remaining GFP_NOIO, and GFP_NOWAIT sites in sunrpc
    NFS/flexfiles: Clean up mirror DS initialisation
    NFS/flexfiles: Remove dead code in ff_layout_mirror_valid()
    NFS/flexfile: Simplify nfs4_ff_layout_select_ds_stateid()
    NFS/flexfile: Simplify nfs4_ff_layout_ds_version()
    NFS/flexfiles: Simplify ff_layout_get_ds_cred()
    NFS/flexfiles: Simplify nfs4_ff_find_or_create_ds_client()
    NFS/flexfiles: Simplify nfs4_ff_layout_select_ds_fh()
    NFS/flexfiles: Speed up read failover when DSes are down
    NFS/flexfiles: Don't invalidate DS deviceids for being unresponsive
    ...

    Linus Torvalds
     
  • Pull misc vfs updates from Al Viro:
    "Assorted fixes (really no common topic here)"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Make __vfs_write() static
    vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1
    pipe: stop using ->can_merge
    splice: don't merge into linked buffers
    fs: move generic stat response attr handling to vfs_getattr_nosec
    orangefs: don't reinitialize result_mask in ->getattr
    fs/devpts: always delete dcache dentry-s in dput()

    Linus Torvalds
     
  • This also makes sctp_stream_alloc_(out|in) saner, in that they no longer
    allocate new flex_arrays/genradixes, they just preallocate more
    elements.

    This code does however have a suspicious lack of locking.

    Link: http://lkml.kernel.org/r/20181217131929.11727-7-kent.overstreet@gmail.com
    Signed-off-by: Kent Overstreet
    Cc: Vlad Yasevich
    Cc: Neil Horman
    Cc: Marcelo Ricardo Leitner
    Cc: Alexey Dobriyan
    Cc: Al Viro
    Cc: Dave Hansen
    Cc: Eric Paris
    Cc: Matthew Wilcox
    Cc: Paul Moore
    Cc: Pravin B Shelar
    Cc: Shaohua Li
    Cc: Stephen Smalley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • Patch series "generic radix trees; drop flex arrays".

    This patch (of 7):

    There was no real need for this code to be using flexarrays, it's just
    implementing a hash table - ideally it would be using rhashtables, but
    that conversion would be significantly more complicated.

    Link: http://lkml.kernel.org/r/20181217131929.11727-2-kent.overstreet@gmail.com
    Signed-off-by: Kent Overstreet
    Reviewed-by: Matthew Wilcox
    Cc: Pravin B Shelar
    Cc: Alexey Dobriyan
    Cc: Al Viro
    Cc: Dave Hansen
    Cc: Eric Paris
    Cc: Marcelo Ricardo Leitner
    Cc: Neil Horman
    Cc: Paul Moore
    Cc: Shaohua Li
    Cc: Stephen Smalley
    Cc: Vlad Yasevich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     

12 Mar, 2019

5 commits

  • Pablo Neira Ayuso says:

    ====================
    Netfilter fixes for net

    The following patchset contains Netfilter fixes for your net tree:

    1) Fix list corruption in device notifier in the masquerade
    infrastructure, from Florian Westphal.

    2) Fix double-free of sets and use-after-free when deleting elements.

    3) Don't bogusly return EBUSY when removing a set after flush command.

    4) Use-after-free in dynamically allocated operations.

    5) Don't report a new ruleset generation to userspace if transaction
    list is empty, this invalidates the userspace cache unnecessarily.
    From Florian Westphal.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • In case x25_connect() fails and frees the socket neighbour,
    we also need to undo the change done to x25->state.

    Before my last bug fix, we had use-after-free so this
    patch fixes a latent bug.

    syzbot report :

    kasan: CONFIG_KASAN_INLINE enabled
    kasan: GPF could be caused by NULL-ptr deref or user memory access
    general protection fault: 0000 [#1] PREEMPT SMP KASAN
    CPU: 1 PID: 16137 Comm: syz-executor.1 Not tainted 5.0.0+ #117
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:x25_write_internal+0x1e8/0xdf0 net/x25/x25_subr.c:173
    Code: 00 40 88 b5 e0 fe ff ff 0f 85 01 0b 00 00 48 8b 8b 80 04 00 00 48 ba 00 00 00 00 00 fc ff df 48 8d 79 1c 48 89 fe 48 c1 ee 03 b6 34 16 48 89 fa 83 e2 07 83 c2 03 40 38 f2 7c 09 40 84 f6 0f
    RSP: 0018:ffff888076717a08 EFLAGS: 00010207
    RAX: ffff88805f2f2292 RBX: ffff8880a0ae6000 RCX: 0000000000000000
    kobject: 'loop5' (0000000018d0d0ee): kobject_uevent_env
    RDX: dffffc0000000000 RSI: 0000000000000003 RDI: 000000000000001c
    RBP: ffff888076717b40 R08: ffff8880950e0580 R09: ffffed100be5e46d
    R10: ffffed100be5e46c R11: ffff88805f2f2363 R12: ffff888065579840
    kobject: 'loop5' (0000000018d0d0ee): fill_kobj_path: path = '/devices/virtual/block/loop5'
    R13: 1ffff1100ece2f47 R14: 0000000000000013 R15: 0000000000000013
    FS: 00007fb88cf43700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f9a42a41028 CR3: 0000000087a67000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    x25_release+0xd0/0x340 net/x25/af_x25.c:658
    __sock_release+0xd3/0x2b0 net/socket.c:579
    sock_close+0x1b/0x30 net/socket.c:1162
    __fput+0x2df/0x8d0 fs/file_table.c:278
    ____fput+0x16/0x20 fs/file_table.c:309
    task_work_run+0x14a/0x1c0 kernel/task_work.c:113
    get_signal+0x1961/0x1d50 kernel/signal.c:2388
    do_signal+0x87/0x1940 arch/x86/kernel/signal.c:816
    exit_to_usermode_loop+0x244/0x2c0 arch/x86/entry/common.c:162
    prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
    syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
    do_syscall_64+0x52d/0x610 arch/x86/entry/common.c:293
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x457f29
    Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fb88cf42c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
    RAX: fffffffffffffe00 RBX: 0000000000000003 RCX: 0000000000457f29
    RDX: 0000000000000012 RSI: 0000000020000080 RDI: 0000000000000004
    RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb88cf436d4
    R13: 00000000004be462 R14: 00000000004cec98 R15: 00000000ffffffff
    Modules linked in:

    Fixes: 95d6ebd53c79 ("net/x25: fix use-after-free in x25_device_event()")
    Signed-off-by: Eric Dumazet
    Cc: andrew hendry
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
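The rollback rule from this entry (when a connect attempt fails, undo the state change as well as the neighbour reference) can be modelled in user space. The types and the `demo_connect()` function below are hypothetical; they only mirror the shape of the problem, not the x25 code.

```c
#include <assert.h>
#include <stddef.h>

enum demo_state { DEMO_STATE_IDLE, DEMO_STATE_CONNECTING };

struct demo_neigh {
    int refs;
};

struct demo_sock {
    enum demo_state state;
    struct demo_neigh *neighbour;
};

/* Attempt a connect; 'fail' simulates an error after state was changed. */
static int demo_connect(struct demo_sock *sk, struct demo_neigh *nb, int fail)
{
    assert(sk != NULL && nb != NULL);

    nb->refs++;
    sk->neighbour = nb;
    sk->state = DEMO_STATE_CONNECTING;

    if (fail) {
        /* Undo every change, in reverse order. */
        sk->state = DEMO_STATE_IDLE;
        sk->neighbour = NULL;
        nb->refs--;
        return -1;
    }
    return 0;
}
```

Leaving `state` at `DEMO_STATE_CONNECTING` after dropping the neighbour would let a later release path assume a neighbour exists and dereference a NULL pointer.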
     
  • Since commit eeea10b83a13 ("tcp: add
    tcp_v4_fill_cb()/tcp_v4_restore_cb()"), tcp_vX_fill_cb is only called
    after tcp_filter(). That means, TCP_SKB_CB(skb)->end_seq still points to
    the IP-part of the cb.

    We thus should not muck with it, as this can trigger bugs (thanks
    syzkaller):
    [ 12.349396] ==================================================================
    [ 12.350188] BUG: KASAN: slab-out-of-bounds in ip6_datagram_recv_specific_ctl+0x19b3/0x1a20
    [ 12.351035] Read of size 1 at addr ffff88006adbc208 by task test_ip6_datagr/1799

    Setting end_seq is actually no longer necessary in tcp_filter as it gets
    initialized later on in tcp_vX_fill_cb.

    Cc: Eric Dumazet
    Fixes: eeea10b83a13 ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()")
    Signed-off-by: Christoph Paasch
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Christoph Paasch
     
  • When running 'nft flush ruleset' while no rules exist, we will increment
    the generation counter and announce a new genid to userspace, yet
    nothing had changed in the first place.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

    Florian Westphal
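The behaviour described in this entry amounts to an early return when nothing is queued. A minimal user-space sketch, with a hypothetical `demo_ruleset` structure and `demo_commit()` function standing in for the real transaction machinery:

```c
#include <assert.h>
#include <stddef.h>

struct demo_ruleset {
    unsigned int genid;          /* generation counter */
    unsigned int announcements;  /* notifications sent to listeners */
};

/* Commit 'pending' queued changes. Returns 1 if a new generation was
 * announced, 0 if there was nothing to commit. */
static int demo_commit(struct demo_ruleset *rs, size_t pending)
{
    assert(rs != NULL);
    if (pending == 0)
        return 0;            /* nothing changed: keep genid, stay quiet */
    rs->genid++;
    rs->announcements++;
    return 1;
}
```

Listeners that cache the ruleset keyed by generation therefore keep their cache across a flush that changed nothing.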
     
    In check_6rd(), tunnel->ip6rd.relay_prefixlen may be equal to 32,
    which makes the shift undefined, so UBSAN complains about it.

    UBSAN: Undefined behaviour in net/ipv6/sit.c:781:47
    shift exponent 32 is too large for 32-bit type 'unsigned int'
    CPU: 6 PID: 20036 Comm: syz-executor.0 Not tainted 4.19.27 #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0xca/0x13e lib/dump_stack.c:113
    ubsan_epilogue+0xe/0x81 lib/ubsan.c:159
    __ubsan_handle_shift_out_of_bounds+0x293/0x2e8 lib/ubsan.c:425
    check_6rd.constprop.9+0x433/0x4e0 net/ipv6/sit.c:781
    try_6rd net/ipv6/sit.c:806 [inline]
    ipip6_tunnel_xmit net/ipv6/sit.c:866 [inline]
    sit_tunnel_xmit+0x141c/0x2720 net/ipv6/sit.c:1033
    __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
    netdev_start_xmit include/linux/netdevice.h:4309 [inline]
    xmit_one net/core/dev.c:3243 [inline]
    dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
    __dev_queue_xmit+0x1656/0x2500 net/core/dev.c:3829
    neigh_output include/net/neighbour.h:501 [inline]
    ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
    ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
    NF_HOOK_COND include/linux/netfilter.h:278 [inline]
    ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
    dst_output include/net/dst.h:444 [inline]
    ip6_local_out+0x99/0x170 net/ipv6/output_core.c:176
    ip6_send_skb+0x9d/0x2f0 net/ipv6/ip6_output.c:1697
    ip6_push_pending_frames+0xc0/0x100 net/ipv6/ip6_output.c:1717
    rawv6_push_pending_frames net/ipv6/raw.c:616 [inline]
    rawv6_sendmsg+0x2435/0x3530 net/ipv6/raw.c:946
    inet_sendmsg+0xf8/0x5c0 net/ipv4/af_inet.c:798
    sock_sendmsg_nosec net/socket.c:621 [inline]
    sock_sendmsg+0xc8/0x110 net/socket.c:631
    ___sys_sendmsg+0x6cf/0x890 net/socket.c:2114
    __sys_sendmsg+0xf0/0x1b0 net/socket.c:2152
    do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Signed-off-by: linmiaohe
    Signed-off-by: David S. Miller

    Miaohe Lin
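In C, shifting a 32-bit value by 32 or more bits is undefined behaviour, which is what the UBSAN report in this entry flags. A generic way to guard such a shift is sketched below; `shl32()` is a hypothetical helper, not the code in net/ipv6/sit.c, and it assumes the desired result for a full-width shift is zero.

```c
#include <assert.h>
#include <stdint.h>

/* Left-shift a 32-bit value by 'bits', defined for every bits in [0, 32].
 * A shift by the full width is handled explicitly instead of being
 * passed to the << operator. */
static uint32_t shl32(uint32_t value, unsigned int bits)
{
    assert(bits <= 32);
    return bits < 32 ? value << bits : 0;
}
```

With this guard a prefix length of 32 simply yields zero remaining bits rather than an unpredictable value.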
     

11 Mar, 2019

7 commits

  • Pull networking fixes from David Miller:
    "First batch of fixes in the new merge window:

    1) Double dst_cache free in act_tunnel_key, from Wenxu.

    2) Avoid NULL deref in IN_DEV_MFORWARD() by failing early in the
    ip_route_input_rcu() path, from Paolo Abeni.

    3) Fix appletalk compile regression, from Arnd Bergmann.

    4) If SLAB objects reach the TCP sendpage method we are in serious
    trouble, so put a debugging check there. From Vasily Averin.

    5) Memory leak in hsr layer, from Mao Wenan.

    6) Only test GSO type on GSO packets, from Willem de Bruijn.

    7) Fix crash in xsk_diag_put_umem(), from Eric Dumazet.

    8) Fix VNIC mailbox length in nfp, from Dirk van der Merwe.

    9) Fix race in ipv4 route exception handling, from Xin Long.

    10) Missing DMA memory barrier in hns3 driver, from Jian Shen.

    11) Use after free in __tcf_chain_put(), from Vlad Buslov.

    12) Handle inet_csk_reqsk_queue_add() failures, from Guillaume Nault.

    13) Return value correction when ip_mc_may_pull() fails, from Eric
    Dumazet.

    14) Use after free in x25_device_event(), also from Eric"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
    gro_cells: make sure device is up in gro_cells_receive()
    vxlan: test dev->flags & IFF_UP before calling gro_cells_receive()
    net/x25: fix use-after-free in x25_device_event()
    isdn: mISDNinfineon: fix potential NULL pointer dereference
    net: hns3: fix to stop multiple HNS reset due to the AER changes
    ip: fix ip_mc_may_pull() return value
    net: keep refcount warning in reqsk_free()
    net: stmmac: Avoid one more sometimes uninitialized Clang warning
    net: dsa: mv88e6xxx: Set correct interface mode for CPU/DSA ports
    rxrpc: Fix client call queueing, waiting for channel
    tcp: handle inet_csk_reqsk_queue_add() failures
    net: ethernet: sun: Zero initialize class in default case in niu_add_ethtool_tcam_entry
    8139too : Add support for U.S. Robotics USR997901A 10/100 Cardbus NIC
    fou, fou6: avoid uninit-value in gue_err() and gue6_err()
    net: sched: fix potential use-after-free in __tcf_chain_put()
    vhost: silence an unused-variable warning
    vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock
    connector: fix unsafe usage of ->real_parent
    vxlan: do not need BH again in vxlan_cleanup()
    net: hns3: add dma_rmb() for rx description
    ...

    Linus Torvalds
     
  • Smatch reports:

    net/netfilter/nf_tables_api.c:2167 nf_tables_expr_destroy()
    error: dereferencing freed memory 'expr->ops'

    net/netfilter/nf_tables_api.c
    2162 static void nf_tables_expr_destroy(const struct nft_ctx *ctx,
    2163 struct nft_expr *expr)
    2164 {
    2165 if (expr->ops->destroy)
    2166 expr->ops->destroy(ctx, expr);
    ^^^^
    --> 2167 module_put(expr->ops->type->owner);
    ^^^^^^^^^
    2168 }

    Smatch says there are three functions which free expr->ops.

    Fixes: b8e204006340 ("netfilter: nft_compat: use .release_ops and remove list of extension")
    Reported-by: Dan Carpenter
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • Set deletion after flush coming in the same batch results in EBUSY. Add
    set use counter to track the number of references to this set from
    rules. We cannot rely on the list of bindings for this since such list
    is still populated from the preparation phase.

    Reported-by: Václav Zindulka
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
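The use counter mentioned in this entry can be pictured as a plain reference count consulted at deletion time. The sketch below uses hypothetical names (`demo_set`, `rule_bind()`, `rule_unbind()`, `set_delete()`) and ignores batching and locking; it only shows why a counter that drops when rules are flushed lets a later delete in the same sequence succeed.

```c
#include <assert.h>
#include <errno.h>

struct demo_set {
    unsigned int use;   /* number of rules currently referencing the set */
    int deleted;
};

static void rule_bind(struct demo_set *s)
{
    s->use++;
}

static void rule_unbind(struct demo_set *s)
{
    assert(s->use > 0);
    s->use--;
}

/* Deleting a set that rules still reference is refused with -EBUSY. */
static int set_delete(struct demo_set *s)
{
    if (s->use > 0)
        return -EBUSY;
    s->deleted = 1;
    return 0;
}
```

Flushing the rules corresponds to calling `rule_unbind()` for each reference, after which `set_delete()` no longer reports the set as busy.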
     
  • Before trying to bind a port, ensure we grab the send lock to
    ensure that we don't change the port while another task is busy
    transmitting requests.
    The connect code already takes the send lock in xprt_connect(),
    but it is harmless to take it before that.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • In cases where we know the task is not sleeping, try to optimise
    away the indirect call to task->tk_action() by replacing it with
    a direct call.
    Only change tail calls, to allow gcc to perform tail call
    elimination.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
    We keep receiving syzbot reports [1] that show that tunnels do not play
    by the rcu/IFF_UP rules properly.

    At device dismantle phase, gro_cells_destroy() will be called
    only after a full rcu grace period is observed after IFF_UP
    has been cleared.

    This means that IFF_UP needs to be tested before queueing packets
    into netif_rx() or gro_cells.

    This patch implements the test in gro_cells_receive() because
    too many callers do not seem to bother enough.

    [1]
    BUG: unable to handle kernel paging request at fffff4ca0b9ffffe
    PGD 0 P4D 0
    Oops: 0000 [#1] PREEMPT SMP KASAN
    CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.0.0+ #97
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: netns cleanup_net
    RIP: 0010:__skb_unlink include/linux/skbuff.h:1929 [inline]
    RIP: 0010:__skb_dequeue include/linux/skbuff.h:1945 [inline]
    RIP: 0010:__skb_queue_purge include/linux/skbuff.h:2656 [inline]
    RIP: 0010:gro_cells_destroy net/core/gro_cells.c:89 [inline]
    RIP: 0010:gro_cells_destroy+0x19d/0x360 net/core/gro_cells.c:78
    Code: 03 42 80 3c 20 00 0f 85 53 01 00 00 48 8d 7a 08 49 8b 47 08 49 c7 07 00 00 00 00 48 89 f9 49 c7 47 08 00 00 00 00 48 c1 e9 03 80 3c 21 00 0f 85 10 01 00 00 48 89 c1 48 89 42 08 48 c1 e9 03
    RSP: 0018:ffff8880aa3f79a8 EFLAGS: 00010a02
    RAX: 00ffffffffffffe8 RBX: ffffe8ffffc64b70 RCX: 1ffff8ca0b9ffffe
    RDX: ffffc6505cffffe8 RSI: ffffffff858410ca RDI: ffffc6505cfffff0
    RBP: ffff8880aa3f7a08 R08: ffff8880aa3e8580 R09: fffffbfff1263645
    R10: fffffbfff1263644 R11: ffffffff8931b223 R12: dffffc0000000000
    R13: 0000000000000000 R14: ffffe8ffffc64b80 R15: ffffe8ffffc64b75
    kobject: 'loop2' (000000004bd7d84a): kobject_uevent_env
    FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: fffff4ca0b9ffffe CR3: 0000000094941000 CR4: 00000000001406f0
    Call Trace:
    kobject: 'loop2' (000000004bd7d84a): fill_kobj_path: path = '/devices/virtual/block/loop2'
    ip_tunnel_dev_free+0x19/0x60 net/ipv4/ip_tunnel.c:1010
    netdev_run_todo+0x51c/0x7d0 net/core/dev.c:8970
    rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:116
    ip_tunnel_delete_nets+0x423/0x5f0 net/ipv4/ip_tunnel.c:1124
    vti_exit_batch_net+0x23/0x30 net/ipv4/ip_vti.c:495
    ops_exit_list.isra.0+0x105/0x160 net/core/net_namespace.c:156
    cleanup_net+0x3fb/0x960 net/core/net_namespace.c:551
    process_one_work+0x98e/0x1790 kernel/workqueue.c:2173
    worker_thread+0x98/0xe40 kernel/workqueue.c:2319
    kthread+0x357/0x430 kernel/kthread.c:246
    ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
    Modules linked in:
    CR2: fffff4ca0b9ffffe
    [ end trace 513fc9c1338d1cb3 ]
    RIP: 0010:__skb_unlink include/linux/skbuff.h:1929 [inline]
    RIP: 0010:__skb_dequeue include/linux/skbuff.h:1945 [inline]
    RIP: 0010:__skb_queue_purge include/linux/skbuff.h:2656 [inline]
    RIP: 0010:gro_cells_destroy net/core/gro_cells.c:89 [inline]
    RIP: 0010:gro_cells_destroy+0x19d/0x360 net/core/gro_cells.c:78
    Code: 03 42 80 3c 20 00 0f 85 53 01 00 00 48 8d 7a 08 49 8b 47 08 49 c7 07 00 00 00 00 48 89 f9 49 c7 47 08 00 00 00 00 48 c1 e9 03 80 3c 21 00 0f 85 10 01 00 00 48 89 c1 48 89 42 08 48 c1 e9 03
    RSP: 0018:ffff8880aa3f79a8 EFLAGS: 00010a02
    RAX: 00ffffffffffffe8 RBX: ffffe8ffffc64b70 RCX: 1ffff8ca0b9ffffe
    RDX: ffffc6505cffffe8 RSI: ffffffff858410ca RDI: ffffc6505cfffff0
    RBP: ffff8880aa3f7a08 R08: ffff8880aa3e8580 R09: fffffbfff1263645
    R10: fffffbfff1263644 R11: ffffffff8931b223 R12: dffffc0000000000
    kobject: 'loop3' (00000000e4ee57a6): kobject_uevent_env
    R13: 0000000000000000 R14: ffffe8ffffc64b80 R15: ffffe8ffffc64b75
    FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: fffff4ca0b9ffffe CR3: 0000000094941000 CR4: 00000000001406f0

    Fixes: c9e6bc644e55 ("net: add gro_cells infrastructure")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • On failure, x25_connect() calls x25_neigh_put(x25->neighbour) but forgets
    to clear the x25->neighbour pointer, leaving a dangling pointer that
    triggers a use-after-free.

    Since the socket is visible in x25_list, we need to hold x25_list_lock
    to protect the operation.

    syzbot report :

    BUG: KASAN: use-after-free in x25_kill_by_device net/x25/af_x25.c:217 [inline]
    BUG: KASAN: use-after-free in x25_device_event+0x296/0x2b0 net/x25/af_x25.c:252
    Read of size 8 at addr ffff8880a030edd0 by task syz-executor003/7854

    CPU: 0 PID: 7854 Comm: syz-executor003 Not tainted 5.0.0+ #97
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x172/0x1f0 lib/dump_stack.c:113
    print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
    kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
    __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
    x25_kill_by_device net/x25/af_x25.c:217 [inline]
    x25_device_event+0x296/0x2b0 net/x25/af_x25.c:252
    notifier_call_chain+0xc7/0x240 kernel/notifier.c:93
    __raw_notifier_call_chain kernel/notifier.c:394 [inline]
    raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:401
    call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1739
    call_netdevice_notifiers_extack net/core/dev.c:1751 [inline]
    call_netdevice_notifiers net/core/dev.c:1765 [inline]
    __dev_notify_flags+0x1e9/0x2c0 net/core/dev.c:7607
    dev_change_flags+0x10d/0x170 net/core/dev.c:7643
    dev_ifsioc+0x2b0/0x940 net/core/dev_ioctl.c:237
    dev_ioctl+0x1b8/0xc70 net/core/dev_ioctl.c:488
    sock_do_ioctl+0x1bd/0x300 net/socket.c:995
    sock_ioctl+0x32b/0x610 net/socket.c:1096
    vfs_ioctl fs/ioctl.c:46 [inline]
    file_ioctl fs/ioctl.c:509 [inline]
    do_vfs_ioctl+0xd6e/0x1390 fs/ioctl.c:696
    ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
    __do_sys_ioctl fs/ioctl.c:720 [inline]
    __se_sys_ioctl fs/ioctl.c:718 [inline]
    __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x4467c9
    Code: e8 0c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 0f 83 5b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007fdbea222d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    RAX: ffffffffffffffda RBX: 00000000006dbc58 RCX: 00000000004467c9
    RDX: 0000000020000340 RSI: 0000000000008914 RDI: 0000000000000003
    RBP: 00000000006dbc50 R08: 00007fdbea223700 R09: 0000000000000000
    R10: 00007fdbea223700 R11: 0000000000000246 R12: 00000000006dbc5c
    R13: 6000030030626669 R14: 0000000000000000 R15: 0000000030626669

    Allocated by task 7843:
    save_stack+0x45/0xd0 mm/kasan/common.c:73
    set_track mm/kasan/common.c:85 [inline]
    __kasan_kmalloc mm/kasan/common.c:495 [inline]
    __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:468
    kasan_kmalloc+0x9/0x10 mm/kasan/common.c:509
    kmem_cache_alloc_trace+0x151/0x760 mm/slab.c:3615
    kmalloc include/linux/slab.h:545 [inline]
    x25_link_device_up+0x46/0x3f0 net/x25/x25_link.c:249
    x25_device_event+0x116/0x2b0 net/x25/af_x25.c:242
    notifier_call_chain+0xc7/0x240 kernel/notifier.c:93
    __raw_notifier_call_chain kernel/notifier.c:394 [inline]
    raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:401
    call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1739
    call_netdevice_notifiers_extack net/core/dev.c:1751 [inline]
    call_netdevice_notifiers net/core/dev.c:1765 [inline]
    __dev_notify_flags+0x121/0x2c0 net/core/dev.c:7605
    dev_change_flags+0x10d/0x170 net/core/dev.c:7643
    dev_ifsioc+0x2b0/0x940 net/core/dev_ioctl.c:237
    dev_ioctl+0x1b8/0xc70 net/core/dev_ioctl.c:488
    sock_do_ioctl+0x1bd/0x300 net/socket.c:995
    sock_ioctl+0x32b/0x610 net/socket.c:1096
    vfs_ioctl fs/ioctl.c:46 [inline]
    file_ioctl fs/ioctl.c:509 [inline]
    do_vfs_ioctl+0xd6e/0x1390 fs/ioctl.c:696
    ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
    __do_sys_ioctl fs/ioctl.c:720 [inline]
    __se_sys_ioctl fs/ioctl.c:718 [inline]
    __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Freed by task 7865:
    save_stack+0x45/0xd0 mm/kasan/common.c:73
    set_track mm/kasan/common.c:85 [inline]
    __kasan_slab_free+0x102/0x150 mm/kasan/common.c:457
    kasan_slab_free+0xe/0x10 mm/kasan/common.c:465
    __cache_free mm/slab.c:3494 [inline]
    kfree+0xcf/0x230 mm/slab.c:3811
    x25_neigh_put include/net/x25.h:253 [inline]
    x25_connect+0x8d8/0xde0 net/x25/af_x25.c:824
    __sys_connect+0x266/0x330 net/socket.c:1685
    __do_sys_connect net/socket.c:1696 [inline]
    __se_sys_connect net/socket.c:1693 [inline]
    __x64_sys_connect+0x73/0xb0 net/socket.c:1693
    do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    The buggy address belongs to the object at ffff8880a030edc0
    which belongs to the cache kmalloc-256 of size 256
    The buggy address is located 16 bytes inside of
    256-byte region [ffff8880a030edc0, ffff8880a030eec0)
    The buggy address belongs to the page:
    page:ffffea000280c380 count:1 mapcount:0 mapping:ffff88812c3f07c0 index:0x0
    flags: 0x1fffc0000000200(slab)
    raw: 01fffc0000000200 ffffea0002806788 ffffea00027f0188 ffff88812c3f07c0
    raw: 0000000000000000 ffff8880a030e000 000000010000000c 0000000000000000
    page dumped because: kasan: bad access detected

    Signed-off-by: Eric Dumazet
    Reported-by: syzbot+04babcefcd396fabec37@syzkaller.appspotmail.com
    Cc: andrew hendry
    Signed-off-by: David S. Miller

    Eric Dumazet
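
    The x25_connect() fix described in this entry comes down to dropping the
    reference and clearing the pointer inside one critical section. The
    following is a minimal userspace sketch of that pattern, not the kernel
    patch: struct neigh, struct sock_model, list_lock and all function names
    are hypothetical stand-ins, and a pthread mutex replaces the kernel lock
    primitive behind x25_list_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the neighbour object and the X.25 socket. */
struct neigh {
    int refcnt;
};

struct sock_model {
    struct neigh *neighbour;
};

/* Assumption: a plain mutex models x25_list_lock. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Drop one reference; free the object when the last one goes away. */
static void neigh_put(struct neigh *nb)
{
    assert(nb->refcnt > 0);
    if (--nb->refcnt == 0)
        free(nb);
}

/* Error path of a connect attempt: release the neighbour and clear the
 * pointer in the same critical section, so a walker that takes list_lock
 * never sees a pointer to a released object. */
static void connect_fail_cleanup(struct sock_model *sk)
{
    pthread_mutex_lock(&list_lock);
    if (sk->neighbour) {
        neigh_put(sk->neighbour);
        sk->neighbour = NULL;
    }
    pthread_mutex_unlock(&list_lock);
}

/* Device-down walker: returns 1 if the socket still references nb. */
static int sock_uses_neigh(struct sock_model *sk, const struct neigh *nb)
{
    int match;

    pthread_mutex_lock(&list_lock);
    match = (sk->neighbour == nb);
    pthread_mutex_unlock(&list_lock);
    return match;
}
```

    After connect_fail_cleanup() runs, a walker such as sock_uses_neigh()
    finds NULL instead of a stale address, which is the condition the KASAN
    report above shows being violated.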
     

10 Mar, 2019

4 commits

  • Before initiating transport actions that require putting the task to sleep,
    such as rebinding or reconnecting, we should check whether or not the task
    was already transmitted.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Pull rdma updates from Jason Gunthorpe:
    "This has been a slightly more active cycle than normal with ongoing
    core changes and quite a lot of collected driver updates.

    - Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe

    - A new data transfer mode for HFI1 giving higher performance

    - Significant functional and bug fix update to the mlx5
    On-Demand-Paging MR feature

    - A chip hang reset recovery system for hns

    - Change mm->pinned_vm to an atomic64

    - Update bnxt_re to support a new 57500 chip

    - A sane netlink 'rdma link add' method for creating rxe devices and
    fixing the various unregistration race conditions in rxe's
    unregister flow

    - Allow looking up objects by an ID over netlink

    - Various reworking of the core to driver interface:
    - drivers should not assume umem SGLs are in PAGE_SIZE chunks
    - ucontext is accessed via udata not other means
    - start to make the core code responsible for object memory
    allocation
    - drivers should convert struct device to struct ib_device via a
    helper
    - drivers have more tools to avoid use after unregister problems"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (280 commits)
    net/mlx5: ODP support for XRC transport is not enabled by default in FW
    IB/hfi1: Close race condition on user context disable and close
    RDMA/umem: Revert broken 'off by one' fix
    RDMA/umem: minor bug fix in error handling path
    RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
    cxgb4: kfree mhp after the debug print
    IB/rdmavt: Fix concurrency panics in QP post_send and modify to error
    IB/rdmavt: Fix loopback send with invalidate ordering
    IB/iser: Fix dma_nents type definition
    IB/mlx5: Set correct write permissions for implicit ODP MR
    bnxt_re: Clean cq for kernel consumers only
    RDMA/uverbs: Don't do double free of allocated PD
    RDMA: Handle ucontext allocations by IB/core
    RDMA/core: Fix a WARN() message
    bnxt_re: fix the regression due to changes in alloc_pbl
    IB/mlx4: Increase the timeout for CM cache
    IB/core: Abort page fault handler silently during owning process exit
    IB/mlx5: Validate correct PD before prefetch MR
    IB/mlx5: Protect against prefetch of invalid MR
    RDMA/uverbs: Store PR pointer before it is overwritten
    ...

    Linus Torvalds
     
  • The RPC task wakeup calls all check for RPC_IS_QUEUED() before taking any
    locks. In addition, rpc_exit() already calls rpc_wake_up_queued_task().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Replace remaining callers of call_timeout() with rpc_check_timeout().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

09 Mar, 2019

5 commits

  • rxrpc_get_client_conn() adds a new call to the front of the waiting_calls
    queue if the connection it's going to use already exists. This is bad as
    it allows calls to get starved out.

    Fix this by adding to the tail instead.

    Also change the other enqueue point in the same function to put it on the
    front (ie. when we have a new connection). This makes the point that in
    the case of a new connection the new call goes at the front (though it
    doesn't actually matter since the queue should be unoccupied).

    Fixes: 45025bceef17 ("rxrpc: Improve management and caching of client connection objects")
    Signed-off-by: David Howells
    Reviewed-by: Marc Dionne
    Signed-off-by: David S. Miller

    David Howells
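
    The starvation argument in this entry can be illustrated with a small
    queue model. This is a sketch only: struct call, struct wait_queue and
    the three helpers are hypothetical and do not reflect the rxrpc data
    structures, which use the kernel's list_head API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a waiting call. */
struct call {
    int id;
    struct call *next;
};

struct wait_queue {
    struct call *head;
    struct call *tail;
};

/* Append at the tail: earlier waiters are served first (FIFO). */
static void enqueue_tail(struct wait_queue *q, struct call *c)
{
    c->next = NULL;
    if (q->tail)
        q->tail->next = c;
    else
        q->head = c;
    q->tail = c;
}

/* Insert at the head: the newest call jumps ahead of older waiters. */
static void enqueue_head(struct wait_queue *q, struct call *c)
{
    c->next = q->head;
    q->head = c;
    if (!q->tail)
        q->tail = c;
}

/* Take the next call to be served, or NULL if nothing is waiting. */
static struct call *dequeue(struct wait_queue *q)
{
    struct call *c = q->head;

    if (c) {
        q->head = c->next;
        if (!q->head)
            q->tail = NULL;
        c->next = NULL;
    }
    return c;
}
```

    With enqueue_head(), a steady stream of new calls keeps the oldest
    waiter at the back indefinitely; with enqueue_tail(), every waiter
    reaches the front after the calls queued before it are served.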
     
  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2019-03-09

    The following pull-request contains BPF updates for your *net* tree.

    The main changes are:

    1) Fix a crash in AF_XDP's xsk_diag_put_ring() which was passing
    wrong queue argument, from Eric.

    2) Fix a regression due to wrong test for TCP GSO packets used in
    various BPF helpers like NAT64, from Willem.

    3) Fix a sk_msg strparser warning which asserts that strparser must
    be stopped first, from Jakub.

    4) Fix rejection of invalid options/bind flags in AF_XDP, from Björn.

    5) Fix GSO in bpf_lwt_push_ip_encap() which must properly set inner
    headers and inner protocol, from Peter.

    6) Fix a libbpf leak when kernel does not support BTF, from Nikita.

    7) Various BPF selftest and libbpf build fixes to make out-of-tree
    compilation work and to properly resolve dependencies via fixdep
    target, from Stanislav.

    8) Fix rejection of invalid ldimm64 imm field, from Daniel.

    9) Fix bpf stats sysctl compile warning of unused helper function
    proc_dointvec_minmax_bpf_stats() under some configs, from Arnd.

    10) Fix a couple of warnings about using a plain integer as NULL, from Bo.

    11) Fix some BPF sample spelling mistakes, from Colin.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Commit 7716682cc58e ("tcp/dccp: fix another race at listener
    dismantle") let inet_csk_reqsk_queue_add() fail, and adjusted
    {tcp,dccp}_check_req() accordingly. However, TFO and syncookies
    weren't modified, thus leaking allocated resources on error.

    Contrary to tcp_check_req(), in both syncookies and TFO cases,
    we need to drop the request socket. Also, since the child socket is
    created with inet_csk_clone_lock(), we have to unlock it and drop an
    extra reference (->sk_refcount is initially set to 2 and
    inet_csk_reqsk_queue_add() drops only one ref).

    For TFO, we also need to revert the work done by tcp_try_fastopen()
    (with reqsk_fastopen_remove()).

    Fixes: 7716682cc58e ("tcp/dccp: fix another race at listener dismantle")
    Signed-off-by: Guillaume Nault
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Guillaume Nault
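
    The reference accounting described in this entry can be traced with a
    toy model. It is a sketch built only from the commit text: struct
    child_model and every function below are hypothetical, the listener
    state is reduced to a flag, and the assumption that the failing queue
    add drops exactly one reference is taken from the message above.

```c
#include <assert.h>

/* Hypothetical model of a child socket. */
struct child_model {
    int refcnt;
    int locked;
    int freed;
};

/* Models creation via a clone-and-lock helper: two references, locked. */
static struct child_model clone_child(void)
{
    struct child_model c = { 2, 1, 0 };
    return c;
}

static void child_put(struct child_model *c)
{
    assert(c->refcnt > 0);
    if (--c->refcnt == 0)
        c->freed = 1;
}

/* Models a queue add that can fail: on failure it drops one reference
 * and returns 0, leaving the second reference to the caller. */
static int queue_add(struct child_model *c, int listener_alive)
{
    if (!listener_alive) {
        child_put(c);
        return 0;
    }
    return 1;
}

/* Caller side: on failure, unlock the child and drop the extra
 * reference, otherwise the child is never released. */
static int finish_child(struct child_model *c, int listener_alive)
{
    if (queue_add(c, listener_alive))
        return 1;
    c->locked = 0;
    child_put(c);
    return 0;
}
```

    Without the unlock and the second child_put() in finish_child(), the
    failed child would stay locked with a reference count of 1, which is
    the leak this commit addresses. The success path is left untouched in
    this model.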
     
  • My prior commit missed the fact that these functions
    were using udp_hdr() (aka skb_transport_header())
    to get access to GUE header.

    Since pskb_transport_may_pull() does not exist yet, we have to add
    transport_offset to our pskb_may_pull() calls.

    BUG: KMSAN: uninit-value in gue_err+0x514/0xfa0 net/ipv4/fou.c:1032
    CPU: 1 PID: 10648 Comm: syz-executor.1 Not tainted 5.0.0+ #11
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:

    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x173/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:600
    __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
    gue_err+0x514/0xfa0 net/ipv4/fou.c:1032
    __udp4_lib_err_encap_no_sk net/ipv4/udp.c:571 [inline]
    __udp4_lib_err_encap net/ipv4/udp.c:626 [inline]
    __udp4_lib_err+0x12e6/0x1d40 net/ipv4/udp.c:665
    udp_err+0x74/0x90 net/ipv4/udp.c:737
    icmp_socket_deliver net/ipv4/icmp.c:767 [inline]
    icmp_unreach+0xb65/0x1070 net/ipv4/icmp.c:884
    icmp_rcv+0x11a1/0x1950 net/ipv4/icmp.c:1066
    ip_protocol_deliver_rcu+0x584/0xbb0 net/ipv4/ip_input.c:208
    ip_local_deliver_finish net/ipv4/ip_input.c:234 [inline]
    NF_HOOK include/linux/netfilter.h:289 [inline]
    ip_local_deliver+0x624/0x7b0 net/ipv4/ip_input.c:255
    dst_input include/net/dst.h:450 [inline]
    ip_rcv_finish net/ipv4/ip_input.c:414 [inline]
    NF_HOOK include/linux/netfilter.h:289 [inline]
    ip_rcv+0x6bd/0x740 net/ipv4/ip_input.c:524
    __netif_receive_skb_one_core net/core/dev.c:4973 [inline]
    __netif_receive_skb net/core/dev.c:5083 [inline]
    process_backlog+0x756/0x10e0 net/core/dev.c:5923
    napi_poll net/core/dev.c:6346 [inline]
    net_rx_action+0x78b/0x1a60 net/core/dev.c:6412
    __do_softirq+0x53f/0x93a kernel/softirq.c:293
    invoke_softirq kernel/softirq.c:375 [inline]
    irq_exit+0x214/0x250 kernel/softirq.c:416
    exiting_irq+0xe/0x10 arch/x86/include/asm/apic.h:536
    smp_apic_timer_interrupt+0x48/0x70 arch/x86/kernel/apic/apic.c:1064
    apic_timer_interrupt+0x2e/0x40 arch/x86/entry/entry_64.S:814

    RIP: 0010:finish_lock_switch+0x2b/0x40 kernel/sched/core.c:2597
    Code: 48 89 e5 53 48 89 fb e8 63 e7 95 00 8b b8 88 0c 00 00 48 8b 00 48 85 c0 75 12 48 89 df e8 dd db 95 00 c6 00 00 c6 03 00 fb 5b c3 e8 4e e6 95 00 eb e7 66 90 66 2e 0f 1f 84 00 00 00 00 00 55
    RSP: 0018:ffff888081a0fc80 EFLAGS: 00000296 ORIG_RAX: ffffffffffffff13
    RAX: ffff88821fd6bd80 RBX: ffff888027898000 RCX: ccccccccccccd000
    RDX: ffff88821fca8d80 RSI: ffff888000000000 RDI: 00000000000004a0
    RBP: ffff888081a0fc80 R08: 0000000000000002 R09: ffff888081a0fb08
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
    R13: ffff88811130e388 R14: ffff88811130da00 R15: ffff88812fdb7d80
    finish_task_switch+0xfc/0x2d0 kernel/sched/core.c:2698
    context_switch kernel/sched/core.c:2851 [inline]
    __schedule+0x6cc/0x800 kernel/sched/core.c:3491
    schedule+0x15b/0x240 kernel/sched/core.c:3535
    freezable_schedule include/linux/freezer.h:172 [inline]
    do_nanosleep+0x2ba/0x980 kernel/time/hrtimer.c:1679
    hrtimer_nanosleep kernel/time/hrtimer.c:1733 [inline]
    __do_sys_nanosleep kernel/time/hrtimer.c:1767 [inline]
    __se_sys_nanosleep+0x746/0x960 kernel/time/hrtimer.c:1754
    __x64_sys_nanosleep+0x3e/0x60 kernel/time/hrtimer.c:1754
    do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
    entry_SYSCALL_64_after_hwframe+0x63/0xe7
    RIP: 0033:0x4855a0
    Code: 00 00 48 c7 c0 d4 ff ff ff 64 c7 00 16 00 00 00 31 c0 eb be 66 0f 1f 44 00 00 83 3d b1 11 5d 00 00 75 14 b8 23 00 00 00 0f 05 3d 01 f0 ff ff 0f 83 04 e2 f8 ff c3 48 83 ec 08 e8 3a 55 fd ff
    RSP: 002b:0000000000a4fd58 EFLAGS: 00000246 ORIG_RAX: 0000000000000023
    RAX: ffffffffffffffda RBX: 0000000000085780 RCX: 00000000004855a0
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000a4fd60
    RBP: 00000000000007ec R08: 0000000000000001 R09: 0000000000ceb940
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
    R13: 0000000000a4fdb0 R14: 0000000000085711 R15: 0000000000a4fdc0

    Uninit was created at:
    kmsan_save_stack_with_flags mm/kmsan/kmsan.c:205 [inline]
    kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:159
    kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
    kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
    slab_post_alloc_hook mm/slab.h:445 [inline]
    slab_alloc_node mm/slub.c:2773 [inline]
    __kmalloc_node_track_caller+0xe9e/0xff0 mm/slub.c:4398
    __kmalloc_reserve net/core/skbuff.c:140 [inline]
    __alloc_skb+0x309/0xa20 net/core/skbuff.c:208
    alloc_skb include/linux/skbuff.h:1012 [inline]
    alloc_skb_with_frags+0x186/0xa60 net/core/skbuff.c:5287
    sock_alloc_send_pskb+0xafd/0x10a0 net/core/sock.c:2091
    sock_alloc_send_skb+0xca/0xe0 net/core/sock.c:2108
    __ip_append_data+0x34cd/0x5000 net/ipv4/ip_output.c:998
    ip_append_data+0x324/0x480 net/ipv4/ip_output.c:1220
    icmp_push_reply+0x23d/0x7e0 net/ipv4/icmp.c:375
    __icmp_send+0x2ea3/0x30f0 net/ipv4/icmp.c:737
    icmp_send include/net/icmp.h:47 [inline]
    ipv4_link_failure+0x6d/0x230 net/ipv4/route.c:1190
    dst_link_failure include/net/dst.h:427 [inline]
    arp_error_report+0x106/0x1a0 net/ipv4/arp.c:297
    neigh_invalidate+0x359/0x8e0 net/core/neighbour.c:992
    neigh_timer_handler+0xdf2/0x1280 net/core/neighbour.c:1078
    call_timer_fn+0x285/0x600 kernel/time/timer.c:1325
    expire_timers kernel/time/timer.c:1362 [inline]
    __run_timers+0xdb4/0x11d0 kernel/time/timer.c:1681
    run_timer_softirq+0x2e/0x50 kernel/time/timer.c:1694
    __do_softirq+0x53f/0x93a kernel/softirq.c:293

    Fixes: 26fc181e6cac ("fou, fou6: do not assume linear skbs")
    Signed-off-by: Eric Dumazet
    Reported-by: syzbot
    Cc: Stefano Brivio
    Cc: Sabrina Dubroca
    Acked-by: Stefano Brivio
    Signed-off-by: David S. Miller

    Eric Dumazet
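
    The reason transport_offset has to be added to the pull length can be
    shown with a simplified buffer model. This is a sketch under stated
    assumptions: struct pkt_model and all helpers are hypothetical,
    may_pull() only imitates the length semantics of pskb_may_pull()
    (lengths counted from the start of the data), and the header sizes are
    the fixed UDP header and the base GUE header without optional fields.

```c
#include <assert.h>
#include <stddef.h>

#define UDP_HLEN 8 /* fixed UDP header size */
#define GUE_HLEN 4 /* base GUE header, no optional fields */

/* Hypothetical packet model. */
struct pkt_model {
    size_t total_len;        /* bytes in the packet, counted from data */
    size_t linear_len;       /* bytes directly readable from data */
    size_t transport_offset; /* offset of the UDP header from data */
};

/* Make the first len bytes directly readable, if the packet has them. */
static int may_pull(struct pkt_model *p, size_t len)
{
    if (len > p->total_len)
        return 0;
    if (len > p->linear_len)
        p->linear_len = len;
    return 1;
}

/* True if the GUE header behind the UDP header is directly readable. */
static int gue_hdr_readable(const struct pkt_model *p)
{
    return p->transport_offset + UDP_HLEN + GUE_HLEN <= p->linear_len;
}

/* Pull length that ignores where the transport header starts. */
static int pull_without_offset(struct pkt_model *p)
{
    return may_pull(p, UDP_HLEN + GUE_HLEN);
}

/* Pull length that accounts for the transport header offset. */
static int pull_with_offset(struct pkt_model *p)
{
    return may_pull(p, p->transport_offset + UDP_HLEN + GUE_HLEN);
}
```

    In this model pull_without_offset() can succeed while the bytes at
    transport_offset are still outside the readable area, which matches
    the kind of uninitialized read KMSAN reports above.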
     
  • When used with an unlocked classifier that has filters attached to actions
    with goto chain, __tcf_chain_put() for the last non-action reference can
    race with calls to the same function from action cleanup code that releases
    the last action reference. In this case the action cleanup handler could
    free the chain if it executes after all references to the chain were
    released, but before all concurrent users finished using it. Modify
    __tcf_chain_put() to only access tcf_chain fields when holding block->lock.
    Remove local variables that were used to cache some tcf_chain fields and
    are no longer needed because their values can now be obtained directly from
    the chain under block->lock protection.

    Fixes: 726d061286ce ("net: sched: prevent insertion of new classifiers during chain flush")
    Signed-off-by: Vlad Buslov
    Signed-off-by: David S. Miller

    Vlad Buslov
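
    The locking rule in this entry (touch chain fields only while holding
    the block lock) can be sketched in userspace as follows. This is an
    illustration, not the kernel function: struct block_model, struct
    chain_model, chain_put() and the PUT_* flags are hypothetical, and a
    pthread mutex stands in for block->lock.

```c
#include <assert.h>
#include <pthread.h>

#define PUT_FLUSH   1 /* last non-action reference was released */
#define PUT_DESTROY 2 /* last reference of any kind was released */

/* Hypothetical stand-ins for the block and the chain. */
struct block_model {
    pthread_mutex_t lock;
};

struct chain_model {
    unsigned int refcnt;        /* all references */
    unsigned int action_refcnt; /* references held by actions */
};

/* Release one reference. Every read and write of the chain counters
 * happens inside the critical section; only the computed result leaves
 * it, so a concurrent put cannot observe half-updated state. */
static int chain_put(struct chain_model *chain, struct block_model *block,
                     int by_act)
{
    int result = 0;

    pthread_mutex_lock(&block->lock);
    assert(chain->refcnt > 0);
    if (by_act) {
        assert(chain->action_refcnt > 0);
        chain->action_refcnt--;
    }
    chain->refcnt--;
    if (!by_act && chain->refcnt == chain->action_refcnt)
        result |= PUT_FLUSH;
    if (chain->refcnt == 0)
        result |= PUT_DESTROY;
    pthread_mutex_unlock(&block->lock);

    /* The chain must not be dereferenced past this point. */
    return result;
}
```

    A caller would act on the returned flags after the lock is dropped;
    whichever put sees PUT_DESTROY is the only one allowed to free the
    chain in this model.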